SAS & EXCEL
PART 1: IMPORTING WORKBOOKS AND WORKSHEETS
Peter Ott
FAIB, BC Ministry of FLNRO, Victoria, BC
2015-05-25
OPTIONS AVAILABLE:
• Convert the worksheet to a delimited text file (e.g. CSV), then read it with INFILE and INPUT statements
• ODBC or OLE DB (with the proper licences)
• DDE (old school)
• PROC IMPORT
• Excel LIBNAME engine
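For comparison with the LIBNAME approach covered in the rest of this deck, here is a minimal sketch of the PROC IMPORT option. It assumes SAS/ACCESS Interface to PC Files is licensed (needed for dbms=excel) and borrows the workbook path and sheet name from Example #1. The output data set name work.suave_2 is made up for illustration.

```sas
/* Sketch only: read one worksheet with PROC IMPORT.            */
/* The path and sheet name are borrowed from Example #1;         */
/* work.suave_2 is a hypothetical output data set name.          */
proc import datafile='D:\SUAVe spring 2015\OpenProbData for SUAVe.xlsx'
            out=work.suave_2
            dbms=excel
            replace;
  sheet='suave_2';   /* worksheet to read */
  getnames=yes;      /* first row holds the variable names */
  mixed=yes;         /* read mixed numeric/character columns as character */
run;
```

PROC IMPORT makes a one-off copy of a single sheet, whereas the LIBNAME engine exposes every sheet in the workbook as if it were a SAS data set.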
EXCEL LIBNAME ENGINE
• Introduced in SAS version 9
• Works for writing to Excel too
• Requires SAS/ACCESS Interface to PC Files
• Excel (MS Office) and SAS must both be 32-bit or both be 64-bit
• The XLSX LIBNAME engine arrived in SAS 9.4 maintenance release 2, with additional features
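For readers on SAS 9.4 maintenance release 2 or later, the XLSX engine mentioned above could be tried along these lines. This is a sketch rather than a tested recipe: the libref xl is hypothetical, and the workbook path and sheet name are borrowed from Example #1.

```sas
/* Sketch only: the XLSX engine (SAS 9.4M2 or later).            */
/* The libref xl is hypothetical; the path is from Example #1.    */
libname xl xlsx 'D:\SUAVe spring 2015\OpenProbData for SUAVe.xlsx';

/* With the XLSX engine a sheet is referenced by its name, without the trailing $ */
proc print data=xl.suave_2(obs=25);
run;

libname xl clear;
```

The only visible differences from the Excel engine syntax are the engine name on the LIBNAME statement and the way the sheet is named.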
9.2 SYNTAX
LIBNAME libref <engine name> <physical file-name>
        <libname options> ;
<SAS Code> <dataset options> ;
LIBNAME libref CLEAR ;
EXAMPLE #1
libname lssthn8 'D:\SUAVe spring 2015\OpenProbData for SUAVe.xlsx';

*first 25 rows of worksheet suave_2 (worksheet names end in $, hence the name literal);
proc print data=lssthn8.'suave_2$'n(obs=25);
run;

*the whole shebang;
data whatever;
  set lssthn8.'suave_2$'n;
run;

*cell range A1:B101 of worksheet suave_2;
data whatever_subset;
  set lssthn8.'suave_2$a1:b101'n;
run;

libname lssthn8 clear;
COMMON LIBNAME OPTIONS
Header = Yes / No
• If the worksheet does not have a header row with variable names, header=no has SAS apply the default variable names F1, F2, F3, ...
Dbsaslabel = Compat / None
• Compat assigns the Excel column headers as SAS variable labels; None leaves the labels blank.
• SAS variable names are automatically converted from the column headers into names that comply with SAS naming rules.
Scan_time = Yes / No
• Specifies whether to scan all row values of a datetime column and assign the TIME data type when only time values are found.
Stringdates = Yes / No
• Specifies whether datetime values are read from the data source as character strings or as numeric date values.
Usedate = Yes / No
• Specifies whether to assign the DATE. or DATETIME. format to datetime columns in the data source.
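As an illustration of the Header option, the sketch below reads a sheet that has no header row and renames the default F1, F2, F3 variables. The workbook path, sheet name, libref and new variable names are all hypothetical.

```sas
/* Sketch only: reading a worksheet that has no header row.      */
/* The workbook path, sheet name, libref and the new variable     */
/* names are hypothetical.                                         */
libname nohdr 'D:\SUAVe spring 2015\no_header.xlsx' header=no;

data plots;
  /* header=no yields F1, F2, F3, ...; rename them to something meaningful */
  set nohdr.'Sheet1$'n (rename=(F1=plot_id F2=species F3=height));
run;

libname nohdr clear;
```

Renaming in the SET statement keeps the meaningless default names out of the resulting data set.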
COMMON LIBNAME OPTIONS (CONT.)
Mixed = Yes / No
• No (the default): scans the first 8 records and makes a best guess whether to designate a column as numeric or character
• Yes: converts mixed columns to character, preserving character entries that occur deep in the data
• Problem: 8 records does not cut the mustard!
• Solution: change the TypeGuessRows key in the computer's registry
MIXED OPTION (CONT.)
To change the registry key for the 32-bit version of SAS running on a 64-bit operating system (FLNRO users):
• In Windows, select Start ► Run and type REGEDIT to display the Registry Editor.
• In the registry tree, select HKEY_LOCAL_MACHINE ► Software ► Wow6432Node ► Microsoft ► Office ► 14.0 ► Access Connectivity Engine ► Engines.
• Double-click the Excel node.
• In the right panel, double-click the TypeGuessRows entry.
• Change the value data from 8 to 0, so that more than the first 8 rows are scanned.
• Click OK.
• Select File ► Exit to close the Registry Editor window.
EXAMPLE #2A
libname lssthn8 'D:\SUAVe spring 2015\OpenProbData for SUAVe.xlsx' mixed=no; *default is no;

*mixed data type: column types are guessed from the scanned rows;
data hazel;
  set lssthn8.'suave_2 numeric$'n;
run;

*force column z to be read as a 25-character string;
data hazel2;
  set lssthn8.'suave_2 numeric$'n(dbsastype=(z=char25));
run;

libname lssthn8 clear;
EXAMPLE #2B
libname lssthn8 'D:\SUAVe spring 2015\OpenProbData for SUAVe.xlsx' mixed=yes;

*mixed data type: mixed columns are now read as character;
data hazel;
  set lssthn8.'suave_2 numeric$'n;
run;

*force column z to be read as a 25-character string;
data hazel2;
  set lssthn8.'suave_2 numeric$'n(dbsastype=(z=char25));
run;

libname lssthn8 clear;
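DATASET OPTIONS
Dbsastype = (COLUMN-NAME='SAS-DATA-TYPE')
• Forces an Excel column to be read as a specified type
• SAS data types are: CHAR(n) (written as CHARn, e.g. char25), NUMERIC, DATE, TIME and DATETIME
• Other data set options are available; see the SAS/ACCESS Interface to PC Files documentation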
EXAMPLE #3 (EXPORT)
libname WrkBk 'D:\SUAVe spring 2015\lib_test.xlsx';

data WrkBk.oddball; *sheet will be named oddball;
  set hazel;
run;

libname WrkBk clear;
EXAMPLE #4: READING-IN MULTIPLE WORKSHEETS
libname in 'D:\SUAVe spring 2015\reneSummary.xlsx';

*list the workbook members; the OUT= data set has one row per variable, so each sheet appears once per column;
proc contents data = in._all_ out = a noprint;
run;

data _null_; /* reading in Doug's indicator variables */
  set a (keep = memname) end = eof;
  *build a SAS data set name from the sheet name: drop the $ and replace blanks;
  name = compress(memname, '$');
  name = translate(trim(name), '_', ' ');
  *generate one DATA step for each row of a;
  str = compbl("Data " || name || "; set in.'" || memname || "'n" ||
               " (dbsastype=(ysm_pop=numeric mat_pop=numeric));");
  call execute(str);
  *close the last generated step;
  if eof then do;
    str = 'run;';
    call execute(str);
  end;
run;

libname in clear;
*Solution courtesy Oleg L of communities.sas.com;
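Because the OUT= data set from PROC CONTENTS holds one row per variable, the CALL EXECUTE step in Example #4 generates a DATA step for every column of every sheet, so each sheet is read several times. One possible variation, sketched below, reduces the list to one row per sheet first. The data set name sheets is hypothetical, and the sketch assumes, as the original does, that sheet names contain only letters, digits, underscores and blanks.

```sas
libname in 'D:\SUAVe spring 2015\reneSummary.xlsx';

proc contents data=in._all_ out=a noprint;
run;

/* Keep one row per member; "sheets" is a hypothetical data set name */
proc sort data=a(keep=memname) out=sheets nodupkey;
  by memname;
run;

data _null_;
  set sheets;
  length name $32 str $200;
  /* drop the $ and turn embedded blanks into underscores */
  name = translate(strip(compress(memname, '$')), '_', ' ');
  /* one complete DATA step, including its own RUN, per worksheet */
  str = "data " || strip(name) || "; set in.'" || strip(memname) ||
        "'n (dbsastype=(ysm_pop=numeric mat_pop=numeric)); run;";
  call execute(str);
run;

libname in clear;
```

Giving each generated step its own RUN statement also removes the need for the end-of-file check in the original.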
67 libname in 'D:\SUAVe spring 2015\reneSummary.xlsx';
NOTE: Libref IN was successfully assigned as follows:
Engine: EXCEL
Physical Name: D:\SUAVe spring 2015\reneSummary.xlsx
68
69 Proc contents data = in._all_ out = a noprint;
70 run;
NOTE: The data set WORK.A has 21 observations and 40 variables.
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.45 seconds
cpu time 0.06 seconds
71
72 Data _null_; *reading-in Doug's indicator variables;
73 set a (keep = memname) end = eof;
74 name = compress(memname, '$');
75 name = translate(trim(name) , '_', ' ');
76 str =compbl( "Data " || name || "; set in.'" || memname ||"'n"||"
76 ! (dbsastype=(ysm_pop=numeric mat_pop=numeric));" );
77 Call Execute (str);
78 if eof then do;
79 str = 'run;';
80 call execute (str);
81 end;
82 run;
NOTE: There were 21 observations read from the data set WORK.A.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.00 seconds
NOTE: CALL EXECUTE generated line.
1 + Data tsa23 ; set in.'tsa23$ 'n (dbsastype=(ysm_pop=numeric mat_pop=numeric));
NOTE: There were 52 observations read from the data set IN.'tsa23$'n.
NOTE: The data set WORK.TSA23 has 52 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.17 seconds
cpu time 0.01 seconds
2 + Data tsa23 ; set in.'tsa23$ 'n (dbsastype=(ysm_pop=numeric mat_pop=numeric));
NOTE: There were 52 observations read from the data set IN.'tsa23$'n.
NOTE: The data set WORK.TSA23 has 52 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.28 seconds
cpu time 0.04 seconds
21 + Data tsa29_vri ; set in.'tsa29_vri$ 'n (dbsastype=(ysm_pop=numeric mat_pop=numeric));
22 + run;
NOTE: There were 92 observations read from the data set IN.'tsa29_vri$'n.
NOTE: The data set WORK.TSA29_VRI has 92 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.27 seconds
cpu time 0.03 seconds
83
84 libname in clear;
NOTE: Libref IN has been deassigned.
EXAMPLE #5: READING IN MULTIPLE WORKBOOKS
%macro impt(filename, i);
  libname wcpool "&filename" mixed=yes header=no;

  data XL&i;
    set wcpool.'Sheet1$'n;
  run;
%mend impt;

%let path=D:\SUAVe spring 2015\;

data _null_;
  command = "dir /b D:\SUAVES~1\wc*.xlsx"; *note the 8.3 short name of the folder in the path;
  infile dummy pipe filevar=command end=eof truncover;
  do i = 1 by 1 while(not eof);
    input wb_name $128.;
    wb_name = catt("&path.", wb_name);
    put 'NOTE: ' wb_name=;
    *queue one macro call per workbook, to run after this step ends;
    call execute(cats('%nrstr(%impt(', wb_name, ',', i, '));'));
  end;
  stop;
run;

libname wcpool clear;

*stack every data set whose name starts with XL;
data all;
  set XL:;
run;
*Solution courtesy data_null and Art C on communities.sas.com;
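Once the workbooks are stacked in Example #5, nothing in the combined data set records which workbook a row came from. One possible follow-up, sketched here, uses the INDSNAME= option of the SET statement to keep the source data set name. The variable names src and source are made up for illustration, and the sketch assumes the XL1, XL2, ... data sets already exist in WORK.

```sas
/* Sketch only: stack XL1, XL2, ... and keep track of the source. */
/* The variable names src and source are hypothetical.             */
data all;
  length source $41;          /* room for libref.member */
  set XL: indsname=src;       /* src is temporary and is not written to ALL */
  source = src;               /* e.g. WORK.XL1 */
run;
```

The numeric suffix of the source data set matches the counter i passed to %impt, so it can be traced back to the workbook names written to the log.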
86 %macro impt(filename, i);
87 libname wcpool "&filename" mixed=yes header=no;
88
89 data XL&i;
90 set wcpool.'Sheet1$'n;
91 run;
92
93 %mend impt;
94
95 %let path=D:\SUAVe spring 2015\;
96
97 data _null_;
98 command = "dir /b D:\SUAVES~1\wc*.xlsx"; *note the 8 char format in the path;
99 infile dummy pipe filevar=command end=eof truncover;
100 do i = 1 by 1 while(not eof);
101 input wb_name $128.;
102 wb_name=catt("&path.", wb_name);
103 put 'NOTE: ' wb_name=;
104 call execute(cats('%nrstr(%impt(', wb_name,',',i,'));'));
105 end;
106 stop;
107 run;
NOTE: The infile DUMMY is:
Unnamed Pipe Access Device,
PROCESS=dir /b D:\SUAVES~1\wc*.xlsx,RECFM=V,
LRECL=256
NOTE: wb_name=D:\SUAVe spring 2015\wcdraw2 Peter O c1.xlsx
NOTE: wb_name=D:\SUAVe spring 2015\wcdraw2 Peter O c2.xlsx
NOTE: wb_name=D:\SUAVe spring 2015\wcdraw2 Peter O c3.xlsx
NOTE: 3 records were read from the infile DUMMY.
The minimum record length was 23.
The maximum record length was 23.
NOTE: DATA statement used (Total process time):
real time 0.10 seconds
cpu time 0.03 seconds
NOTE: CALL EXECUTE generated line.
1 + %impt(D:\SUAVe spring 2015\wcdraw2 Peter O c1.xlsx,1);
NOTE: Data source is connected in READ ONLY mode.
NOTE: Libref WCPOOL was successfully assigned as follows:
Engine: EXCEL
Physical Name: D:\SUAVe spring 2015\wcdraw2 Peter O c1.xlsx
NOTE: There were 146 observations read from the data set WCPOOL.'Sheet1$'n.
NOTE: The data set WORK.XL1 has 146 observations and 21 variables.
NOTE: DATA statement used (Total process time):
real time 0.25 seconds
cpu time 0.01 seconds
2 + %impt(D:\SUAVe spring 2015\wcdraw2 Peter O c2.xlsx,2);
NOTE: Data source is connected in READ ONLY mode.
NOTE: Libref WCPOOL was successfully assigned as follows:
Engine: EXCEL
Physical Name: D:\SUAVe spring 2015\wcdraw2 Peter O c2.xlsx
NOTE: There were 146 observations read from the data set WCPOOL.'Sheet1$'n.
NOTE: The data set WORK.XL2 has 146 observations and 21 variables.
NOTE: DATA statement used (Total process time):
real time 0.26 seconds
cpu time 0.06 seconds
3 + %impt(D:\SUAVe spring 2015\wcdraw2 Peter O c3.xlsx,3);
NOTE: Data source is connected in READ ONLY mode.
NOTE: Libref WCPOOL was successfully assigned as follows:
Engine: EXCEL
Physical Name: D:\SUAVe spring 2015\wcdraw2 Peter O c3.xlsx
NOTE: There were 146 observations read from the data set WCPOOL.'Sheet1$'n.
NOTE: The data set WORK.XL3 has 146 observations and 21 variables.
NOTE: DATA statement used (Total process time):
real time 0.27 seconds
cpu time 0.03 seconds
108
109 libname wcpool clear;
NOTE: Libref WCPOOL has been deassigned.
110
111 data all;
112 set XL:;
113 run;
NOTE: There were 146 observations read from the data set WORK.XL1.
NOTE: There were 146 observations read from the data set WORK.XL2.
NOTE: There were 146 observations read from the data set WORK.XL3.
NOTE: The data set WORK.ALL has 438 observations and 21 variables.
NOTE: DATA statement used (Total process time):
real time 0.40 seconds
cpu time 0.06 seconds
CONCLUSION
SAS libname Excel engine – try it out!