7/27/2019 Enterprise Miner
1/63
SAS Enterprise Miner
Release 4.3
A brief overview: analysis of the Donor Recapture Case (Case 3)
Kevin Garsek Class of 2006
Importing Base Data
SAS's main drawback is that if any line of data has a null or blank value, it will totally disregard the full record
In this case, if we were unable to manipulate the data, the available records would decrease dramatically
We can fight back by recoding the data, as will be shown in the import step
Base SAS Interface Screen
Importing Charity Data
Text Editor
Text Editor
We will use the text editor in Base SAS to import the Charity Case data. To use this editor, you simply type as you would in any text editor.
Text Editor
A line-by-line walkthrough of the code that we will use is as follows:
libname charity 'C:\Documents and Settings\Kevin\Desktop\Datamining\charity.1';
    denotes the master folder on your local PC where the raw data is housed
data charity.raw;
    tells SAS to create a new dataset named raw in the charity library
infile 'chr\2.dat' missover firstobs=2;
    lets SAS know the individual subfolder and file in which the data is housed and tells it to import it into the new dataset; missover keeps SAS from moving on to the next line when a value is missing, and firstobs=2 starts reading at the second line of the file
input OSOURCE $;
    names the data column OSOURCE; the $ tells SAS that this is character data (if it is left out, SAS assumes that the data is numeric)
OSOURCE_D = 0;
    due to prevalent missing data, this creates a new dummy variable named OSOURCE_D and sets its value to 0 for every record
if trim(OSOURCE) = ""
    the trim function removes trailing spaces, and the if opens an if-then statement to compensate for blank data
then do; OSOURCE = "0";
    this sets all missing values in the OSOURCE column to 0
OSOURCE_D = 1;
    this sets the newly created dummy variable to 1 when OSOURCE was blank in the input file
end;
    this closes the do block; all code from infile to end can be written on a single line in the text editor
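Assembled into a single DATA step, the lines above might look like the following. This is a minimal sketch for the one OSOURCE column only; the closing run; statement and the indentation are additions that do not appear in the walkthrough.

```sas
/* Library pointing at the master data folder (path as given on the slide) */
libname charity 'C:\Documents and Settings\Kevin\Desktop\Datamining\charity.1';

data charity.raw;
    /* Read the raw file, starting at line 2; missover leaves short lines as missing */
    infile 'chr\2.dat' missover firstobs=2;
    input OSOURCE $;

    /* Dummy variable: 0 by default, 1 when OSOURCE was blank in the input */
    OSOURCE_D = 0;
    if trim(OSOURCE) = "" then do;
        OSOURCE = "0";
        OSOURCE_D = 1;
    end;
run;   /* added here to close the step; not shown on the slide */
```

The same blank test could also be written with the missing() function, which returns 1 for a blank character value.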
Importing Charity Data
The completed code is depicted below. The code can easily be written in Excel by concatenating cells with the & operator and then pasted into the text editor. Moving the
writing process to Excel will save considerable time during this laborious process.
Importing Charity Data
Once the code is completed, right-click in the text editor and select submit all. This tells SAS to read through the code in the text
editor and execute it. Be prepared: due to the large size of the data, this will take considerable time to complete.
Starting Enterprise Miner from Base SAS module
You should now have a fully working dataset and you are now ready to open Enterprise Miner by following the subsequent slides.
Starting Enterprise Miner from Base SAS module
Starting Enterprise Miner from Base SAS module
Binding Data to Program
This is an exasperating activity
Even for someone who took a SAS training course in Enterprise Miner
The documentation is pathetic
I'll document each step carefully in case this ever happens to you
Name Project Charity and Drag Input Data Node to Workspace
Bind Data to Project
Right click on tools to get this menu.
Bind Data to Project
Left click on initialization, left click top edit.
Bind Data to Project
Right click select; browse for library RDATA; click ok
Bind Data to Project
Gotcha: You must select RAW and hit Enter even though it is the only data set in RDATA
Change to Larger Sample
Left click change; changed to 10,000 to give low response items representation
Success!
Click Variables Tab
Notice that some variables are rejected. This is typically because the column has only one value throughout, e.g. a dummy variable that
is always 0 because there is no variation in the input data.
Then Bad Things Happen
Who knows why.
If I hadn't taken the course, the slides would stop here.
That's the only reason I know what to do
I'll document this also, in case it happens to you.
Crash Recovery
Right click on top level icon; select explore
Crash Recovery
Open emproj; delete all files with extension .lck; open user subfolder; delete everything in user subfolder
Analysis Resumes
We'll have a look at MAILCODE.
Enterprise Miner has some neat graphical tools that are easy to use.
The simplest and easiest are part of the data input tool.
A Histogram
Right click item, select view distribution of MAILCODE from drop down menu
Histogram of Mailcode
SAS has classified as missing data that R accepted and used!
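Outside Enterprise Miner, one way to see how many MAILCODE values SAS treats as missing is a frequency table. A minimal sketch, assuming the imported data set is charity.raw and that it contains MAILCODE; the missing option on the tables statement lists missing values as their own level.

```sas
/* Frequency table of MAILCODE, with missing values shown as a level */
proc freq data=charity.raw;
    tables MAILCODE / missing;
run;
```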
Must Identify TARGET_D as Target
Right click row item in column Model Role, select Change Model Role from
drop down menu, select target from next drop down menu
Histogram of Target
This is what makes the problem hard: extremely low response rate!
Save changes!
Add Data Partition Node
Drag down from tool bar above and connect line by dragging the mouse.
This is What it Does
We will choose to use an 80%/20% training/validation allocation.
Close box, right click, click Run on drop down menu.
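For comparison, an approximate 80%/20% random split can be sketched in a Base SAS DATA step. This is an illustration only, not what the Data Partition node runs internally; the data set names work.train and work.valid and the seed 12345 are hypothetical.

```sas
/* Send roughly 80% of records to training and the rest to validation */
data work.train work.valid;
    set charity.raw;
    if ranuni(12345) < 0.8 then output work.train;
    else output work.valid;
run;
```

Because each record is assigned independently, the split is 80/20 on average rather than exactly.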
Design Philosophy
Click lower tools tab. Note tools on left. One drags a tool to worksheet and
connects with arrows. We'll now drag and connect regression.
Regression
Chose stepwise selection, validation error. That mimics what we did in R.
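In code, stepwise selection judged on validation error could be sketched with PROC GLMSELECT. This assumes a SAS/STAT release that includes that procedure, and it is not the code the Regression node generates. The variable list is illustrative, using names that appear elsewhere in this deck, and treating PEPSTRFL as a classification variable is an assumption.

```sas
/* Stepwise selection, choosing the model with the lowest validation error */
proc glmselect data=charity.raw seed=12345;
    class PEPSTRFL;                        /* assumed to be a character flag */
    partition fraction(validate=0.2);      /* hold out about 20% for validation */
    model TARGET_D = LASTGIFT DOB PEPSTRFL
          / selection=stepwise(choose=validate);
run;
```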
Regression
Right-click on the Regression node and select run
Regression
Regression is highlighted in green while running
Regression
Let's take a look at the results; SAS has a very different interpretation of the important variables than the R analysis
Regression
The error rate is not that bad, but the significant variables are not necessarily easily
interpretable.
Regression
Let's try it again with a few changes to the model selection
Regression
Again, we get results, but nothing easily interpretable.
Regression
Let's limit the regression to those variables determined by R to be significant. To do this, we will again right-click on regression and select open.
Regression
Then go to the variables tab. Right-click under the status column for each unneeded variable and set the status to don't use.
Regression
In addition to limiting our variables to those from the R results, we are going to add an interaction as well as a squared variable. The first step is to add the squared term
by adding a transform variables node, right-clicking on the node, and selecting open.
Regression
From the variables tab, we will right-click on DOB and select Transform.
Regression
We will now select square. This will create a new variable, DOB_L1S6, which will
then be used in our next regression.
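Written by hand, a squared term is a single assignment in a DATA step. A minimal sketch, assuming charity.raw contains a numeric DOB; the names work.transformed and DOB_SQ are hypothetical, and the transform node may compute its DOB_L1S6 differently (for example after shifting or scaling).

```sas
/* Add a squared copy of DOB for use as a regression input */
data work.transformed;
    set charity.raw;
    DOB_SQ = DOB**2;   /* a missing DOB gives a missing DOB_SQ */
run;
```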
Regression
Our next step is to create an interaction. To do this, go back to the main diagram and double click on regression. This should bring you into the model manager, where you will click on the Interaction Builder icon.
Regression
On this screen, you should use the Ctrl key to highlight both Lastgift and Pepstrfl. Next, press the Cross button in order to create the new interaction variable. The new variable should be added to the available terms window and should be used in
subsequent regressions.
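In SAS procedure code, a crossed term of this kind is written with an asterisk in the model statement. The following is a sketch rather than the code Enterprise Miner generates; it assumes charity.raw holds TARGET_D, LASTGIFT and PEPSTRFL, and that PEPSTRFL is a character flag that belongs on the class statement.

```sas
/* Main effects plus the LASTGIFT-by-PEPSTRFL interaction */
proc glm data=charity.raw;
    class PEPSTRFL;
    model TARGET_D = LASTGIFT PEPSTRFL LASTGIFT*PEPSTRFL;
run;
quit;
```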
Regression
Results! While the initial bar graph may look complex, this is how SAS handles
character data: by creating dummy variables.
Regression
As we now look at the table, or coefficient estimates, we have interpretable
results!
Regression
For those who are interested, you can look at the Code tab and see the actual SAS code that you would have to write if you were to program this regression
manually.
Regression
Let's add another level of analysis and try to rid the data of outliers. To do this, you will need to incorporate a Filter Outliers node between the Transform Variables and
Regression nodes.
Regression
Double click on the Filter Outliers node and then go to the Settings tab. I have used the above settings, but feel free to experiment for the best outcome. Once you
have completed this step, run the regression.
Moving On, Try a Tree
Tree
The tree itself is on the next slide.
Does this look familiar?
This is exactly the same as Fig 22, Learning and Validation MSE, of Topic 2, Bias Variance Tradeoff.
Tree
SAS does have some great graphics! Below is the tree, which is
typically presentable to a general audience.
Moving On, Try a Neural Net
Net
Net
We will use the defaults for this round of processing. During the run we see the graphic below.
Net
The results. Decent output but very difficult to disseminate to a general audience.
Assessment Tool
The assessment tool is supposed to give lift charts.
Apparently it only does so for binary response.
The menu item is blank for predictive models.
The tool is good for easily comparing varying model error rates.
Assessment Tool
Assessment Tool
When you double click on the node you will see the following:
Tool             Root ASE    (Root ASE)^2
Tree             4.457445    19.86881593
Regression       4.421218    19.5471686
Neural Network   4.455325    19.84992086
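The second column of figures is the first column squared, that is, the average squared error (ASE) itself. A small sketch of that arithmetic, using the Root ASE values from the table; the data set name work.assess and the variable names are hypothetical.

```sas
/* Recompute ASE from the reported root ASE for each model */
data work.assess;
    infile datalines dlm=',';
    length tool $ 20;
    input tool $ root_ase;
    ase = root_ase**2;   /* squaring the root ASE recovers the ASE */
    datalines;
Tree,4.457445
Regression,4.421218
Neural Network,4.455325
;
run;

proc print data=work.assess noobs;
    format root_ase 10.6 ase 12.8;
run;
```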
Assessment Tool
As for lift charts, they are unavailable for this analysis
Done!
The intention was to illustrate the interface, not to assess SAS Enterprise Miner per se.
With more effort to fix the missing values problems on input, better results can surely be achieved.
With more experience, many of the false steps would not have occurred.
Looping and Control
SAS's biggest deficiency, as encountered here, is the lack of convenient looping and control structures.
This affects all of SAS, not just Enterprise Miner.
Any data manipulation, such as fixing missing values, had to be done by hand, one variable at a time. R has a huge advantage here!
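For repetitive recoding of this kind, DATA step arrays combined with a DO loop are one option for applying the same fix to many variables at once. A minimal sketch, assuming charity.raw exists and holds at least one character variable; the output name work.recoded and the index variable i are hypothetical, and unlike the import step this version does not create the _D dummy variables.

```sas
/* Replace blanks with "0" in every character variable of the data set */
data work.recoded;
    set charity.raw;
    array cvars {*} _character_;      /* all character variables read by set */
    do i = 1 to dim(cvars);
        if missing(cvars{i}) then cvars{i} = "0";
    end;
    drop i;
run;
```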