Welcome to SAS New South Wales User Group
Q3 2014
Committee:
Scott Bass (Chair)
Peter Stagg
Bhupendra Pant
James Enoch (SAS)
Marna Smit (SAS)
Copyright © 2014, SAS Institute Inc. All rights reserved.
SNUG Q3 TIPS AND TRICKS
ACCESSING SAS, FREE LEARNING, ENGAGING THE COMMUNITY
JAMES ENOCH
EDUCATION MANAGER
SAS AUSTRALIA A/NZ
ACCESS
ACCESS SAS UNIVERSITY EDITION – THE KEY DETAILS
SAS University Edition:
• Free, for non-commercial learning purposes
• For teachers, researchers and students, as well as adult learners
• Aims to help bridge the analytical skills gap
• Small download to local workstation (PC, Mac or Linux)
• Runs in virtualization software and a browser (no internet access required)
• More than 50,000 downloads
ACCESS SAS UNIVERSITY EDITION – WHAT IS INCLUDED
Features:
• An intuitive interface that lets you interact with the software from your PC, Mac or Linux workstation - SAS Studio
• A powerful programming language that’s easy to learn, easy to use - Base SAS
• Comprehensive, reliable tools that include state-of-the-art statistical methods - SAS/STAT
• A robust, yet flexible matrix programming language for more in-depth, specialised analysis and exploration - SAS/IML
• Out-of-the-box access to PC file formats for a simplified approach to accessing data - SAS/ACCESS
Download – Search for ‘SAS University Edition’ or
http://www.sas.com/en_us/software/university-edition/download-software.html
ACCESS WHAT IS SAS STUDIO
SAS Studio: What is it?
• Web-based programming interface to SAS. It runs in a browser, so nothing needs to be installed locally when connecting to a remote SAS session.
• HTML5-based application with no browser plugins needed, so it runs on Windows, Macs, iPads and more.
• Basis for new offerings from SAS, such as the University Edition
Resources:
• Blog – Chris Hemedinger – Cage Match: SAS Studio vs SAS Enterprise
Guide http://blogs.sas.com/content/sasdummy/2014/05/30/sas-studio-and-eg/
• Getting Started Tutorials
• SAS Tech Talk: SAS Studio
LEARN
Free eLearning – www.sas.com/au/training
LEARN ONLINE TUTORIALS
Free Tutorials - http://support.sas.com/training/tutorial/
LEARN CURRICULUM PATHWAYS
ENGAGE
ENGAGE WORK PLACEMENT PROGRAM
Complimentary service
One-to-one match
Online form
Flexible placements
Bolster recruitment strategies
Help shape the professionals of tomorrow
Email: [email protected] Website: http://www.sas.com/australia/academic
Thank you!
Westpac Banking Corporation ABN 33 007 457 141.
SPATIAL SMOOTHING USING SAS
John Connor – Enterprise Risk Analytics, Group Risk
August 2014
SNUG Q3
Agenda
What are choropleth maps?
Uses of choropleth maps
How to generate choropleth maps using SAS
How to spatially smooth geographic data
Case study
Questions
Choropleth maps
What are they?
“A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income.
The choropleth map provides an easy way to visualize how a measurement varies across a geographic area or it shows the level of variability within a region.”
Benefits:
Visually understand the problem
Identify key regions of risk or opportunity
Uses:
Risk management
Strategy
Examples:
Marketing campaigns
Crime management
Identifying concentration risk
Economic modelling
Case study – Company A
Hypothesis
Research suggests that Company A is more likely to sell its product to people from a lower
socio-economic background.
Problem
Company A wants to increase its sales by advertising its product in the Sydney area to lower
socio-economic regions. What regions should Company A target?
Solution
After the spatial smoothing of the data, identify key regions using choropleth maps to
maximise the benefit of the marketing campaign.
Process: Collect map data → Spatial smoothing → Plot maps → Identify regions → Devise strategy
Collect map data
Three files are needed to create maps in SAS from a shapefile: a SHP (main) file, a SHX (index) file and a DBF (dBase) file
Data sources
SAS Software - SAS/GRAPH® comes with a large library of map data sets (libname MAPSSAS)
Australian Bureau of Statistics - ABS has an extensive number of map data sets:
Census Geographic Areas Digital Boundaries - State Suburbs, Postal Areas, Commonwealth Electoral Divisions, State Electoral Divisions, Indigenous Regions, Indigenous Areas and Indigenous Locations
Australian Standard Geographical Classification (ASGC) Digital Boundaries - Statistical Local Areas, Statistical Subdivisions, State and Territories, Statistical Districts, Local Government Areas, Statistical Region Sectors and Major Statistical Regions
proc mapimport datafile="Z:\Suburb_Shape_File.shp"
               out=MAPDATA;
run;
Plotting the Maps - Unsmoothed
Proc gmap can be used to plot the map:
Tips:
- ODS PDF allows the images to be easily copied to reports
- Use proc rank to allow continuous variables to be ranked into percentiles
- Select a mapping variable appropriate to the region of analysis
- Ensure that the ID variable in the data file is in the same format as the map file
ods pdf file='Z:\file_name.pdf';
proc gmap map=MAPDATA
          data=INPUTDATA all;
   id Postcode;
   choro SOCIAL_ECONOMIC_INDEX_SCORE;
run;
quit;
ods pdf close;
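The PROC RANK tip can be sketched as follows. This is a minimal example, not the presenter's code: the sample values and the names RANKED and SCORE_DECILE are assumptions made for illustration, and GROUPS=10 (deciles) is used where GROUPS=100 would give percentiles.

```sas
/* Hypothetical sample data standing in for INPUTDATA */
data INPUTDATA;
   input Postcode SOCIAL_ECONOMIC_INDEX_SCORE;
   datalines;
2000 1050
2010 1020
2017 980
2020 940
2035 1005
2040 1060
2050 1030
2060 1090
2145 930
2170 900
;
run;

/* GROUPS=10 assigns each postcode to a decile, numbered 0 (lowest) to 9 (highest) */
proc rank data=INPUTDATA out=RANKED groups=10;
   var SOCIAL_ECONOMIC_INDEX_SCORE;
   ranks SCORE_DECILE;   /* hypothetical name for the ranked variable */
run;
```

The ranked variable can then replace the raw score in the CHORO statement, for example `choro SCORE_DECILE / discrete;`, so that each decile gets its own colour.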
Unsmoothed map
Postcodes in red indicate a very high probability of uptake of the product
Without spatially smoothing the data, it is difficult to identify target areas due to noise in the granular data
No clear marketing strategy
Spatial smoothing – A step-by-step guide
Step 1 – Find the mid-point of each postcode shape
A shape file is a list of XY co-ordinates for the perimeter of the shape
SAS has pre-installed macros that make it easy to find the mid-point of each shape:
Tips:
- Address data can be geo-coded to get XY co-ordinates using Google, Yahoo, etc.
- You can use _N_ to generate a unique ID for each postcode
%annomac;
%centroid(INPUT_DATA,OUTPUT_DATA,POSTCODE);
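The _N_ tip can be sketched as below. The sample rows stand in for the %CENTROID output, and the names CENTROIDS and postcode_id are assumptions for illustration only.

```sas
/* Hypothetical stand-in for the %CENTROID output: one row per postcode */
data OUTPUT_DATA;
   input POSTCODE X Y;
   datalines;
2000 151.21 -33.87
2010 151.22 -33.88
2017 151.20 -33.90
;
run;

/* _N_ counts DATA step iterations, so it gives 1, 2, 3, ... in row order */
data CENTROIDS;
   set OUTPUT_DATA;
   postcode_id = _N_;   /* hypothetical name for the sequential ID */
run;
```

A sequential ID like this is what allows the later steps to loop from 1 to the number of postcodes.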
Spatial smoothing – A step-by-step guide
Step 2 – Generate a list of every combination of postcodes
Step 3 – Left join on the XY co-ordinates for the centroids by postcode and by
postcode_nearby
Tips:
- You can use _N_ to generate the macro variable &nobs_data.
- Combine using two left joins by postcode and then by postcode_nearby
DATA blank;
   /* One row per ordered pair of postcode IDs (1 to &nobs_data.) */
   DO postcode = 1 TO &nobs_data.;
      DO postcode_nearby = 1 TO &nobs_data.;
         OUTPUT;
      END;
   END;
RUN;
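Steps 2 and 3 together might look like the sketch below. It assumes a centroid table with a sequential postcode_id (a hypothetical name), small made-up coordinates, and PROC SQL for the two left joins; the slides do not say which join method was actually used.

```sas
/* Hypothetical centroid table with a sequential ID per postcode */
data CENTROIDS;
   input postcode_id POSTCODE X Y;
   datalines;
1 2000 151.21 -33.87
2 2010 151.22 -33.88
3 2017 151.20 -33.90
;
run;

/* Store the number of postcodes in &nobs_data. using _N_ on the last row */
data _null_;
   set CENTROIDS end=last;
   if last then call symputx('nobs_data', _N_);
run;

/* Step 2: every ordered combination of postcode IDs */
data blank;
   do postcode = 1 to &nobs_data.;
      do postcode_nearby = 1 to &nobs_data.;
         output;
      end;
   end;
run;

/* Step 3: left join the centroid co-ordinates twice,
   once by postcode and once by postcode_nearby */
proc sql;
   create table PAIRS as
   select b.postcode,
          b.postcode_nearby,
          c1.X as X_postcode,
          c1.Y as Y_postcode,
          c2.X as X_postcode_nearby,
          c2.Y as Y_postcode_nearby
   from blank as b
   left join CENTROIDS as c1
      on b.postcode = c1.postcode_id
   left join CENTROIDS as c2
      on b.postcode_nearby = c2.postcode_id;
quit;
```

The result has one row per pair, so the row count grows with the square of the number of postcodes.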
Spatial smoothing – A step-by-step guide
Step 4 – Calculate the distance between each postcode and every other postcode for
all combinations
The great circle distance is the shortest distance on the surface of a sphere
The spherical law of cosines can be used to find the arc length:
This method calculates the great circle distance (based on spherical trigonometry) and assumes that:
- 1 minute of arc = 1 nautical mile
- 1 nautical mile = 1.852 km
Tip:
An approximation of the mid-point of a small shape can be found using PROC MEANS
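A sketch of the PROC MEANS tip, assuming made-up boundary points and a hypothetical output name APPROX_MIDPOINTS. Averaging the boundary co-ordinates is only a rough stand-in for a true centroid, which is why the tip limits it to small shapes.

```sas
/* Hypothetical map data: four boundary points for each of two postcodes */
data MAPDATA;
   input POSTCODE X Y;
   datalines;
2000 151.20 -33.86
2000 151.22 -33.86
2000 151.22 -33.88
2000 151.20 -33.88
2010 151.21 -33.88
2010 151.23 -33.88
2010 151.23 -33.90
2010 151.21 -33.90
;
run;

/* Average the boundary co-ordinates of each postcode */
proc means data=MAPDATA noprint nway;
   class POSTCODE;
   var X Y;
   output out=APPROX_MIDPOINTS (drop=_TYPE_ _FREQ_) mean=X Y;
run;
```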
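/* Assumes X = latitude and Y = longitude in decimal degrees; SAS trig functions work in radians */
d2r = constant('pi')/180;
Distance =
1.852 * 60 * Arcos(sin(X_postcode*d2r)*sin(X_postcode_nearby*d2r)+cos(X_postcode*d2r)*
cos(X_postcode_nearby*d2r)*cos((Y_postcode-Y_postcode_nearby)*d2r)) / d2r;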
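The distance calculation can be wrapped in a DATA step along these lines. This is a sketch under stated assumptions: X holds latitude and Y holds longitude in decimal degrees (SAS map data sets commonly store longitude in X and latitude in Y, so swap the roles if that applies), the two sample rows are made up, and the intermediate variable names are hypothetical.

```sas
/* Hypothetical pairs table: X = latitude, Y = longitude, decimal degrees */
data PAIRS;
   input postcode postcode_nearby X_postcode Y_postcode
         X_postcode_nearby Y_postcode_nearby;
   datalines;
1 1 -33.87 151.21 -33.87 151.21
1 2 -33.87 151.21 -33.82 151.00
;
run;

data DISTANCES;
   set PAIRS;
   deg2rad = constant('pi') / 180;          /* SAS trig functions use radians */
   lat1 = X_postcode * deg2rad;
   lat2 = X_postcode_nearby * deg2rad;
   dlon = (Y_postcode - Y_postcode_nearby) * deg2rad;
   /* Spherical law of cosines: cosine of the central angle */
   cos_angle = sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2)*cos(dlon);
   /* Guard against rounding pushing the value just outside [-1, 1] */
   cos_angle = max(-1, min(1, cos_angle));
   angle_deg = arcos(cos_angle) / deg2rad;  /* central angle in degrees */
   /* 60 minutes of arc per degree, 1 nautical mile per minute, 1.852 km per mile */
   distance = 1.852 * 60 * angle_deg;
   drop deg2rad lat1 lat2 dlon cos_angle angle_deg;
run;
```

SAS also provides a GEODIST function that returns the distance between two latitude/longitude points, which may be a simpler option where it is available.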
Spatial smoothing – A step-by-step guide
Spatial smoothing is more of an art than a science
Requires a function to determine how surrounding regions influence an observation
The function should be relevant to the problem
Tip: Caution must be taken when there are natural boundaries (lakes, rivers, etc.)
Spatial smoothing – A step-by-step guide
Step 5: Generate a spatial smoothing function
Too much spatial smoothing will produce meaningless results
Too little spatial smoothing will not reduce the noise in the data
/*weight function*/
data output ;
set input;
if distance GT 10 then weight = 0; /*No weight if >10km*/
else weight =(distance - 10)**2/ (10**2);
run;
Tip: Weighting by population as well as distance will give more stable results
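The slides do not show how the weights are applied, so the following is only one plausible sketch: a weighted average of the nearby scores for each postcode, with an optional population-weighted variant as suggested in the tip. The table and variable names (WEIGHTED, score_nearby, population_nearby, SMOOTHED) and all sample values are assumptions.

```sas
/* Hypothetical pairs with distance (km), the nearby score and nearby population */
data WEIGHTED;
   input postcode postcode_nearby distance score_nearby population_nearby;
   /* Same quadratic weight function: 1 at 0 km, falling to 0 at 10 km */
   if distance > 10 then weight = 0;
   else weight = (distance - 10)**2 / (10**2);
   /* Optional: weight by population as well as distance */
   pop_weight = weight * population_nearby;
   datalines;
1 1 0 950 12000
1 2 4 1000 8000
1 3 12 1100 15000
2 1 4 950 12000
2 2 0 1000 8000
2 3 7 1100 15000
;
run;

/* Smoothed score = weighted average of nearby scores for each postcode */
proc sql;
   create table SMOOTHED as
   select postcode,
          sum(weight * score_nearby) / sum(weight) as smoothed_score,
          sum(pop_weight * score_nearby) / sum(pop_weight) as smoothed_score_pop
   from WEIGHTED
   group by postcode;
quit;
```

Because each postcode is paired with itself at distance 0 (weight 1), the sum of weights is never zero. The smoothed score can then be plotted with PROC GMAP in place of the raw score.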
Case study results
Before spatial smoothing, it was difficult to identify areas due to noise in the granular data
After spatial smoothing, target areas are clearly identifiable
Regions in red indicate a very high probability of uptake of the product
Unsmoothed: significant noise in the data
Smoothed: target areas are more easily identifiable – 4 regions
Summary
Choropleth maps can be a useful visual tool for developing strategy and managing risk
Spatial smoothing can reduce noise in the data to identify target regions
There are 5 easy steps:
- Step 1: Collect shape files
- Step 2: Spatial smoothing
- Step 3: Plot maps
- Step 4: Identify regions
- Step 5: Devise strategy
Useful links
Choropleth maps using SAS:
http://support.sas.com/documentation/cdl/en/graphref/63022/HTML/default/gmpchoro-ex.htm
SAS maps online:
http://support.sas.com/rnd/datavisualization/mapsonline/index.html
ABS Standard Boundaries:
http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/1259.0.30.001July%202011?OpenDocument
ABS Census Boundaries:
http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/2923.0.30.0012006
Distance Calculator Algorithms:
http://www.ga.gov.au/scientific-topics/positioning-navigation/geodesy/geodetic-techniques/distance-calculation-algorithms
Geo-coding:
http://en.wikipedia.org/wiki/List_of_geocoding_systems
QUESTIONS?
Star Schemas using Scalable Performance Data Engine (SAS® Software)
Patrick Cuba – Consultant
• Case Study – Need for SPDE
• SPDE Library
• Case Study – Need for SPDS
• SPDS Server: Clusters, Star Schema, StarJoin
• Questions
• References
• Table build is 6 hours
• Query time is 20 minutes
• Latest is 360GB
• Generation tables hold 24 months
• Generation tables grown to 1TB each
• 300+ columns
• Four balances per credit card (Max 255)
• 20 million customers
• Growing customer base
• Keeps defaults customer balance
• At month end the cycle end and latest credit card for the month are added to SAS Generation Tables
• Accounts cycle at different days in the month
[Diagram: Cycle-end, Month end and Latest tables]
SAS Dataset
• SAS Datasets are flat files
libname all_users '/disk1/metadata';
• Under BASE SAS License
• Scalable Performance Data Engine (SPDE)
• On SMP server (at least 2 CPUs)
• RAID
SAS SPD Dataset
[Diagram: SAS SPD Dataset made up of Meta, Data Part (×5), HBX Index and IBX Index files]
libname all_users spde '/disk1/metadata'
   datapath=('/disk2/userdata' '/disk3/userdata')
   indexpath=('/disk4/userindexes' '/disk5/userindexes')
   partsize=128M;
• Star Schema using StarJoin
• Clustered Cycle & Month end totalling 1TB
• Table build is 30-40 minutes
• Query time is seconds to 5 minutes
[Diagram: star schema with one Fact table and four Dimension tables]
• Scalable Performance Data Server
• Client/Server
• SQL Pass-thru
• Clusters
[Diagram: member tables M1 to M8 combined into one Cluster]
PROC SPDO LIBRARY=domain-name;
   SET ACLUSER user-name;
   CLUSTER CREATE cluster-table-name
      MEM=SPD-Server-table1
      MEM=SPD-Server-table2
      MAXSLOT=24;
QUIT;
• Facts and Dimensions
[Diagram: star schema with one Fact table and four Dimension tables]
Pairwise: 7 Joins, 1 Select
StarJoin: 3 Steps
execute(reset nostarjoin=<1/0>)
• 1. Turn it
• 2. No
[Diagram: one Fact table with six Dim tables]
• 3. Single
[Diagram: one Fact table with four Dim tables]
• 4. Single
Fact
• 5. Fact & Dimension
STARJOIN:
http://support.sas.com/documentation/cdl/en/spdsug/63088/HTML/default/viewer.htm#n0mlj75x9c4dtzn1ves84e1op3jt.htm
SAS® 9.1 Scalable Performance Data Engine:
http://support.sas.com/documentation/onlinedoc/91pdf/sasdoc_91/base_dataeng_6996.pdf
SAS® 9.2 Scalable Performance Data Engine:
http://support.sas.com/documentation/cdl/en/engspde/61887/PDF/default/engspde.pdf
When should you use the SPDE engine:
http://support.sas.com/rnd/scalability/spde/when.html
Wrap Up
• Online Survey – Please complete
• Best Presentation Award
- Q2 – Ron Elazar & Dharmik Jeena from Westpac
- Wins iPod Nano & goes into draw for trip to SAS Global Forum 2015
• SANZOC – SAS Australia & New Zealand Online Community
• SNUG Committee
• Attendance – No Shows
• Lucky Draw