Copyright © 2012, SAS Institute Inc. All rights reserved.
INTRODUCTION TO DATA AND TEXT MINING
ANDREW PEASE, 8 MARCH 2013
SAS® ANALYTICS
Operations Research
Quality Improvement
Data Visualization
Model Management
Discrete Event Simulation
Analysis of Means
Cluster Analysis
Matrix Programming
Spectral Analysis
Ensemble Models
Sample Size Computations
Simulation
Categorical Data Analysis
Psychometric Analysis
Genetic Algorithms
Survival Analysis
Statistical Process Control
X11 & X12 Models
Decision Trees
Analysis of Variance
Survey Data Analysis
Vector Autoregressive Models
Nonlinear Programming
Network Flow Models
Nonparametric Analysis
Content Categorization
Study Planning
ARIMA Models
Linear Programming
Interior-Point Models
Scheduling
Bayesian
R Integration
Multivariate Analysis
Neural Networks
Gradient Boosting Machines
Automated Scoring
Exploratory Data Analysis
Random Forests
Mixed Models
Design of Experiments
Predictive Modeling
Information Theory
Reliability Analysis
Constraint Programming
Discrete Event Simulation
Social Network Analysis
Ontology Management
Regression
Process Capability Analysis
Descriptive Modeling
Mixed-Integer Programming
Fractional Factorial
D-Optimal
Association & Sequence Analysis
Multinomial Discrete Choice
High Performance Forecasting
Text Analytics
Content Categorization
Ontology Management
Sentiment Analysis
Forecasting
Econometrics
Large-Scale Forecasting
Time Series Analysis
Data Mining
Scoring Acceleration
Predictive Analytics
Statistics
Statistical Analysis
Interactive Matrix Programming
DATA MINING IS:
Discovering patterns, trends and relationships represented in data
Developing models to understand and describe characteristics and activity based on these patterns
Using insights to help evaluate future options and make fact-based decisions
Deploying scores and results for timely, appropriate action
Timeline: past (observed events) to future (predicted events)
INDUSTRY SPECIFIC DATA MINING APPLICATIONS

Application: Credit Scoring (Banking)
What is predicted? Measure credit worthiness of new and existing set of customers
Business decision driven: How to assess and control risk within existing (or new) consumer portfolios?

Application: Market Basket Analysis (Retail)
What is predicted? Which products are likely to be purchased together?
Business decision driven: How to increase sales with cross-sell/up-sell, loyalty programs, promotions?

Application: Asset Maintenance (Utilities, Mfg., Oil & Gas)
What is predicted? Identify real drivers of asset or equipment failure
Business decision driven: How to minimize operational disruptions and maintenance costs?

Application: Health & Condition Mgmt. (Health Insurance)
What is predicted? Identify patients at risk of a chronic illness & offer treatment program
Business decision driven: How can we reduce healthcare costs and satisfy patients?

Application: Fraud Mgmt. (Govt., Insurance, Banks)
What is predicted? Detect unknown fraud cases and future risks
Business decision driven: How to decrease fraud losses and lower false positives?

Application: Drug Discovery (Life Science)
What is predicted? Find compounds that have desirable effects & detect drug behavior during trials
Business decision driven: How to bring drugs quickly and effectively to the marketplace?
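
The market basket question above (which products are likely to be purchased together?) can be illustrated with a minimal sketch. This is not the SAS implementation; it simply counts item pairs and reports the usual support, confidence and lift measures. The function name `pair_rules`, the `min_support` cutoff and the toy baskets are assumptions made up for this illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.4):
    """Return (pair, support, confidence, lift) for frequent item pairs.

    Confidence is measured for the rule "first item -> second item".
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeated items within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        confidence = count / item_counts[a]  # P(b in basket | a in basket)
        lift = confidence / (item_counts[b] / n)  # > 1: bought together more than chance
        rules.append(((a, b), support, confidence, lift))
    # Strongest associations first
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


# Hypothetical toy transactions, invented for this example
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "butter", "cereal"],
    ["milk", "bread"],
]
rules = pair_rules(baskets)
for pair, support, confidence, lift in rules:
    print(pair, round(support, 2), round(confidence, 2), round(lift, 2))
```

A lift above 1 suggests a cross-sell candidate; a lift below 1 suggests the items are bought together less often than chance would predict.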
DATA MINING METHODOLOGY: SEMMA
Sample
Explore
Modify
Model
Assess
Score
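
The Sample, Explore, Modify, Model, Assess and Score steps can be sketched end to end in a few lines. This is only an illustration of the flow, not SAS Enterprise Miner code: the toy data (a utilization percentage and a default flag), the systematic hold-out split and the one-variable threshold model are all assumptions chosen to keep the example small.

```python
import statistics


def sample(records):
    """Sample: hold out every third record for validation (systematic split)."""
    train = [r for i, r in enumerate(records) if i % 3 != 2]
    valid = [r for i, r in enumerate(records) if i % 3 == 2]
    return train, valid


def explore(records):
    """Explore: basic summary statistics of the input variable."""
    values = [x for x, _ in records]
    return {"n": len(values),
            "mean": statistics.mean(values),
            "stdev": statistics.pstdev(values)}


def modify(records, mean, stdev):
    """Modify: standardize the input using the training statistics."""
    return [((x - mean) / stdev, y) for x, y in records]


def score(x, threshold):
    """Score: predict 1 when the standardized input reaches the threshold."""
    return 1 if x >= threshold else 0


def assess(records, threshold):
    """Assess: share of records whose prediction matches the label."""
    hits = sum(score(x, threshold) == y for x, y in records)
    return hits / len(records)


def model(train):
    """Model: pick the threshold with the best training accuracy."""
    best_threshold, best_accuracy = None, -1.0
    for candidate in sorted({x for x, _ in train}):
        accuracy = assess(train, candidate)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = candidate, accuracy
    return best_threshold


# Hypothetical data: (utilization %, default flag), invented for this example
records = [(x, 1 if x >= 50 else 0) for x in range(0, 100, 5)]

train, valid = sample(records)
stats = explore(train)
train_std = modify(train, stats["mean"], stats["stdev"])
valid_std = modify(valid, stats["mean"], stats["stdev"])
threshold = model(train_std)
train_acc = assess(train_std, threshold)
valid_acc = assess(valid_std, threshold)

# Score two new, unlabeled observations with the fitted threshold
new_std = [(x - stats["mean"]) / stats["stdev"] for x in (62, 20)]
scores = [score(x, threshold) for x in new_std]
print(stats, train_acc, valid_acc, scores)
```

Note that the validation records are transformed with the training mean and standard deviation, so the assessment reflects how the model would behave on data it has not seen.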
G T E W V G H U I B C X A Q W E T D F G J K O I U T C M N X H G A L O J U T Q A Z C F T E R T N J H Y U O P H Y R M W S D F M N B V H J U Y T I P Q A P G F S D W V B U I N S W B C Z A L K J T M A P I O I U X F E W I Y N H K D N Q U P Q P S F T E M X T R G E O
SAS TEXT ANALYTICS: UNCOVERING THE TECHNOLOGY
Content Categorization
Text Mining
Sentiment Analysis
Ontology Management
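
To make two of these components concrete, the sketch below shows a deliberately simple, dictionary-based form of content categorization and sentiment analysis. It is not how the SAS products work internally; the category keywords, the sentiment word lists and the sample documents are hypothetical and exist only for this illustration.

```python
import re
from collections import Counter

# Hypothetical taxonomy: category name -> trigger keywords
CATEGORIES = {
    "billing": {"invoice", "charge", "refund", "payment"},
    "service": {"delay", "waiting", "staff", "response"},
}
# Hypothetical sentiment lexicon
POSITIVE = {"good", "helpful", "fast", "friendly"}
NEGATIVE = {"bad", "slow", "rude", "wrong"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def categorize(text):
    """Content categorization: every category with a keyword in the text."""
    tokens = set(tokenize(text))
    return sorted(name for name, keywords in CATEGORIES.items()
                  if tokens & keywords)


def sentiment(text):
    """Sentiment analysis: positive word count minus negative word count."""
    counts = Counter(tokenize(text))
    balance = (sum(counts[w] for w in POSITIVE)
               - sum(counts[w] for w in NEGATIVE))
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"


# Invented sample documents
docs = [
    "The staff were friendly and the response was fast.",
    "Wrong charge on my invoice and a slow refund.",
    "Payment received.",
]
results = [(categorize(doc), sentiment(doc)) for doc in docs]
for doc, result in zip(docs, results):
    print(result, "<-", doc)
```

Real text analytics adds linguistic rules, statistical models and context handling (negation, for example), which a plain word count cannot capture.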
LILLEBAELT HOSPITAL (Denmark)
HEALTHCARE
BUSINESS ISSUE
•Reduce error in patient records
•Reduce manual effort of patient record audits
RESULTS
•Creation of database to improve clinical work in research and diagnosis
•“If data is wrong, the basis for decision making is also faulty. Therefore, the Clinically Correct Time-True Registration system makes sense even beyond our department and hospital.”
- Sten Larsen, Chief Surgeon
1823 HONG KONG EFFICIENCY UNIT
PUBLIC
BUSINESS ISSUE
•1823 operates round-the-clock, including on Sundays and public holidays.
•Answers 2.65 million calls and 98,000 e-mails, including inquiries, suggestions and complaints
RESULTS
•Developed a Complaints Intelligence System that uncovers the trends, patterns and relationships inherent in the complaints
•“By decoding the 'messages' through statistical and root-cause analyses of complaints data, the government can better understand the voice of the people, and help government departments improve service delivery, make informed decisions and develop smart strategies. This in turn helps boost public satisfaction with the government, and build a quality city.”
- Efficiency Unit’s Assistant Director, W. F. Yuk
DATA/TEXT MINING RESEARCH CONSIDERATIONS
• Data Mining for patent research/control
• Copyright research/control
• Metadata-driven approach avoids ‘permanent’ data duplication
• Analyst needs ‘creative freedom’ in combining, transforming data
• User interfaces – programming vs point-and-click
• Cost to implement highly variable
• Future Indications
• In-Memory
• Big Data
• Cloud Com
www.SAS.com