Business Analytics and Optimization: A Technical Introduction
Oleksandr Romanko, Ph.D.
Senior Research Analyst, Risk Analytics – Business Analytics, IBM
Adjunct Professor, University of Toronto
Toronto SMAC Meetup, September 18, 2014
© 2014 IBM Corporation 2
Making the world work better – pioneering the science
Milestones pictured: 1969, 1973, 1981, 2008
IBM Centennial: 100 Years of Innovation
Analytics Jobs
Created by: Dennis Buttera
Data science
Business Analytics
Descriptive Analytics: What has happened?
Predictive Analytics: What will happen?
Prescriptive Analytics: What should we do?
What is analytics?
Analytics is the scientific process of deriving insights from data in order to make decisions
Data → (Analyze) → Insight → (Decide) → Action → Business Value
Business analytics and optimization (video)
IBM Business Analytics portfolio
IBM Business Analytics
Industry Solutions: Financial Services, Public Sector, Distribution, Industrial, Communications
Functional Solutions (Customer, Finance, Operations, Risk):
– Budgeting & Forecasting
– Financial Consolidation
– Disclosure Management
– Risk Identification
– Risk & Control Assessment
– Resource Optimization
– Social Media Analytics
– Profitability Modeling & Optimization
– Production Planning
– Asset Management
– Customer Acquisition
– Customer Lifetime Value
– Customer Loyalty & Retention
– Risk Mitigation Planning
– Risk Aware Decisioning
– Sales Performance Management
Software Categories: Risk Analytics, Business Intelligence, Predictive Analytics, Performance Management, Decision Management
Core Capabilities: REPORT, MODEL, PREDICT, COLLABORATE, PLAN, ANALYZE
– Visualize, Discover, Forecast, Mine, Govern, Decide, Score, Simulate, Contribute, Survey
Operations research
Operations Research (O.R.) is the discipline of applying advanced analytical methods to help make better decisions.
Analytical techniques:
– Simulation: giving you the ability to try out approaches and test ideas for improvement
– Optimization: narrowing your choices to the very best when there are virtually innumerable feasible options and comparing them is difficult
– Probability and Statistics: helping you measure risk, mine data to find valuable connections and insights, test conclusions, and make reliable forecasts
– Mathematical Modeling: algorithms and software
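As a minimal sketch of the optimization idea above, the following Python snippet enumerates every combination of a few hypothetical projects and keeps the most valuable one that fits a budget. The project names, costs, values, and the budget are invented for illustration and are not from the slides; real problems are normally handed to a dedicated solver rather than enumerated.

```python
from itertools import combinations

# Hypothetical projects: name -> (cost, value). All numbers are illustrative assumptions.
PROJECTS = {"A": (4, 7), "B": (3, 5), "C": (2, 4), "D": (5, 8)}
BUDGET = 9  # hypothetical budget limit


def best_portfolio(projects, budget):
    """Enumerate all subsets and return (names, total value) of the best feasible one."""
    best_names, best_value = (), 0
    names = sorted(projects)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            cost = sum(projects[n][0] for n in subset)
            value = sum(projects[n][1] for n in subset)
            # Keep the subset only if it fits the budget and beats the current best.
            if cost <= budget and value > best_value:
                best_names, best_value = subset, value
    return best_names, best_value


best_names, best_value = best_portfolio(PROJECTS, BUDGET)
print(best_names, best_value)
```

With n candidate projects there are 2^n subsets to compare, which is why enumeration stops being practical quickly and optimization algorithms are used instead.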
Our planet is a complex, dynamic, highly interconnected $54 Trillion system-of-systems (OECD-based analysis)
– Communication: $3.96 Tn
– Transportation: $6.95 Tn
– Leisure / Recreation / Clothing: $7.80 Tn
– Healthcare: $4.27 Tn
– Food: $4.89 Tn
– Infrastructure: $12.54 Tn
– Govt. & Safety: $5.21 Tn
– Finance: $4.58 Tn
– Electricity: $2.94 Tn
– Education: $1.36 Tn
– Water: $0.13 Tn
Global system-of-systems
$54 Trillion (100% of WW 2008 GDP)
Legend for system inputs: Same Industry, Business Support, IT Systems, Energy Resources, Machinery, Materials, Trade
Note:
1. Size of bubbles represents systems’ economic values
2. Arrows represent the strength of systems’ interaction
Source: IBV analysis based on OECD
This chart shows ‘systems‘ (not ‘industries‘)
Economists estimate that all systems carry inefficiencies of up to $15 Tn, of which $4 Tn could be eliminated
Global economic value of system-of-systems: $54 Trillion (100% of WW 2008 GDP)
Inefficiencies: $15 Trillion (28% of WW 2008 GDP)
Improvement potential: $4 Trillion (7% of WW 2008 GDP)
How to read the chart:
For example, the Healthcare system's value is $4,270B. It carries an estimated inefficiency of 42%. Of that 42% inefficiency, economists estimate that ~34% can be eliminated, i.e. about 14% of the system's value (= 34% x 42%).
Source: IBM economists survey 2009; n= 480
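The Healthcare reading of the chart can be worked through as a short calculation. The sketch below uses only the three figures quoted on the slide; the function and variable names are hypothetical.

```python
# Figures quoted on the slide for the Healthcare system.
SYSTEM_VALUE_BN = 4270.0     # system value, USD billions
INEFFICIENCY_SHARE = 0.42    # inefficiency as a share of system value
IMPROVEMENT_SHARE = 0.34     # share of the inefficiency that could be eliminated


def improvement_potential(value_bn, inefficiency, improvement):
    """Return (inefficiency in $B, eliminable amount in $B, eliminable share of value)."""
    inefficiency_bn = value_bn * inefficiency
    eliminable_bn = inefficiency_bn * improvement
    return inefficiency_bn, eliminable_bn, inefficiency * improvement


ineff_bn, elim_bn, elim_share = improvement_potential(
    SYSTEM_VALUE_BN, INEFFICIENCY_SHARE, IMPROVEMENT_SHARE
)
print(f"Inefficiency: ${ineff_bn:,.0f}B")
print(f"Improvement potential: ${elim_bn:,.0f}B ({elim_share:.1%} of system value)")
```

Under these figures the inefficiency is roughly $1,793B, of which roughly $610B (about 14% of the system's value) is estimated to be removable.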
Chart axes: system inefficiency as % of total economic value (horizontal) and improvement potential as % of system inefficiency (vertical)
Systems shown (value in USD billions):
– Education: 1,360
– Building & Transport Infrastructure: 12,540
– Healthcare: 4,270
– Government & Safety: 5,210
– Electricity: 2,940
– Financial: 4,580
– Food & Water: 4,890
– Transportation (Goods & Passenger): 6,950
– Leisure / Recreation / Clothing: 7,800
– Communication: 3,960
Analysis of inefficiencies in the planet's system-of-systems
Note: Size of each bubble indicates the absolute value of the system in USD billions
Chart callouts for Healthcare: 42% inefficiency, 34% improvement potential
This chart shows ‘systems’ (not ‘industries’)
History of analytics
History of business analytics
Business Analytics Examples
Pit stop analytics
Calculations showed that time spent changing tires and refilling the tank was more than offset by the improved performance of the car on the track.
1. Softer tires stuck to the track better during turns than their harder cousins, though they wore out more quickly.
2. Less gas in the tank translated into a lighter, and therefore faster, car.
Optimized F1 pit teams can change four tires in two seconds
Movies
Smarter Cities
We can collect information from almost everything to make better decisions
– 1 billion camera phones in existence able to document accidents, damage, and crimes
– 30 billion RFID tags embedded into our world and across entire ecosystems
– 85% of new automobiles will contain event data recorders collecting travel information
Instrumented, Interconnected, Intelligent
Big data
Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics, and visualizing.
Source: Wikipedia
Big social data
Applications of big data analytics
– Homeland Security
– Finance
– Smarter Healthcare
– Multi-channel sales
– Telecom
– Manufacturing
– Traffic Control
– Trading Analytics
– Fraud and Risk
– Log Analysis
– Search Quality
– Retail: Churn, NBO
Police use analytics to reduce crime (video)
Marketing and supply chain analytics (video)
Marketing analytics
Intelligent transport systems
Real-time monitoring and forecasting of congestion in cities enables real-time action to reduce traffic and emissions
– Can charge drivers at point of use for access to city centers
Stockholm Congestion Tax Project
– Involves 18 barrier-free control points
– Allows differentiated pricing by time of day, congestion level, and potentially emissions level
– Results:
• Traffic reduced by 100,000 vehicle passages per day (25%)
• Public transportation passengers increased by 40,000 per day
• Congestion during peak hours and CO2 emissions were dramatically reduced
Analytics for green vehicles and technology (video)
Artificial intelligence
Source: A Brief Overview and Thoughts for Healthcare Education and Performance Improvement by Watson Team
Example of evidence analysis (figure):
Clue: “In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.”
Evidence passage: “On 27th May 1498, Vasco da Gama landed in Kappad Beach” (also shown as “On the 27th of May 1498, Vasco da Gama landed in Kappad Beach”)
– Temporal Reasoning (Date Math): relates May 1898 and the 400th anniversary to 27th May 1498
– Statistical Paraphrasing (Para-phrases): relates “arrival in” to “landed in”
– GeoSpatial Reasoning (Geo-KB): relates Kappad Beach to India
– The “explorer” in the clue corresponds to Vasco da Gama
Search far and wide, explore many hypotheses, find and judge evidence, many inference algorithms
Artificial intelligence (video)
Watson Analytics
Cloud
Bluemix
www.bluemix.net
Business Analytics Education
IBM Academic Initiative program
Cognos, SPSS, ILOG
Master of Business Analytics programs – top 20 universities
Industry support for Master of Business Analytics programs
Business Analytics programs – curriculum
Applied Statistics and Probability
Fundamentals of Computational Mathematics
Data Mining and Knowledge Discovery
Simulation Modelling
Optimization
Financial Decision Making
Computational Methods for Business Data Analysis
Computational Finance and Risk Management
Visual Analytics and Knowledge Representation
Mathematical Modelling for Business
Machine Learning, Cognitive Computing and Artificial Intelligence
Marketing Analytics
Strategies for Managing Innovations
Analytics of Web, Social Networks and Business News
Applied Statistics
What kind of data are we dealing with?
Types of data
• Quantitative
• Categorical (ordered, unordered)
Data collection
• Independent observations (one observation per subject)
• Dependent observations (repeated observations of the same subject, relationships within groups, relationships over time or space)
Type of data drives the direction of your analysis
• How to plot
• How to summarize
• How to draw inferences and conclusions
• How to issue predictions
Quantitative data
Examples: temperature, age, income
Quick check: “Does it make sense to calculate an average?”
Appropriate summary statistics:
– Mean and Median
– Standard Deviation
– Percentiles
More advanced predictive methods: Regression, Time Series Analysis, …
Plot your data!
Summarizing quantitative data
One-number summaries
– Mean: the average, obtained by summing all observations and dividing by the number of observations.
– Median: the center value, below and above which you will find 50% of the observations.
Summarizing your data with one number may not tell the whole story:
[Figure: three example distributions, labeled Median = 19.8, Median = 19.8, and Median = 10.5]
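As a small illustration of how the mean and the median can disagree, the sketch below computes both for a hypothetical sample with one extreme value. The data are invented for illustration and are not the data behind the slide's charts.

```python
from statistics import mean, median

# Hypothetical sample with one extreme value; the numbers are illustrative only.
values = [10, 12, 11, 13, 100]

sample_mean = mean(values)      # sum of observations / number of observations
sample_median = median(values)  # middle value of the sorted observations

print(f"mean = {sample_mean}, median = {sample_median}")
```

The single large observation pulls the mean to 29.2 while the median stays at 12, which is one reason to plot the data rather than rely on a single summary number.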
Flaw of averages
“Plans based on average assumptions are wrong on average”
Average depth 3 ft
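One way to see the flaw of averages numerically is to compare a plan evaluated at the average input with the average outcome over all inputs. The sketch below does this for a hypothetical capacity-limited sales model; the demand scenarios and the capacity are assumptions made up for illustration.

```python
from statistics import mean

# Hypothetical demand scenarios (units) and a fixed capacity; illustrative only.
DEMANDS = [50, 100, 150]
CAPACITY = 100


def units_sold(demand, capacity=CAPACITY):
    """Sales are capped by capacity, so the outcome is not linear in demand."""
    return min(demand, capacity)


# Outcome of a plan that assumes demand always equals its average.
sales_at_average_demand = units_sold(mean(DEMANDS))
# Average outcome across the individual demand scenarios.
average_sales = mean(units_sold(d) for d in DEMANDS)

print(sales_at_average_demand, round(average_sales, 2))
```

The plan based on average demand predicts 100 units sold, while the average over the scenarios is about 83 units, because high demand cannot compensate for low demand once capacity is reached.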
Standard deviation
“Most observations fall within ±2 standard deviations of the mean.”
If the data is normally distributed, ~95% of observations fall within ±2 standard deviations of the mean.
Example: Standard Deviation = 4.2, so ~95% of observations lie between 11.4 and 28.2
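The interval in the example can be reproduced with a few lines of Python. The mean of 19.8 is an assumption here: it is not stated next to the interval and is taken as the midpoint of 11.4 and 28.2. `NormalDist` from the standard library is used to check how much of a normal distribution the ±2 standard deviation band covers.

```python
from statistics import NormalDist

MEAN = 19.8     # assumption: midpoint of the interval quoted on the slide
STD_DEV = 4.2   # from the slide

low, high = MEAN - 2 * STD_DEV, MEAN + 2 * STD_DEV

# Share of a normal distribution that lies inside the +/-2 standard deviation band.
dist = NormalDist(mu=MEAN, sigma=STD_DEV)
coverage = dist.cdf(high) - dist.cdf(low)

print(f"{low:.1f} to {high:.1f}, coverage {coverage:.3f}")
```

The coverage is slightly above 95%, which is where the rule of thumb comes from; it holds only to the extent that the data are approximately normal.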
Descriptive statistics - example
Random sample of 5000 customers of a credit card company
Statistic        Amount spent on primary card last month    Debt to income ratio (x100)
N (valid)        5000                                       5000
N (missing)      0                                          0
Mean             1683.7340                                  9.9578
Median           1690.0670                                  8.8000
Std. Deviation   210.26680                                  6.42317
Minimum          0.00                                       0.00
Maximum          2482.72                                    43.10
Percentiles
Generalizations of the median (50th percentile).
The pth percentile is the data point below which p percent of the observations fall.
Often used to compare a single observation to a general population.
Examples:
– Standardized test scores
If you scored in the 93rd percentile, your score was higher than that of 93% of test takers.
– Child growth percentiles
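The test-score example can be sketched in code as a percentile rank: the percentage of observations that fall below a given score. The function name and the list of scores are hypothetical, and other conventions exist (for example, counting ties as half).

```python
def percentile_rank(observations, score):
    """Percent of observations that fall strictly below the given score."""
    below = sum(1 for x in observations if x < score)
    return 100.0 * below / len(observations)


# Hypothetical test scores 1..100, one per test taker (illustrative only).
scores = list(range(1, 101))
rank = percentile_rank(scores, 94)
print(rank)
```

In this made-up population a score of 94 is higher than 93 of the 100 scores, so it sits at the 93rd percentile.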
Percentiles - example
Percentiles can be another way of describing how spread out data values are.
Example: 5-Number Summary
Minimum – 25th percentile – Median (50th percentile) – 75th percentile – Maximum
Statistic          Amount spent on primary card last month    Debt to income ratio (x100)
Minimum            0.00                                       0.00
25th percentile    1567.4658                                  5.1250
50th percentile    1690.0670                                  8.8000
75th percentile    1814.5430                                  13.5000
Maximum            2482.72                                    43.10
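A five-number summary like the one above can be computed with the Python standard library. This is a sketch on a small hypothetical data set, not the credit card sample; statistical packages differ in how they interpolate percentiles, so results on real data may differ slightly between tools.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, 25th percentile, median, 75th percentile, maximum)."""
    # n=4 yields the three quartile cut points; "inclusive" treats the data
    # as containing its own minimum and maximum.
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)


# Hypothetical data set (illustrative only).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(data)
print(summary)
```

The distance between the 25th and 75th percentiles (the interquartile range) gives a spread measure that is less sensitive to extreme values than the standard deviation.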
Distributions: Normal distribution
Distributions
Estimate of the probability distribution of global mean temperature resulting from a doubling of CO2 relative to its pre-industrial value, made from 100,000 simulations
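The general idea of estimating a distribution from many simulation runs can be sketched as follows. The model here is a toy assumption (a lognormal random draw) that stands in for a real simulation model; it has no physical meaning and is not the climate model behind the chart. Only the number of runs matches the slide.

```python
import random
from statistics import quantiles

N_SIMULATIONS = 100_000     # same count as quoted on the slide
rng = random.Random(0)      # fixed seed so the sketch is reproducible

# Toy model (assumption): each run returns one lognormally distributed output.
samples = [rng.lognormvariate(1.0, 0.4) for _ in range(N_SIMULATIONS)]

# Summarize the simulated distribution with the 5th, 50th and 95th percentiles.
cuts = quantiles(samples, n=100)    # 99 cut points: 1st to 99th percentile
p05, p50, p95 = cuts[4], cuts[49], cuts[94]

print(f"5th: {p05:.2f}, median: {p50:.2f}, 95th: {p95:.2f}")
```

Reporting a range of percentiles instead of a single average keeps the asymmetry and the tails of the simulated distribution visible.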
Questions?