Business Analytics and Optimization Introduction

Post on 22-Nov-2014


description

Introduction to business analytics and optimization - History, concepts, data science skills, basic stats, modelling...

transcript

Business Analytics and Optimization: A Technical Introduction

Oleksandr Romanko, Ph.D.
Senior Research Analyst, Risk Analytics – Business Analytics, IBM
Adjunct Professor, University of Toronto

Toronto SMAC Meetup, September 18, 2014

© 2014 IBM Corporation 2

Making the world work better – pioneering the science

[Timeline images: 1969, 1973, 1981, 2008]


IBM Centennial: 100 Years of Innovation


Analytics Jobs

Created by: Dennis Buttera


Data science


Business Analytics


Descriptive Analytics: What has happened?

Predictive Analytics: What will happen?

Prescriptive Analytics: What should we do?

What is analytics?

Data → (Analyze) → Insight → (Decide) → Action → Business Value


Analytics is the scientific process of deriving insights from data in order to make decisions

Business analytics and optimization (video)


IBM Business Analytics portfolio

IBM Business Analytics

Industry Solutions: Financial Services, Public Sector, Distribution, Industrial, Communications

Functional Solutions: Risk, Customer, Finance, Operations
• Budgeting & Forecasting
• Financial Consolidation
• Disclosure Management
• Risk Identification
• Risk & Control Assessment
• Resource Optimization
• Social Media Analytics
• Profitability Modeling & Optimization
• Production Planning
• Asset Management
• Customer Acquisition
• Customer Lifetime Value
• Customer Loyalty & Retention
• Risk Mitigation Planning
• Risk Aware Decisioning
• Sales Performance Management

Software Categories: Risk Analytics, Business Intelligence, Predictive Analytics, Performance Management, Decision Management

Core Capabilities: REPORT, MODEL, PREDICT, COLLABORATE, PLAN, ANALYZE; Visualize, Discover, Forecast, Mine, Govern, Decide, Score, Simulate, Contribute, Survey


Operations research

Operations Research (O.R.) is the discipline of applying advanced analytical methods to help make better decisions

Analytical techniques:

• Simulation – giving you the ability to try out approaches and test ideas for improvement

• Optimization – narrowing your choices to the very best when there are virtually innumerable feasible options and comparing them is difficult

• Probability and Statistics – helping you measure risk, mine data to find valuable connections and insights, test conclusions, and make reliable forecasts

• Mathematical Modeling – algorithms and software
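As a rough illustration of the optimization idea above, the sketch below searches every feasible plan in a tiny product-mix problem and keeps the most profitable one. All names and numbers (chairs, tables, profits, resource limits) are hypothetical and not from the slides. Real O.R. problems are far too large for enumeration and would use a solver instead.

```python
from itertools import product

# Hypothetical product-mix data: every number here is illustrative.
PROFIT_PER_UNIT = {"chair": 40, "table": 50}
LABOUR_HOURS = {"chair": 1, "table": 2}   # hours needed per unit
WOOD_UNITS = {"chair": 4, "table": 3}     # wood needed per unit
MAX_LABOUR_HOURS = 40
MAX_WOOD_UNITS = 120


def is_feasible(chairs, tables):
    """Check both resource limits for a candidate plan."""
    hours = LABOUR_HOURS["chair"] * chairs + LABOUR_HOURS["table"] * tables
    wood = WOOD_UNITS["chair"] * chairs + WOOD_UNITS["table"] * tables
    return hours <= MAX_LABOUR_HOURS and wood <= MAX_WOOD_UNITS


def profit(chairs, tables):
    """Total profit of a plan."""
    return PROFIT_PER_UNIT["chair"] * chairs + PROFIT_PER_UNIT["table"] * tables


def best_plan(max_units=40):
    """Enumerate every (chairs, tables) pair and keep the most profitable feasible one."""
    candidates = product(range(max_units + 1), repeat=2)
    feasible = (plan for plan in candidates if is_feasible(*plan))
    return max(feasible, key=lambda plan: profit(*plan))


chairs, tables = best_plan()
print(chairs, tables, profit(chairs, tables))  # 24 8 1360
```

Even this toy case has 1,681 candidate plans, which hints at why comparing options by hand quickly becomes impractical.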


Our planet is a complex, dynamic, highly interconnected $54 Trillion system-of-systems (OECD-based analysis)

Economic value of each system:

• Communication: $3.96 Tn
• Transportation: $6.95 Tn
• Leisure / Recreation / Clothing: $7.80 Tn
• Healthcare: $4.27 Tn
• Food: $4.89 Tn
• Infrastructure: $12.54 Tn
• Govt. & Safety: $5.21 Tn
• Finance: $4.58 Tn
• Electricity: $2.94 Tn
• Education: $1.36 Tn
• Water: $0.13 Tn

Global system-of-systems: $54 Trillion (100% of WW 2008 GDP)

Legend for system inputs: Same Industry, Business Support, IT Systems, Energy Resources, Machinery, Materials, Trade

Note:

1. Size of bubbles represents systems’ economic values (scale marker: 1 Tn)

2. Arrows represent the strength of systems’ interaction

Source: IBV analysis based on OECD

This chart shows ‘systems’ (not ‘industries’)


Economists estimate that all systems carry inefficiencies of up to $15 Tn, of which $4 Tn could be eliminated

Global economic value of system-of-systems: $54 Trillion (100% of WW 2008 GDP)

Inefficiencies: $15 Trillion (28% of WW 2008 GDP)

Improvement potential: $4 Trillion (7% of WW 2008 GDP)

How to read the chart:

For example, the Healthcare system’s value is $4,270B. It carries an estimated inefficiency of 42%. From that level of 42% inefficiency, economists estimate that ~34% can be eliminated, i.e. about 14% of the system’s total value (= 34% x 42%).
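The healthcare reading can be checked with a few lines of arithmetic. This is a minimal sketch using only the three figures quoted in the example ($4,270B, 42%, 34%). The function name is an illustrative assumption.

```python
def eliminable_value(system_value_bn, inefficiency, improvement_share):
    """Return (inefficiency in $B, eliminable part in $B, eliminable share of total value)."""
    wasted_bn = system_value_bn * inefficiency
    eliminable_bn = wasted_bn * improvement_share
    return wasted_bn, eliminable_bn, inefficiency * improvement_share


# Healthcare figures from the slide: value $4,270B, 42% inefficiency, 34% of that removable
wasted_bn, eliminable_bn, share = eliminable_value(4270, 0.42, 0.34)
print(f"inefficiency ~${wasted_bn:.0f}B, eliminable ~${eliminable_bn:.0f}B, share {share:.1%}")
```

Under these figures, roughly $1,793B of the healthcare system is inefficiency, of which about $610B (about 14% of the system's value) is estimated to be removable.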

Source: IBM economists survey 2009; n= 480

Analysis of inefficiencies in the planet’s system-of-systems

[Bubble chart. Axes: system inefficiency as % of total economic value (ticks 15% to 45%) and improvement potential as % of system inefficiency (ticks 15% to 40%). Healthcare is highlighted at 42% inefficiency and 34% improvement potential.]

System values (USD billions):

• Education: 1,360
• Building & Transport Infrastructure: 12,540
• Healthcare: 4,270
• Government & Safety: 5,210
• Electricity: 2,940
• Financial: 4,580
• Food & Water: 4,890
• Transportation (Goods & Passenger): 6,950
• Leisure / Recreation / Clothing: 7,800
• Communication: 3,960

Note: Size of the bubble indicates absolute value of the system in USD billions

This chart shows ‘systems’ (not ‘industries’)


History of analytics


History of business analytics


Business Analytics Examples


Pit stop analytics


Calculations showed that time spent changing tires and refilling the tank was more than offset by the improved performance of the car on the track.

1. Softer tires stuck to the track better during turns than their harder cousins, though they wore out more quickly.

2. Less gas in the tank translated into a lighter, and therefore faster, car.

Optimized F1 pit teams can change four tires in two seconds


Movies


Smarter Cities


We can collect information from almost everything to make better decisions

• 1 billion camera phones in existence, able to document accidents, damage, and crimes

• 30 billion RFID tags embedded into our world and across entire ecosystems

• 85% of new automobiles will contain event data recorders collecting travel information

Instrumented Interconnected Intelligent


Big data

Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics, and visualizing.

Source: Wikipedia


Big social data


Applications of big data analytics

• Homeland Security
• Finance
• Smarter Healthcare
• Multi-channel sales
• Telecom
• Manufacturing
• Traffic Control
• Trading Analytics
• Fraud and Risk
• Log Analysis
• Search Quality
• Retail: Churn, NBO


Police use analytics to reduce crime (video)


Marketing and supply chain analytics (video)


Marketing analytics


Intelligent transport systems

Real-time monitoring and forecasting of congestion in cities enables real-time action to reduce traffic and emissions

– Can charge drivers at point of use for access to city centers

Stockholm Congestion Tax Project

– Involves 18 barrier-free control points

– Allows differentiated pricing by time of day, congestion level, and potentially emissions level

– Results:

• Traffic reduced by 100,000 vehicle passages per day (25%)

• Public transportation passengers increased by 40,000 / day

• Congestion during peak hours and CO2 emissions were dramatically reduced


Analytics for green vehicles and technology (video)


Artificial intelligence

Source: A Brief Overview and Thoughts for Healthcare Education and Performance Improvement by Watson Team


Question: In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.

Evidence passage: On 27th May 1498, Vasco da Gama landed in Kappad Beach

Matching the question to the evidence:

• Temporal Reasoning (Date Math): “May 1898”, “400th anniversary” ↔ “27th May 1498”
• Statistical Paraphrasing (Para-phrases): “arrival in” ↔ “landed in”; “On 27th May 1498” ↔ “On the 27th of May 1498”
• GeoSpatial Reasoning (Geo-KB): “India” ↔ “Kappad Beach”
• “explorer” ↔ “Vasco da Gama”
• No direct match in the passage: “celebrated”, “Portugal”

Search Far and Wide. Explore many hypotheses. Find and Judge Evidence. Many inference algorithms.
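The "Date Math" step can be illustrated with a toy check that the date in the question and the date in the evidence passage are consistent with a 400th anniversary. This is only a sketch of the idea and not how Watson implements temporal reasoning. The function name and its signature are assumptions made for the example.

```python
from datetime import date
from typing import Optional


def anniversary_number(event: date, year: int, month: int) -> Optional[int]:
    """Return N if (year, month) is the N-th anniversary month of the event, else None."""
    if month != event.month or year <= event.year:
        return None
    return year - event.year


landing = date(1498, 5, 27)                     # "On 27th May 1498" (evidence passage)
print(anniversary_number(landing, 1898, 5))     # question says "May 1898" -> 400
```

A result of 400 agrees with the "400th anniversary" in the question, so the temporal evidence supports Vasco da Gama as the answer.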


Artificial intelligence (video)


Watson Analytics


Cloud


Bluemix

www.bluemix.net


Business Analytics Education


IBM Academic Initiative program

Cognos SPSS ILOG


Master of Business Analytics programs – top 20 universities


Industry support for Master of Business Analytics programs


Business Analytics programs – curriculum

Applied Statistics and Probability

Fundamentals of Computational Mathematics

Data Mining and Knowledge Discovery

Simulation Modelling

Optimization

Financial Decision Making

Computational Methods for Business Data Analysis

Computational Finance and Risk Management

Visual Analytics and Knowledge Representation

Mathematical Modelling for Business

Machine Learning, Cognitive Computing and Artificial Intelligence

Marketing Analytics

Strategies for Managing Innovations

Analytics of Web, Social Networks and Business News


Applied Statistics


What kind of data are we dealing with?

Types of data

• Quantitative

• Categorical (ordered, unordered)

Data collection

• Independent observations (one observation per subject)

• Dependent observations (repeated observations of the same subject, relationships within groups, relationships over time or space)

Type of data drives the direction of your analysis

• How to plot

• How to summarize

• How to draw inferences and conclusions

• How to issue predictions


Quantitative data

Examples: temperature, age, income

Quick check: “Does it make sense to calculate an average?”

Appropriate summary statistics:

– Mean and Median

– Standard Deviation

– Percentiles

More advanced predictive methods: Regression, Time Series Analysis, …

Plot your data!


Summarizing quantitative data

One-number summaries

– Mean

Average, obtained by summing all observations and dividing by the number of observations.

– Median

The center value, below and above which you will find 50% of the observations.

Summarizing your data with one number may not tell the whole story:

[Figure: three example distributions with Median = 19.8, Median = 19.8, and Median = 10.5]
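To see why one number may not tell the whole story, the sketch below builds three small made-up datasets whose medians match the values shown (19.8, 19.8, 10.5) but whose spread and shape differ. The data points themselves are hypothetical and are not the data behind the slide.

```python
from statistics import mean, median, pstdev

# Hypothetical samples, chosen only so the medians match the slide
tight = [19.0, 19.5, 19.8, 20.1, 20.6]    # narrow spread
wide = [5.0, 12.0, 19.8, 27.0, 35.0]      # same median, much wider spread
skewed = [9.0, 10.0, 10.5, 11.0, 40.0]    # one large value pulls the mean up


def summarize(values):
    """One-number summaries plus the spread they hide."""
    return {
        "mean": round(mean(values), 2),
        "median": median(values),
        "std_dev": round(pstdev(values), 2),
    }


for name, data in [("tight", tight), ("wide", wide), ("skewed", skewed)]:
    print(name, summarize(data))
```

The first two samples share a median yet differ greatly in standard deviation, and in the third the mean sits well above the median because of a single large observation.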


Flaw of averages

“Plans based on average assumptions are wrong on average”

Average depth 3 ft
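A small sketch of the river-crossing picture: the depth readings and the 5 ft wading limit below are hypothetical, chosen so that the average depth is exactly 3 ft as on the slide.

```python
from statistics import mean

# Hypothetical depth soundings (ft) across the river, averaging 3 ft
depths_ft = [1, 1, 2, 2, 3, 8, 4, 3]
WADING_LIMIT_FT = 5   # assumed depth a person can wade through

average_depth = mean(depths_ft)
safe_on_average = average_depth <= WADING_LIMIT_FT
safe_everywhere = max(depths_ft) <= WADING_LIMIT_FT

print(average_depth, safe_on_average, safe_everywhere)  # 3 True False
```

A plan based on the 3 ft average says the crossing is safe, while the 8 ft reading in the middle says otherwise.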


Standard deviation

“Most observations fall within ±2 standard deviations of the mean.”

If the data is normally distributed, ~95% of observations fall within ±2 standard deviations of the mean.

Standard Deviation = 4.2

~95% of observations between 11.4 and 28.2
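The ±2 standard deviation rule can be checked numerically. The sketch below takes the standard deviation (4.2) and the interval (11.4 to 28.2) from the slide. The mean of 19.8 is not stated there and is inferred here as the midpoint of the interval.

```python
from statistics import NormalDist

SD = 4.2                       # from the slide
LOW, HIGH = 11.4, 28.2         # from the slide
MEAN = (LOW + HIGH) / 2        # assumption: midpoint of the interval, about 19.8

dist = NormalDist(mu=MEAN, sigma=SD)

# Share of a normal population lying within 2 standard deviations of the mean
share_within_2sd = dist.cdf(MEAN + 2 * SD) - dist.cdf(MEAN - 2 * SD)

# Number of standard deviations that captures exactly 95%
z_for_95 = NormalDist().inv_cdf(0.975)

print(f"{share_within_2sd:.4f} {z_for_95:.2f}")
```

For a normal distribution, ±2 standard deviations covers slightly more than 95% of observations, and the exact 95% band is about ±1.96 standard deviations, so "~95%" is a convenient rounding.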


Descriptive statistics - example

Random sample of 5000 customers of a credit card company

Statistic         Amount spent on primary     Debt to income
                  card last month             ratio (x100)
N (valid)         5000                        5000
N (missing)       0                           0
Mean              1683.7340                   9.9578
Median            1690.0670                   8.8000
Std. Deviation    210.26680                   6.42317
Minimum           .00                         .00
Maximum           2482.72                     43.10


Percentiles

Generalizations of the median (50th percentile).

The pth percentile is the data point below which p percent of the observations fall.

Often used to compare a single observation to a general population.

Examples:

– Standardized test scores

If you scored in the 93rd percentile, your score was higher than that of 93% of test takers.

– Child growth percentiles
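Comparing a single observation to a population can be sketched as a percentile-rank calculation: count how many observations fall below the value. The scores below are hypothetical. Statistical packages use several slightly different percentile conventions, so this is one reasonable definition rather than the definitive one.

```python
from bisect import bisect_left


def percentile_rank(population, value):
    """Percent of observations in the population that are strictly below value."""
    ordered = sorted(population)
    below = bisect_left(ordered, value)   # number of observations < value
    return 100.0 * below / len(ordered)


# Hypothetical test scores: 1, 2, ..., 100
scores = list(range(1, 101))
print(percentile_rank(scores, 94))  # 93.0
```

A score of 94 is higher than 93 of the 100 scores, which places it at the 93rd percentile under this definition.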


Percentiles - example

Percentiles can be another way of describing how spread out data values are.

Example: 5-Number Summary

Minimum – 25th percentile – Median (50th percentile) – 75th percentile – Maximum

                    Amount spent on primary     Debt to income
                    card last month             ratio (x100)
Minimum             .00                         .00
25th percentile     1567.4658                   5.1250
50th percentile     1690.0670                   8.8000
75th percentile     1814.5430                   13.5000
Maximum             2482.72                     43.10
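A five-number summary can be computed with the Python standard library, as sketched below on a small hypothetical sample. The quartile values in the second part are copied from the table, and the interquartile range (75th minus 25th percentile) is an extra spread measure that the slide does not show. Quantile conventions differ between tools, so results on raw data may differ slightly from SPSS-style output.

```python
from statistics import quantiles


def five_number_summary(values):
    """Minimum, 25th, 50th, 75th percentile, maximum."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


# Hypothetical sample, for illustration only
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))

# Interquartile ranges from the quartiles reported in the table
spend_iqr = 1814.5430 - 1567.4658   # amount spent on primary card last month
debt_iqr = 13.5000 - 5.1250         # debt to income ratio (x100)
print(round(spend_iqr, 4), debt_iqr)
```

The middle half of customers spent within a band of about 247 around the median, which is narrow compared with the full range from 0 to 2482.72.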


Distributions: Normal distribution


Distributions


Estimate of the probability distribution of global mean temperature resulting from a doubling of CO2 relative to its pre-industrial value, made from 100,000 simulations
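The general technique behind such an estimate is Monte Carlo simulation: draw uncertain inputs many times, run a model on each draw, and summarize the resulting outcomes as a distribution. The sketch below uses a deliberately simple toy model with hypothetical inputs. It is not the climate model behind the slide, and its numbers carry no physical meaning.

```python
import random
from statistics import mean, quantiles


def toy_model(base, gain):
    """Hypothetical response: a base effect amplified by a feedback gain."""
    return base / (1.0 - gain)


def simulate(n_runs, seed=0):
    """Run the toy model n_runs times on randomly drawn inputs."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    outcomes = []
    for _ in range(n_runs):
        base = rng.gauss(1.0, 0.1)       # hypothetical uncertain input
        gain = rng.uniform(0.3, 0.7)     # hypothetical uncertain input
        outcomes.append(toy_model(base, gain))
    return outcomes


outcomes = simulate(100_000)
cuts = quantiles(outcomes, n=20)         # 19 cut points: 5%, 10%, ..., 95%
p5, p50, p95 = cuts[0], cuts[9], cuts[18]
print(f"mean={mean(outcomes):.2f} p5={p5:.2f} median={p50:.2f} p95={p95:.2f}")
```

Because the model is nonlinear in the gain, the simulated distribution is skewed to the right, with the mean above the median, which is the kind of shape a single average would hide.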


Questions?