Modelling for decisions

Using Monte Carlo simulation, Bayesian inference and a lot of common sense

A quick introduction

Photo credits at www.coppelia.io/photo-credits/

Who is this person?

Simon Raper

Founder of a data science services company called COPPELIA

Started coding when I was 8 on a ZX-81

Then abandoned the sciences until I was 25! And was shocked

But I was really lucky: the dot-com boom gave me a crash course in IT (allowed to do ANYTHING!)

Did machine learning not financial engineering!

Lots of business experience, especially in media (Channel 4, ITV, News UK, McDonalds, Unilever, AOL, Credit-Suisse, Jaguar, Sainsbury’s)

Areas of Expertise

Classical statistics (R, SPSS, SAS, MATLAB)

Bayesian statistics (R, WinBUGS)

Simulation (agent-based, system dynamics)

Big data (AWS, Hadoop, Hive, Spark, Mahout, MongoDB)

Machine learning (R, Mahout, MLlib)

Coding (R, Python, Java, SQL, JavaScript, D3)

Some past projects

Machine4 at Channel 4

The Content Universe at Channel 4

Market Simulation at Mindshare

Bayesian and mixed effects modelling at Mindshare

Drunks and Lampposts

Some of the things we will be looking at today

● How to build the right model to answer a question, and quickly!
● Picking the right function for the job
● Some unexpected ways to use statistical techniques
● Understanding the limitations of your model
● Taking it further
    ○ Using simulation to understand its dynamics
    ○ Using Monte Carlo simulation to understand the impact of uncertainty in the inputs
    ○ Using Bayesian inference to see how the data and the model impact current beliefs

To begin with a controversial statement!

The majority of statistical models used in business are either unnecessary or used inappropriately.

There’s a reluctance to ask why a statistical model is needed and whether it is worth the effort of development.

In many cases we would be better served by clear thinking about a specific problem (how the data relates to the business decision) resorting to statistical modelling (as opposed to plain old fashioned mathematical modelling) only where the benefits are obvious.

So what does make a good model?

A good model in this sense has the following virtues. (They might seem obvious but it is surprising how often they are forgotten!)

● It captures all the features of the world that are relevant to the decision and leaves out those which are not
● Its purpose is to relate the available data to the decision
● It only uses statistical theory when the benefits outweigh the costs
● It incorporates common sense assumptions
● It incorporates uncertainty
● Its inadequacies are understood and communicated to the decision maker

Some wisdom to keep in the back of your head

There is a quote attributed to John Tukey (himself a founding figure in statistics):

“An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.”

And another very popular, but always true (almost by definition), quote by George Box:

“All models are wrong but some are useful”

Now for a real decision and some data

The decision: The CMO has to decide on next year's marketing budget. She would like to know how much she should spend in total on product P.

The available data are:

● A time series of weekly sales for product P going back five years
● A time series of weekly marketing spend for product P going back five years
● Annual sales figures for P and its three main competitors going back five years
● Annual marketing spend for P and its three main competitors going back five years
● Some research showing the demographic profile of buyers of product P and the amount of switching there is in the category

What they never mention in the text books!

The work needs to be done in a day and there is only one person who can work on it. (Note that the time and resource constraints have a huge impact on the choice of approach.)

The paranoid statistician’s checklist

● Is it representative?
● How well does it cover all the possibilities?
● Is it accurate?
● Are there missing values?

Always start by looking at the data

The next move: add as much info as you can

Where can you find this information?

1. Common sense
2. Questions to the decision maker (or anyone else who understands the domain)
3. Logical constraints

And list all your common sense assumptions (nothing is too obvious)

1. If you don't spend anything then there will be no uplift due to marketing spend!
2. There's a threshold below which any spend will be ineffective. Obviously if I spend only £10 nothing is going to happen (unless it's bribing a single customer!)
3. There's an eventual limit to what marketing spend can do (it can't generate more sales than there are people who can buy the product)
4. It's likely that marketing spend will be most effective on those who are least loyal to a competitor brand
5. For business/political reasons there's a minimum and a maximum possible budget available
6. The effectiveness of marketing spend will be constrained by the reach of our marketing channels
7. The effectiveness of marketing spend will depend on competitor spend
8. There will be a default position which the decision maker resorts to in the absence of any information from you (e.g. spend the same as last year)
9. There's a whole load of other factors (creative, choice of channels, overall strategy) that will affect the impact of the marketing spend

You can tame a problem by picking the right function

We have good reasons for picking this one

The problem is reduced to finding values for the parameters
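
To make "finding values for the parameters" concrete, here is a minimal sketch. It assumes the function in question is a logistic (S-shaped) curve, which the slide shows only as a picture. The names `uplift`, `k` and `x0` and every numeric value are hypothetical. The shift and rescale is an adjustment made for this sketch so that zero spend gives zero uplift and the curve approaches the limit L.

```python
import math


def sigmoid(z):
    """Standard logistic function, mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def uplift(spend, L, k, x0):
    """Sales uplift for a given spend on a shifted, rescaled logistic curve.

    L  : upper limit on the uplift (the ceiling the curve approaches)
    k  : steepness of the curve (hypothetical parameter)
    x0 : spend at the steepest point of the curve (hypothetical parameter)
    """
    base = sigmoid(-k * x0)           # raw curve value at zero spend
    raw = sigmoid(k * (spend - x0))   # raw curve value at this spend
    return L * (raw - base) / (1.0 - base)


# Illustrative values only: L in millions of sales, spend in millions of pounds.
for spend in (0.0, 1.0, 2.0, 3.0, 5.0):
    print(f"spend {spend:.1f} -> uplift {uplift(spend, L=3.0, k=2.0, x0=2.0):.2f}")
```

With this shape, three of the common sense assumptions come for free: no spend means no uplift, small spends do very little, and the uplift flattens out near L. What remains is choosing L, k and x0.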

Some beer-mat calculations for L:

11.5 million men who would buy the product

product lasts 2 weeks

cost £1

max annual sales 26 x 11.5 ≈ 300 million

sales of all four brands are 290 million, so 10 million headroom

90% are loyal buyers, 10% switch regularly

P has 50% of the market and so has 5% of the 10%, but another 5% is available

0.05 x 290 + 0.62 x 10 ≈ 21 million

only 15% reachable by media: 21 x 0.15 ≈ 3 million
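
The arithmetic above can be written out as a short script, which makes it easier to change an assumption later and see what happens to L. This is only a sketch: the variable names are invented here, and the 0.62 factor is taken from the slide as given, since its derivation is not shown.

```python
# All quantities are in millions per year; names are illustrative only.
buyers = 11.5                      # men who would buy the product
purchases_per_year = 52 / 2        # product lasts 2 weeks -> 26 purchases
price = 1.0                        # pounds per purchase

max_sales = buyers * purchases_per_year * price   # 299, rounded to 300 on the slide
max_sales_rounded = round(max_sales, -1)          # round to the nearest 10 million

current_sales = 290.0                             # all four brands combined
headroom = max_sales_rounded - current_sales      # unsold potential in the category

switchers_available = 0.05         # extra share of switchers P could win
headroom_share = 0.62              # factor used on the slide (derivation not shown)

contestable = switchers_available * current_sales + headroom_share * headroom
reach = 0.15                       # share reachable by media
L_estimate = reach * contestable

print(f"max sales   ~ {max_sales_rounded:.0f} million")
print(f"headroom    ~ {headroom:.0f} million")
print(f"contestable ~ {contestable:.1f} million")
print(f"L           ~ {L_estimate:.1f} million")
```

Keeping each assumption as a named value is what later allows the sensitivity of the result to be checked one input at a time.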

Does this seem very, very rough? Yes. But we are taking note of that. Later we will look at how sensitive our results are to these assumptions.

The data should help us here but … an impasse: we don’t have the uplifts

Call in the econometricians for a 3 month project?

Are we really stuck though?

The solution is common sense and some nice tricks!

Yes it’s rough but it does the job: we can make decisions

And now the important thing is understanding how it is wrong and what that means!

1. Competitors not dealt with
2. Conditional on assumptions
3. Confounding factors
4. Scale of precision
5. Not a statistical model

Nevertheless….

Another example using the logistic curve

A web start-up has just launched its new product. Customers pay per day to use the product, so the number of customers can drop as well as rise over time. However, word does seem to be spreading, as the daily number of customers appears to be climbing.

They want to know two things:

1. When should they spend their marketing budget?
2. For financial planning purposes they would like to know when the adoption curve will start to level out. They have done their own market sizing work and they estimate that this will happen at about 4000 customers a day. At their most pessimistic they put it at 3000 and at their most optimistic they say 5000.

We can use the simulation to understand the impact of feedback loops
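
A minimal sketch of what such a simulation might look like, assuming a simple discrete-time logistic model. The function name, the starting level of 50 customers and the growth rate of 0.1 are all hypothetical. The model only tracks the net daily change, so it does not separate customers joining from customers leaving.

```python
def simulate_adoption(days, market_size, start_customers, growth_rate):
    """Step a logistic adoption model forward one day at a time."""
    customers = [float(start_customers)]
    for _ in range(days - 1):
        n = customers[-1]
        # Reinforcing loop: more customers means more word of mouth.
        word_of_mouth = growth_rate * n
        # Balancing loop: fewer people are left to win as n nears the limit.
        remaining = 1.0 - n / market_size
        customers.append(n + word_of_mouth * remaining)
    return customers


path = simulate_adoption(days=200, market_size=4000, start_customers=50, growth_rate=0.1)

# One rough reading of "levelling out": the first day at 90% of the market size.
level_out_day = next(day for day, n in enumerate(path) if n >= 0.9 * 4000)
print(f"Reaches 90% of the market on day {level_out_day}; final level {path[-1]:.0f}")
```

The two loops pull against each other: early on the reinforcing loop dominates and growth accelerates, later the balancing loop takes over and the curve flattens towards the market size.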

And we can use Monte Carlo simulation to explore the impact of uncertainty

Monte Carlo is a broad concept, but here we mean using computer-simulated random sampling to model how uncertainty in the inputs to a system affects the outputs of that system.

1. Define inputs
2. Generate inputs from probability distributions
3. Perform computation on inputs
4. Aggregate results
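
The four steps can be sketched for the start-up example as follows. The triangular distribution for the market size uses the 3000 / 4000 / 5000 figures from the example. Everything else is an assumption made for illustration: the closed-form logistic adoption curve, the starting level of 50 customers, the growth-rate range and the choice of day 60 as the output.

```python
import math
import random


def customers_on_day(day, market_size, start, growth_rate):
    """Closed-form logistic adoption curve (an assumed model, not a fitted one)."""
    return market_size / (1.0 + (market_size / start - 1.0) * math.exp(-growth_rate * day))


def run_monte_carlo(n_runs=10_000, day=60, seed=42):
    """Return the sorted simulated customer numbers on the chosen day."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        # Steps 1 and 2: define the uncertain inputs and draw them at random.
        market_size = rng.triangular(3000, 5000, 4000)  # low, high, mode
        growth_rate = rng.uniform(0.08, 0.12)           # hypothetical range
        # Step 3: perform the computation on the inputs.
        results.append(customers_on_day(day, market_size, 50.0, growth_rate))
    return sorted(results)


def summarise(sorted_results):
    """Step 4: aggregate the runs into a median and a 90% interval."""
    n = len(sorted_results)
    return {
        "p05": sorted_results[int(0.05 * n)],
        "median": sorted_results[n // 2],
        "p95": sorted_results[int(0.95 * n)],
    }


summary = summarise(run_monte_carlo())
print({name: round(value) for name, value in summary.items()})
```

The width of the interval between `p05` and `p95` is the useful output here: it shows how much of the uncertainty in the inputs carries through to the number the start-up cares about.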

Finally we might be interested in what the data says about our assumptions

A Bayesian example: A wet umbrella

● Prior belief = Fairly certain it is not raining
● Data = Man walks into the room with a wet umbrella
● Model = Wet umbrellas highly improbable without rain
● Posterior belief = Shifted to fairly certain it is raining
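
The umbrella example can be put into numbers with Bayes' rule. The probabilities below are made up purely for illustration: a 10% prior for rain, a 90% chance of a wet umbrella if it is raining and a 1% chance if it is not.

```python
def posterior_rain(prior_rain, p_wet_given_rain, p_wet_given_dry):
    """Bayes' rule: P(rain | wet umbrella)."""
    # Total probability of seeing a wet umbrella, rain or no rain.
    evidence = prior_rain * p_wet_given_rain + (1.0 - prior_rain) * p_wet_given_dry
    return prior_rain * p_wet_given_rain / evidence


# Hypothetical numbers, chosen only to mirror the four bullets above.
posterior = posterior_rain(prior_rain=0.1, p_wet_given_rain=0.9, p_wet_given_dry=0.01)
print(f"P(rain | wet umbrella) = {posterior:.2f}")
```

Under these made-up numbers the belief in rain moves from 10% to about 91%, which is the shift from "fairly certain it is not raining" to "fairly certain it is raining".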

We can use Bayesian methods to understand how the data might update our beliefs about L
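
One simple way to do this is a grid approximation, sketched below. It assumes that L here is the level at which the start-up's adoption curve flattens out, that the prior is triangular over the 3000 to 5000 range with its peak at 4000, and that observations scatter around a logistic curve with Gaussian noise. The observations are synthetic, generated from a curve with L = 4400, and every name and parameter value is hypothetical.

```python
import math


def logistic_customers(day, limit, start=50.0, rate=0.1):
    """Assumed adoption curve with upper limit `limit`."""
    return limit / (1.0 + (limit / start - 1.0) * math.exp(-rate * day))


def triangular_prior(limit, low=3000.0, mode=4000.0, high=5000.0):
    """Unnormalised triangular prior weight for a candidate limit."""
    if limit <= low or limit >= high:
        return 0.0
    if limit <= mode:
        return (limit - low) / (mode - low)
    return (high - limit) / (high - mode)


def posterior_over_limit(observations, candidates, noise_sd=50.0):
    """Posterior weight for each candidate limit (prior x Gaussian likelihood)."""
    log_post = []
    for limit in candidates:
        prior = triangular_prior(limit)
        if prior == 0.0:
            log_post.append(-math.inf)
            continue
        log_lik = sum(
            -0.5 * ((obs - logistic_customers(day, limit)) / noise_sd) ** 2
            for day, obs in observations
        )
        log_post.append(math.log(prior) + log_lik)
    top = max(log_post)                        # subtract the max for numerical stability
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


candidates = [3000.0 + 50.0 * i for i in range(41)]   # 3000, 3050, ..., 5000
# Synthetic, noise-free observations from a curve whose true limit is 4400.
observations = [(d, logistic_customers(d, 4400.0)) for d in (40, 50, 60, 70)]

posterior = posterior_over_limit(observations, candidates)
best = candidates[posterior.index(max(posterior))]
mean = sum(c * p for c, p in zip(candidates, posterior))
print(f"most probable L = {best:.0f}, posterior mean = {mean:.0f}")
```

The prior pulls towards 4000 and the data pull towards 4400, so the posterior shows how far a handful of observations can move the original market sizing estimate.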

A quick recap

● How to build the right model to answer a question, and quickly!
● Picking the right function for the job
● Some unexpected ways to use statistical techniques
● Understanding the limitations of your model
● Taking it further
    ○ Using simulation to understand its dynamics
    ○ Using Monte Carlo simulation to understand the impact of uncertainty in the inputs
    ○ Using Bayesian inference to see how the data and the model impact current beliefs

Thank you

If you’d like to know more, talk to me at [email protected] or on Twitter @coppeliamla, or visit my blog www.coppelia.io/blog

Date post:	19-Feb-2017
Category:	Data & Analytics
Upload:	coppeliamla