
Reproducible research: assessing spatial predictions of crime

Matthew Daws

Leeds

LIDA Seminar, November 2017

Matthew Daws (Leeds) Assessing predictions LIDA, Nov 2017 1 / 20


Project

UK Home Office Police Innovation Fund: "More with Less: Authentic Implementation of Evidence-Based Predictive Patrol Plans". With Andy Evans and Monsuru Adepeju here at Leeds.

My task:

Take crime prediction algorithms from the literature, and implement them in an open source way (https://github.com/QuantCrimAtLeeds/PredictCode/).

Allow other researchers to see what benefit different crime prediction algorithms are likely to give.

My background is in Mathematics and Software Development.

Runs until February 2018.


The (Near-)repeat hypothesis

"The tendency of victims of crime to, in the nearby future, be repeat victims; and of near-by (say) buildings to also be future victims." (Principally interested in Burglary.)

That is, a crime event at a spatial/temporal location tends to imply a higher risk, localised in space and time, for nearby locations.

Classical prediction techniques tend to generate "hot spots" around previous locations.

Part I: How do we do this? (Plea for reproducible research.)

Part II: And what do we mean by "prediction" anyway? What makes a "good" prediction?


Publications


The algorithm


The code


Reproducible Research

"An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures." (Buckheit, Donoho, "WaveLab and Reproducible Research", 1995.)

"In my own experience, error is ubiquitous in scientific computing..." (Donoho, "An invitation to reproducible computational research", Biostatistics, 2010.)

Merton's norms: universalism, communalism, disinterestedness, organized scepticism.

With thanks to Victoria Stodden.


Resources

http://reproducibleresearch.net/

https://rroxford.github.io/

http://www.bmj.com/content/344/bmj.e4383


But to continue

Wikipedia entry "Hobby horse"

"My Uncle Toby on his Hobby-horse", Wikipedia


What is crime prediction?

"Precrime: It Works!"

Wikipedia entry "The Minority Report"

From IMDB


What is crime prediction actually?

"Although much news coverage promotes the meme that predictive policing is a crystal ball, these algorithms predict the risk of future events, not the events themselves." (Perry, McInnis, Price, Smith, Hollywood, "Predictive Policing", RAND report.)

"Prior to each shift, Santa Cruz police officers receive information identifying 15 such squares with the highest probability of crime, and are encouraged - though not required - to provide greater attention to these areas." (Joh, "Policing by numbers: Big data and the fourth amendment".)

"Despite the increased emphasis on proactive policing, the core of police work remains that of responding to calls for service..." (Groff, La Vigne, "Forecasting the future of predictive crime mapping".)


Analogy with weather forecasting

I have found analogies with probabilistic forecasting within Meteorology to be very profitable.

"There is a 20% chance of rain in Leeds tomorrow."

What does this mean?

If we make this prediction many times, then 1 in 5 times, it should rain tomorrow. This is "reliability".

But maybe it rains 20% of the time in Leeds anyway (over a year, say)? This is the question of "resolution" (which is hard to actually define).
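As a rough illustration of reliability, the sketch below groups forecasts by their stated probability and compares each group with the frequency actually observed. The function name and the data are hypothetical, made up for this example only.

```python
from collections import defaultdict

def reliability_table(forecasts, outcomes):
    """Group forecasts by stated probability and report observed frequencies.

    forecasts: sequence of probabilities in [0, 1]
    outcomes:  sequence of 0/1 flags (1 = the event happened)
    Returns {stated probability: (number of forecasts, observed frequency)}.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("need one outcome per forecast")
    counts = defaultdict(lambda: [0, 0])  # probability -> [forecasts made, events seen]
    for p, o in zip(forecasts, outcomes):
        counts[p][0] += 1
        counts[p][1] += o
    return {p: (n, seen / n) for p, (n, seen) in counts.items()}

# Hypothetical data: ten "20% chance of rain" forecasts, two of which saw rain.
forecasts = [0.2] * 10
outcomes = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
table = reliability_table(forecasts, outcomes)
print(table)
```

A forecaster who always issues the long-run rain frequency would also pass this check, which is why reliability alone does not capture resolution.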


Lack of analogy

[Figure: two rows of three map panels, each row titled "Naive prediction", "KDE prediction" and "Actual events", on projected coordinates (x from 350000 to 358000, y from 583000 to 588000).]

Northside of Chicago, predictions and reality for 5th Nov 2016, and

23rd October 2016.

The probabilities involved are tiny.


Hit rate

The de facto standard.

Pick a "coverage level", say 10% of the area, which might be chosen given policing resources.

Pick that % of grid cells, by picking those with the highest risk first.

Then calculate the fraction of actual events which fall in the selected grid cells.
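The three steps above can be sketched as follows. This is a minimal illustration, not the PredictCode implementation: the function name, the dictionary-based grid, the tie-breaking rule and the data are all assumptions made for the example.

```python
def hit_rate(risk, event_cells, coverage_percent):
    """Fraction of events falling in the top `coverage_percent` of cells by risk.

    risk:             dict mapping a grid cell id to its predicted risk
    event_cells:      list of cell ids, one entry per actual event
    coverage_percent: integer percentage of grid cells to select, e.g. 10
    """
    if not event_cells:
        raise ValueError("hit rate is undefined when there are no events")
    # Number of cells allowed at this coverage level (rounded down).
    n_selected = (coverage_percent * len(risk)) // 100
    # Highest risk first; ties are broken by cell id (an arbitrary choice).
    ranked = sorted(risk, key=lambda cell: (-risk[cell], cell))
    selected = set(ranked[:n_selected])
    hits = sum(1 for cell in event_cells if cell in selected)
    return hits / len(event_cells)

# Hypothetical 10-cell grid and 5 events.
risk = {"a": 0.40, "b": 0.05, "c": 0.25, "d": 0.10, "e": 0.02,
        "f": 0.08, "g": 0.03, "h": 0.04, "i": 0.02, "j": 0.01}
events = ["a", "a", "c", "h", "d"]
print(hit_rate(risk, events, 20))  # cells "a" and "c" are selected
```

How ties in risk are ordered affects which cells are selected, so any real comparison should state the tie-breaking rule used.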

[Figure: two map panels titled "Top 10% of 'naive' prediction" and "Top 10% of 'kde' prediction".]


The good, the bad, the ugly

Easy to understand, tied to usage of the prediction;

But seems to me to confuse prediction with hot-spot / patrol plan creation.

Notice the huge quantitative difference in the two examples.

How do you deal with the selection of a coverage level?

[Figure: two map panels titled "Top 10% of 'naive' prediction" and "Top 10% of 'kde' prediction".]


Interpret the results

Usual to plot mean hit rate against coverage. Then use some statistical test.

But what's the model?

Let's suppose that each trial is an independent draw from a binomial with unknown p.

Use a flat prior. Compute the predictive posterior, plot the median and inter-quartile range.

Gives much the same result (the number of events per day doesn't vary that much).
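One way to read the model above: with a flat Beta(1, 1) prior and binomial trials sharing a single capture probability p, the posterior for p is again a Beta distribution. The sketch below summarises that posterior by its median and quartiles using Monte Carlo draws. It is a simplification under stated assumptions: it pools all days together, it summarises the posterior of p rather than a full posterior predictive distribution, and the function name and data are hypothetical.

```python
import random
import statistics

def capture_probability_summary(trials, samples=20000, seed=0):
    """Quartiles of the posterior for p under a flat prior.

    trials: list of (hits, events) pairs, one per prediction day
    Returns (lower quartile, median, upper quartile).
    """
    hits = sum(h for h, _ in trials)
    events = sum(n for _, n in trials)
    # Flat prior Beta(1, 1) combined with a binomial likelihood
    # gives the posterior Beta(1 + hits, 1 + misses).
    alpha, beta = 1 + hits, 1 + (events - hits)
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    draws = [rng.betavariate(alpha, beta) for _ in range(samples)]
    q1, median, q3 = statistics.quantiles(draws, n=4)
    return q1, median, q3

# Hypothetical results at one coverage level: (events captured, events that day).
trials = [(3, 10), (2, 8), (4, 12)]
q1, median, q3 = capture_probability_summary(trials)
print(q1, median, q3)
```

Repeating this for each coverage level would give the points needed for a median curve with an inter-quartile band.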

[Figure: three plots against coverage (%), each comparing "naive" and "kde": mean hit rate (%); successful capture probability over 0 to 100% coverage; successful capture probability over 0 to 14% coverage.]


Brier scores

Return to meteorology and probabilistic forecasting.

Binary events: each either happens (1) or not (0).

For $t = 1, \dots, N$ make a prediction $f_t \in [0, 1]$.

Have actual events $(o_t)$.

$$BS = \frac{1}{N} \sum_{t=1}^{N} (f_t - o_t)^2$$

We follow a variant from Roberts, "Assessing the spatial and temporal variation in the skill of precipitation forecasts from an NWP model": $K$ grid cells, predicted probability $p_k$, and $n_k$ actual events, so $n_k/N$ is the fraction.

The "Fractional Brier Score":

$$F = \frac{1}{K} \sum_{k=1}^{K} \left(p_k - \frac{n_k}{N}\right)^2$$
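Both scores are short to compute. The following is a minimal sketch under stated assumptions: the function names are illustrative, and $N$ in the fractional score is taken to be the total number of events over all $K$ cells, which is one reading of "$n_k/N$ fraction".

```python
def brier_score(forecasts, outcomes):
    """Classical Brier score: mean squared gap between forecast f_t and outcome o_t."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally many forecasts and outcomes")
    squared = [(f - o) ** 2 for f, o in zip(forecasts, outcomes)]
    return sum(squared) / len(squared)


def fractional_brier_score(probabilities, counts):
    """Fractional Brier score F over K grid cells.

    probabilities[k] is the predicted probability p_k for cell k and
    counts[k] is the number of actual events n_k in that cell.
    Assumption: N is the total event count, so n_k / N is an event fraction.
    """
    if not probabilities or len(probabilities) != len(counts):
        raise ValueError("need one probability and one count per cell")
    total = sum(counts)
    if total == 0:
        raise ValueError("no events, so the fractions are undefined")
    squared = [(p - n / total) ** 2 for p, n in zip(probabilities, counts)]
    return sum(squared) / len(squared)


# Made-up four-cell example: the prediction matches the observed fractions
# exactly, so the fractional score is zero.
print(fractional_brier_score([0.5, 0.25, 0.25, 0.0], [2, 1, 1, 0]))
```

Lower is better for both scores, with 0 meaning the prediction matched the outcomes exactly.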

Skill score; results

$$F_{\text{worst}} = \frac{1}{K} \sum_{k=1}^{K} \left(p_k^2 + \left(\frac{n_k}{N}\right)^2\right)$$

$$FS = 1 - F / F_{\text{worst}}$$

What are the units of $F$?

$FS$ is the "skill"; closer to 1 is better.
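As a rough illustration, the skill can be computed directly from the two sums. This sketch uses a hypothetical function name and again assumes that $N$ is the total number of events across the $K$ cells.

```python
def fractional_brier_skill(probabilities, counts):
    """Skill FS = 1 - F / F_worst for predicted probabilities p_k and event counts n_k.

    Assumption: N is the total event count, so n_k / N is the observed fraction.
    """
    if not probabilities or len(probabilities) != len(counts):
        raise ValueError("need one probability and one count per cell")
    total = sum(counts)
    if total == 0:
        raise ValueError("no events, so the fractions are undefined")
    fractions = [n / total for n in counts]
    cells = len(probabilities)
    score = sum((p - f) ** 2 for p, f in zip(probabilities, fractions)) / cells
    worst = sum(p * p + f * f for p, f in zip(probabilities, fractions)) / cells
    return 1 - score / worst


# Made-up two-cell examples.
print(fractional_brier_skill([0.5, 0.5], [2, 2]))  # prediction equals the fractions
print(fractional_brier_skill([1.0, 0.0], [0, 4]))  # all probability on the empty cell
```

Since $(a - b)^2 \le a^2 + b^2$ for non-negative $a$ and $b$, $F$ never exceeds $F_{\text{worst}}$ and the skill lies between 0 and 1. Any common rescaling of $p_k$ and $n_k/N$ cancels in the ratio, so the skill is dimensionless.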

[Figure: four panels. "Brier score": KDE - Naive against the naive prediction. "Brier score; CDF of difference". "Brier skill": KDE predictor against the naive prediction. "Brier skill; CDF of difference".]

Bayesian information gain

We want to capture the feeling that if we see more events on a given day, we should learn more about the quality of the prediction.

My idea is to use the prediction to form a prior, then update this given the data to form a posterior, and then compare the two using the Kullback-Leibler divergence.

This measures the information gain from prior to posterior: a good prediction should mean less is gained on learning the result.
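One way this idea might look in code is sketched below. It is not the implementation from the talk. It assumes the prior is a Dirichlet distribution whose parameters are the predicted cell probabilities scaled by a hypothetical `concentration` value, that the posterior adds the observed event counts per cell, and that the gain is measured as KL(posterior || prior) in nats. The digamma function is approximated by hand so the sketch needs only the standard library.

```python
import math


def digamma(x):
    """Digamma function for x > 0, via the recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:  # shift upwards until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def kl_dirichlet(alpha, beta):
    """KL divergence KL( Dir(alpha) || Dir(beta) ) in nats."""
    alpha_sum, beta_sum = sum(alpha), sum(beta)
    value = math.lgamma(alpha_sum) - math.lgamma(beta_sum)
    for a, b in zip(alpha, beta):
        value += math.lgamma(b) - math.lgamma(a)
        value += (a - b) * (digamma(a) - digamma(alpha_sum))
    return value


def information_gain(probabilities, counts, concentration=10.0):
    """Information gained on seeing `counts`, starting from the prediction.

    `concentration` (how strongly the prior trusts the prediction) is an
    assumed tuning parameter; every predicted probability must be positive.
    """
    prior = [concentration * p for p in probabilities]
    if min(prior) <= 0:
        raise ValueError("predicted probabilities must be strictly positive")
    posterior = [a + n for a, n in zip(prior, counts)]
    return kl_dirichlet(posterior, prior)


# Made-up three-cell example: the same events scored against two predictions.
events = [7, 2, 1]
print("well matched prediction:", information_gain([0.7, 0.2, 0.1], events))
print("poorly matched prediction:", information_gain([0.1, 0.2, 0.7], events))
```

In this toy example the poorly matched prediction yields the larger gain, consistent with the idea that a good prediction leaves less to learn from the result.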

[Figure: four panels. "Dirichlet distribution": kde against naive. "CDF of differences". "Predictive distribution": kde against naive. "CDF of differences".]

Conclusions?

Seems a little inconclusive.

Hit rate, Brier scores (and other ideas we have developed) show roughly a tie.

The information gain idea is more of a clear win for the KDE method.

The original aim was to get beyond the "hit rate" as being the only game in town.

Bit of a work in progress: any ideas much appreciated!

https://github.com/QuantCrimAtLeeds/PredictCode/
