Predicting Norovirus with Twitter

Post on 16-Feb-2017


Social Media Analytics Review and Innovation Group

30/09/2015 Callum Staff

© 2015 Food Standards Agency

Agenda

1. Welcome, Introductions and Purpose

2. Governance Arrangements

3. Predicting Norovirus from Twitter

4. Social Media Research Project Guidance

5. Government Social Research Social Media Research Ethics Guidance

6. Relationships between Consumers and Food Business Operators 

7. Value of Social Media Data in Policy

8. Future Meetings: Frequency and Content

 

Predicting Norovirus Rises with Twitter


Contents

• Background

• Method

• Model Applications

• Take Home Points

• Using Social Media Data

• Next Steps

BACKGROUND


BACKGROUND: Social Media As An Analytical Tool

• Does this new data source add value to our current knowledge?

• Public Health England – syndromic surveillance

• FSA Social Media Team – human observed monitoring

• Added value = knowing early there is a rise in cases

• Earlier we know = earlier we can intervene

METHODS


METHOD: Crowd-Sourcing Keywords


METHOD: Correlating

[Figure: Lab Reports and Sickness Tweets over time, Jan-11 to May-13. Axis ticks: 0 to 600 and 0 to 1200; axis titles: Lab Reports, Tweets.]

METHOD: Correlating – Raw Values or Changes?

• Correlations between raw values do not indicate whether a rise is going to occur

• Raw values give stronger correlations than week-to-week changes

• Changes are calculated between fortnights, not weeks, because week-to-week changes are too small

Correlations for #sicknessbug

Raw Values    1 Week Changes    2 Week Changes
0.50          0.29              0.43
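The comparison between raw values and changes can be sketched roughly as below. This is a minimal illustration, not the analysis behind the figures above: the weekly counts are invented, and the helper names `pearson` and `changes` are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def changes(series, gap):
    """Difference between each value and the value `gap` weeks earlier."""
    return [series[i] - series[i - gap] for i in range(gap, len(series))]

# Hypothetical weekly counts, invented purely for illustration.
tweets = [40, 42, 55, 70, 90, 85, 60, 50]
lab_reports = [100, 105, 120, 160, 210, 200, 150, 120]

raw_r = pearson(tweets, lab_reports)                                # raw values
one_week_r = pearson(changes(tweets, 1), changes(lab_reports, 1))   # 1 week changes
two_week_r = pearson(changes(tweets, 2), changes(lab_reports, 2))   # 2 week changes
```

Setting `gap` to 2 gives the fortnightly changes used in the slides.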


METHOD: Lagging the Data

[Figure: Tweets and Lab Reports, shown across two slides]
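One way to explore lagging, sketched under assumptions: pair Tweets in week t with lab reports in week t + lag, then see which lag gives the strongest correlation. The data below is invented so that lab reports trail Tweets by exactly two weeks, and the function names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def lagged_pairs(tweets, lab_reports, lag):
    """Pair Tweets in week t with lab reports in week t + lag."""
    if lag == 0:
        return list(tweets), list(lab_reports)
    return list(tweets[:-lag]), list(lab_reports[lag:])

def best_lag(tweets, lab_reports, max_lag):
    """Return the lag (in weeks) with the highest correlation."""
    scores = {}
    for lag in range(max_lag + 1):
        xs, ys = lagged_pairs(tweets, lab_reports, lag)
        scores[lag] = pearson(xs, ys)
    return max(scores, key=scores.get)

# Invented example: lab reports follow Tweets two weeks later, scaled by 2.
tweets = [1, 5, 2, 8, 3, 9, 4, 7, 2, 6]
lab_reports = [0, 0] + [2 * t for t in tweets[:-2]]

chosen = best_lag(tweets, lab_reports, max_lag=3)
```

A longer lag gives more warning time but usually a weaker correlation, so the highest-scoring lag is not automatically the most useful one.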

METHOD: What’s a Significant Change?

• Practically – any rise which is outside the normal noise

• On the model – any change in the top quartile

• This cut-off is arbitrary

• Could use machine learning to find which classification of significant change makes the model most accurate
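The top-quartile rule can be sketched as follows. This is an illustration only: the fortnightly changes are invented, and the use of `statistics.quantiles` with its default method is an assumption about how the quartile might be computed.

```python
from statistics import quantiles

def label_significant(changes):
    """Label a change 1 if it lies above the upper quartile, else 0."""
    upper_quartile = quantiles(changes, n=4)[2]  # third cut point = 75th percentile
    return [1 if change > upper_quartile else 0 for change in changes]

# Hypothetical fortnightly changes in lab reports, in time order.
lab_changes = [5, -10, 20, 60, -30, 0, 10, -5]

labels = label_significant(lab_changes)
```

Replacing the fixed quartile with a tunable cut-off is where the machine learning idea above would come in.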

[Figure: Lab Reports (0 to 600), Jan-12 to Oct-14, with the series Lab Reports and Actual Sig. Change.]

METHOD: Logistic Regression Model

Given changes in Tweet volumes between weeks 1 and 3, is the change in lab reports between weeks 4 and 6 significant?

• Significant Change = 1, Non-Significant Change = 0

• Uses the logistic (exponential) formula with changes in Tweet volumes as inputs to give a probability

• The probability is assigned to one of the two binary categories using a predefined threshold (typically 0.5)
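The prediction step can be sketched as below, assuming the standard logistic function. The intercept and coefficients are hypothetical placeholders, not the fitted values from this project, and the function names are invented for illustration.

```python
from math import exp

def predict_probability(tweet_changes, intercept, coefficients):
    """Logistic formula: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, tweet_changes))
    return 1.0 / (1.0 + exp(-linear))

def classify(probability, threshold=0.5):
    """Return 1 (significant change) if the probability meets the threshold."""
    return 1 if probability >= threshold else 0

# Hypothetical coefficients, one per keyword's change in Tweet volume.
intercept = -1.0
coefficients = [0.02, 0.01]

# Invented changes in Tweet volume between weeks 1 and 3 for two keywords.
probability = predict_probability([80, 40], intercept, coefficients)
prediction = classify(probability)
```

In practice the intercept and coefficients would be estimated from the historical Tweet and lab report data rather than set by hand.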


METHOD: Adjusting for Project Requirements

• Receiver Operating Characteristic Curve

• Adjusting the threshold = adjusting the number of true/false positives and true/false negatives

• Want to increase the number of true positives in order to achieve early detection

• Willing to accept the model picking up more false positives elsewhere as a result

• Early warning system, not a call to arms
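The effect of moving the threshold can be sketched as below. The probabilities and labels are invented for illustration and the function name is hypothetical; each threshold yields one point on the ROC curve.

```python
def sensitivity_specificity(probabilities, labels, threshold):
    """Return (sensitivity, specificity) when predicting 1 at or above threshold."""
    tp = fp = tn = fn = 0
    for p, actual in zip(probabilities, labels):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Invented model outputs and true labels (1 = significant change).
probabilities = [0.9, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 0]

default_point = sensitivity_specificity(probabilities, labels, 0.5)
lowered_point = sensitivity_specificity(probabilities, labels, 0.25)
```

In this toy data, lowering the threshold from 0.5 to 0.25 catches every true rise at the cost of one false alarm, which is the trade-off an early warning system accepts.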

[Figure: ROC curves plotting Sensitivity against 1-Specificity, each axis from 0 to 1, annotated with Specificity and Sensitivity]

MODEL APPLICATIONS


MODEL APPLICATIONS: Training and Testing

• Difficulty: only two and a half years of data were available

• Periods within this dataset had Twitter dropouts or discontinuous lab reports

• The test set is current real-time Tweeting

• Will review at the end of the Norovirus season (April)


MODEL APPLICATIONS: Final Predictive Model


MODEL APPLICATIONS: The Intervention

• The project is higher risk, so the intervention must have low resource intensity

• Needs to be easily deployable – match volatile nature of social media

• Using delivery partners:
  – NHS Choices – Elderly in hospitals/Care homes
  – Department for Education – Schools
  – FSA Comms Team – Food handlers

• Social/online media and contact with advocates in above sectors

USING SOCIAL MEDIA DATA


USING SOCIAL MEDIA DATA: Representativeness

• Tweeting Population versus Affected Population


TAKE HOME POINTS & NEXT STEPS


TAKE HOME POINTS: Analytical/Comms Trade Off

• Variable correlations versus giving comms time to act

• Model accuracy versus early warning

• Choice of datasets


NEXT STEPS: Geotagging


QUESTIONS?

callum.staff@foodstandards.gsi.gov.uk