
Automatic Extraction of Soccer Game Event Data from Twitter

Date post: 05-Dec-2014
Upload: marieke-van-erp
Description:
Presentation at DeRiVE 2012. Paper at: http://ceur-ws.org/Vol-902/paper_3.pdf Slides by Guido van Oorschot
Automatic extraction of soccer game event data from Twitter. Guido van Oorschot, Marieke van Erp and Chris Dijkshoorn. Monday, November 12, 2012
Transcript
Page 1: Automatic Extraction of Soccer Game Event Data from Twitter

Automatic extraction of soccer game event data from Twitter

Guido van Oorschot, Marieke van Erp and Chris Dijkshoorn

Monday, November 12, 2012

Page 2: Automatic Extraction of Soccer Game Event Data from Twitter

Soccer data

Page 3: Automatic Extraction of Soccer Game Event Data from Twitter

Theory

1. Fair body of research on automated sports highlight extraction

2. Twitter data can offer interesting insights into real-world phenomena

Page 4: Automatic Extraction of Soccer Game Event Data from Twitter

Automated highlight detection

Let’s use Twitter data!

Page 5: Automatic Extraction of Soccer Game Event Data from Twitter

3 Tasks

1. Detecting events: In which minutes did events occur?

2. Classifying events: Is the event a goal, card or substitution?

3. Assigning events to teams: Is the event for the home team or the away team?

Page 6: Automatic Extraction of Soccer Game Event Data from Twitter

5 types of events

- Goal

- Own Goal

- Red Card

- Yellow Card

- Substitution


Page 7: Automatic Extraction of Soccer Game Event Data from Twitter

Methodology

1. Gathering the data

2. Exploring and cleaning the data

3. Classifying interesting data points


Page 8: Automatic Extraction of Soccer Game Event Data from Twitter

Gathering data

- Collect all tweets with game hashtags, e.g. #ajafey, #nacgro, #psvutr

- Collect official data for each match: goals, cards, substitutions

Page 9: Automatic Extraction of Soccer Game Event Data from Twitter

Our data

- 6 months

- 61 games

- 661 events

- 10,643 tweets

Page 10: Automatic Extraction of Soccer Game Event Data from Twitter

Three Experiments

1. Detecting events

2. Classifying events

3. Assigning events to teams

Page 11: Automatic Extraction of Soccer Game Event Data from Twitter

1. Detecting events


Page 12: Automatic Extraction of Soccer Game Event Data from Twitter

1. Detecting events


Page 13: Automatic Extraction of Soccer Game Event Data from Twitter

1. Experimental Setup

- Goal: detect peaks in the tweets-per-minute signal to extract events

- Setup: test three peak detection methods:

1. LocMaxNoBaseLineCorr

2. IntThresNoBaseLineCorr

3. IntThresWithBaseLineCorr
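The slides name the three peak detection methods but do not define them. The sketch below is one plausible reading, not the authors' implementation: a local-maximum detector, an intensity threshold, and the same threshold applied after subtracting a baseline. The threshold factor, the rolling-median baseline, the window size and the sample counts are all assumptions made for illustration.

```python
from statistics import mean, median

def local_maxima(counts):
    """Indices of minutes whose count is strictly higher than both neighbours."""
    return [i for i in range(1, len(counts) - 1)
            if counts[i] > counts[i - 1] and counts[i] > counts[i + 1]]

def intensity_threshold(counts, factor=2.0):
    """Indices of minutes whose count exceeds factor * the mean count (assumed rule)."""
    limit = factor * mean(counts)
    return [i for i, c in enumerate(counts) if c > limit]

def subtract_baseline(counts, window=5):
    """Subtract a rolling-median baseline (assumed); negative residuals become 0."""
    half = window // 2
    corrected = []
    for i, c in enumerate(counts):
        neighbourhood = counts[max(0, i - half): i + half + 1]
        corrected.append(max(0, c - median(neighbourhood)))
    return corrected

# Hypothetical tweets-per-minute signal with bursts at minutes 3 and 8.
tweets_per_minute = [5, 6, 5, 40, 12, 6, 7, 6, 30, 8, 6, 5]

print(local_maxima(tweets_per_minute))                            # no baseline correction
print(intensity_threshold(tweets_per_minute))                     # no baseline correction
print(intensity_threshold(subtract_baseline(tweets_per_minute)))  # with baseline correction
```

On this toy signal the local-maximum detector also flags small bumps, while both threshold variants keep only the two bursts.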


Page 14: Automatic Extraction of Soccer Game Event Data from Twitter

1. Results


Page 15: Automatic Extraction of Soccer Game Event Data from Twitter

1. Findings

- Goals and red cards are detected better than yellow cards and substitutions

- None of the three peak selection methods works well.

- Highlights can be extracted, but not precisely enough

Page 16: Automatic Extraction of Soccer Game Event Data from Twitter

Three Experiments

1. Detecting events

2. Classifying events

3. Assigning events to teams

Page 17: Automatic Extraction of Soccer Game Event Data from Twitter

2. Classifying Events

minute  “goal”  “1”  “red”  “card”  “boring”  class
34      0       2    0      1       20        nothing
35      23      34   0      0       0         goal
12      1       2    0      0       5         nothing
13      1       0    22     11      0         red card

- Goal: classify minutes into event classes
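The table treats every game minute as one instance whose features are word counts. A minimal sketch of how such rows could be built, assuming tweets arrive as (minute, text) pairs and are tokenised by lowercasing and splitting on whitespace; the function names and the sample tweets are hypothetical.

```python
from collections import Counter, defaultdict

def minute_word_counts(tweets):
    """Map each game minute to a Counter of lowercased tokens tweeted in it."""
    per_minute = defaultdict(Counter)
    for minute, text in tweets:
        per_minute[minute].update(text.lower().split())
    return per_minute

def to_rows(per_minute, vocabulary):
    """One fixed-width count row per minute, in ascending minute order."""
    return {m: [per_minute[m][w] for w in vocabulary] for m in sorted(per_minute)}

# Hypothetical tweets: (game minute, text).
tweets = [(35, "GOAL 1 0"), (35, "goal goal"), (34, "boring game"), (13, "red card")]
vocabulary = ["goal", "1", "red", "card", "boring"]

rows = to_rows(minute_word_counts(tweets), vocabulary)
for minute, row in rows.items():
    print(minute, row)
```

Each row would then be paired with the class label taken from the official match data for that minute.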


Page 18: Automatic Extraction of Soccer Game Event Data from Twitter

Issues

Problem: Huge, sparse matrix

1. Reduce features: choose words/features smartly

2. Reduce instances: choose minutes smartly

Page 19: Automatic Extraction of Soccer Game Event Data from Twitter

2. Experimental Setup

- 3 instance selection settings:

1. AllMinutes

2. PeakMinutes

3. EventMinutes

Page 20: Automatic Extraction of Soccer Game Event Data from Twitter

2. Experimental Setup

- 7 feature selection settings:

1. AllMoreThanOnce

2. Top500TotalFreq

3. Top10MinuteFreq

4. Top500TotalTfIdf

5. Top10MinuteTfIdf

6. Top50Infogain

7. Top50GainRatio
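The slides list the feature selection settings by name only. As an illustration of the idea behind a setting such as Top50Infogain, the sketch below ranks words by the information gain of a binary "word occurs in this minute" feature and keeps the top k. The binarisation, the helper names and the toy data are assumptions, not the configuration used in the experiments.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def info_gain(present, labels):
    """Information gain of a binary feature with respect to the class labels."""
    gain = entropy(labels)
    for value in (True, False):
        subset = [label for p, label in zip(present, labels) if p == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

def top_k_features(rows, labels, vocabulary, k):
    """Keep the k words with the highest information gain."""
    scores = {w: info_gain([row[j] > 0 for row in rows], labels)
              for j, w in enumerate(vocabulary)}
    return sorted(vocabulary, key=lambda w: scores[w], reverse=True)[:k]

# Toy minute-by-word counts (hypothetical).
vocabulary = ["goal", "1", "boring"]
rows = [[3, 1, 0], [0, 0, 1], [2, 0, 0], [0, 1, 0]]
labels = ["goal", "nothing", "goal", "nothing"]

selected = top_k_features(rows, labels, vocabulary, k=2)
print(selected)
```

Here "goal" separates the two classes perfectly, "boring" helps a little, and "1" carries no information, so the first two are kept.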


Page 21: Automatic Extraction of Soccer Game Event Data from Twitter

2. Experimental Setup

- 6 types of classifiers:

1. C4.5

2. RandomForest

3. NaiveBayes

4. NaiveBayesMultinomial

5. libSVM

6. IB1

Page 22: Automatic Extraction of Soccer Game Event Data from Twitter

2. Results


Page 23: Automatic Extraction of Soccer Game Event Data from Twitter

2. Discussion

- Top50GainRatio: best feature selection

- libSVM: best classifier

- EventMinutes results:

Class         F-measure
OVERALL       0.822
Goal          0.841
Own goal      0.000
Red card      0.848
Yellow card   0.785
Substitution  0.839
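For readers unfamiliar with the metric, the sketch below computes a per-class F-measure as the harmonic mean of precision and recall. That the slides report this balanced (F1) variant is an assumption, and the gold and predicted labels are made up for illustration.

```python
def f_measure(gold, predicted, target):
    """Harmonic mean of precision and recall for one class (F1, assumed variant)."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == target and p == target for g, p in pairs)
    fp = sum(g != target and p == target for g, p in pairs)
    fn = sum(g == target and p != target for g, p in pairs)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels for five event minutes.
gold = ["goal", "goal", "own goal", "yellow card", "goal"]
predicted = ["goal", "goal", "goal", "yellow card", "yellow card"]

for label in ("goal", "own goal", "yellow card"):
    print(label, round(f_measure(gold, predicted, label), 3))
```

A class that is never predicted correctly scores 0.0 under this definition, which is one way a value like the 0.000 for own goals can arise.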


Page 24: Automatic Extraction of Soccer Game Event Data from Twitter

Three Experiments

1. Detecting events

2. Classifying events

3. Assigning events to teams

Page 25: Automatic Extraction of Soccer Game Event Data from Twitter

3. Experimental Setup

- Goal: assign events to a team

- Based on the ratio between tweets from fans of the home team and fans of the away team

- But first: extract fans
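The slides do not spell out how the ratio is turned into a decision. One simple reading, sketched below, compares the home fans' share of tweets in the event minute with a baseline share and picks the side that is over-represented. The decision rule, the default baseline of 0.5 and the function name are assumptions for illustration only.

```python
def assign_team(home_fan_tweets, away_fan_tweets, baseline_home_share=0.5):
    """Assign an event to 'home' or 'away' from fan tweet counts in the event minute.

    baseline_home_share is the share of fan tweets that normally comes from
    home fans (assumed parameter); returns None when no fan tweeted.
    """
    total = home_fan_tweets + away_fan_tweets
    if total == 0:
        return None
    home_share = home_fan_tweets / total
    return "home" if home_share > baseline_home_share else "away"

print(assign_team(30, 10))                           # home fans dominate the minute
print(assign_team(30, 10, baseline_home_share=0.8))  # fewer home tweets than usual
print(assign_team(0, 0))                             # no fan tweets at all
```

Whether an over-represented fan group signals an event for or against its team may differ per event type, so the mapping from ratio to team would need to be checked per class.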


Page 26: Automatic Extraction of Soccer Game Event Data from Twitter

3. Extracting fans

- Hypothesis:

People who tweet about the same team each week are probably fans of that team
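A minimal sketch of this hypothesis, assuming game hashtags are two concatenated three-letter team codes (as in #ajafey) and that a fan is a user who tweeted about at least two games that share exactly one team. The minimum of two games, the helper names and the sample data (including the hashtag #psvaja) are hypothetical.

```python
from collections import defaultdict

def teams_in(hashtag):
    """Split a game hashtag such as '#ajafey' into its two team codes (assumed format)."""
    tag = hashtag.lstrip("#")
    return {tag[:3], tag[3:6]}

def extract_fans(tweets, min_games=2):
    """Map each user to a team if that team is the only one in all of their games."""
    games_by_user = defaultdict(set)
    for user, hashtag in tweets:
        games_by_user[user].add(hashtag)
    fans = {}
    for user, games in games_by_user.items():
        if len(games) < min_games:
            continue  # too little evidence for this user
        common = set.intersection(*(teams_in(g) for g in games))
        if len(common) == 1:
            fans[user] = common.pop()
    return fans

# Hypothetical (user, game hashtag) pairs.
tweets = [("anna", "#ajafey"), ("anna", "#psvaja"), ("bob", "#ajafey"),
          ("carl", "#nacgro"), ("carl", "#psvutr")]

fans = extract_fans(tweets)
print(fans)
```

Users who tweet about only one game, or about games with no team in common, are left unassigned.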


Page 27: Automatic Extraction of Soccer Game Event Data from Twitter

3. Extracting fans

- Extracted 38,527 fans from 146,326 users (26%)

- This method of extracting fans works well:

Right team  Not clear  Wrong team
88%         10%        2%

Page 28: Automatic Extraction of Soccer Game Event Data from Twitter

3. Results


Page 29: Automatic Extraction of Soccer Game Event Data from Twitter

3. Results

- Performance of assigning events to teams is at or above baseline performance:

Class         Baseline  Performance
OVERALL       52%       58%
Goal          58%       69%
Red card      50%       62%
Yellow card   63%       63%
Substitution  52%       57%

Page 30: Automatic Extraction of Soccer Game Event Data from Twitter

Conclusion

1. Detecting events => difficult

2. Classifying events => good results

3. Assigning events to teams => promising results

Page 31: Automatic Extraction of Soccer Game Event Data from Twitter

Future Work

- Use sentiment in tweets (for detecting events and assigning events to teams)

- Player detection

- Other sports


Page 32: Automatic Extraction of Soccer Game Event Data from Twitter

Questions?

