
Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Date post: 27-Jul-2015
Upload: denis-parra-santander
Page 1: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Alfredo Cobo [email protected]

Denis Parra [email protected]

Jaime Navón [email protected]

Pontificia Universidad Católica de Chile, Departamento de Ciencia de la Computación

Av. Vicuña Mackenna 4860, Macul, Santiago, Chile

 

Page 2: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

I (… and some other people in this room)

… come from Chile

Picture from http://www.quadrodemedalhas.com/images/mapas/mapa-chile.jpg

http://upload.wikimedia.org/wikipedia/commons/thumb/9/91/Chile_in_South_America_(-mini_map_-rivers).svg/409px-Chile_in_South_America_(-mini_map_-rivers).svg.png

Page 3: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Chile, well-known for its..

• Copper (top producer)

"Top 5 Copper Producers" by Plazak - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Top_5_Copper_Producers.png#/media/File:Top_5_Copper_Producers.png

https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&cad=rja&uact=8&ved=0CAYQjB0&url=http%3A%2F%2Fcommons.wikimedia.org%2Fwiki%2FFile%3ANative_Copper_(mineral).jpg&ei=L31ZVbOsL4r1UrbRgKAB&bvm=bv.93564037,d.d24&psig=AFQjCNHr2zm5m4Jmim7AgkCwwSb0b5mGUA&ust=1432014509629311

Page 4: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Chile, well-known for its..

• Wine (price + quality)

"Fiesta de Vendimia" by LuxoDresden - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Fiesta_de_Vendimia.JPG#/media/File:Fiesta_de_Vendimia.JPG

Page 5: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

If you start typing in Google…

9  out  of  10  disasters  …  

Page 6: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

If you start typing in Google…

9  out  of  10  disasters  …  prefer  Chile  

Page 7: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

… and for Natural Disasters :(

• Largest earthquake ever recorded: Valdivia, Chile, May 22, 1960 (9.5 on the Richter scale)

• We usually have one large earthquake every 30 years (magnitude ~8 on the Richter scale)

• The last one was in 2010, close to Concepción, but it also affected Santiago (the capital)

Page 8: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

… so, at PUC Chile

• We created CIGIDEN, the “National Research Center for the Integrated Administration of Natural Disasters”

Page 9: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

CIGIDEN’s Goal in this project

• Help citizens stay informed during natural disasters by using social media.
• Build a mobile application (Carlos Molina)
• Automatically filter relevant messages from those not related to earthquakes (Alfredo Cobo) to feed the application

 

Page 10: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Our Task: Building a Twitter classifier

- Separate tweets related to natural disasters from those that are not.

Page 11: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Related Work

Manual Classification
• Vieweg et al. (2010)
• Imran et al. (2013)
• Mendoza et al. (2010)

Data Post-processing
• Mendoza et al. (2010)
• Castillo et al. (2011) (Information Credibility on Twitter)

Feature Generation (not necessarily for natural disasters)
• Gimpel et al. (2011)
• Kouloumpis et al. (2011)
• Liu et al. (2012)
• Wu et al. (2011)
• Lee et al. (2014)

Tools for Disaster Management
• Hiltz et al. (2013)
• Power et al. (2013)
• Caragea et al. (2011)
• Abel et al. (2012)
• Middleton et al. (2014)
• Morstatter et al. (2013)
• Imran et al. (2014)

Page 12: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Why would building this classifier be a contribution?

• Building and validating a ground truth for classifying tweets in Spanish.

• Building the classifier and dealing with:
• Class imbalance
• Number of latent dimensions (feature generation using LDA)

Page 13: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Workflow of Activities

[Workflow diagram. Labels: Chile’s Earthquake 2010; Castillo et al. (2010); Our ground truth; Non-relevant messages; Realistic dataset; Sampling, cleaning & filtering; Classifiers; Feature selection (LDA); Class imbalance; 10% - 80%]

Page 14: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Building the ground truth

• Random sample of 5,000 tweets from the Castillo et al. (2010) dataset, which was used to study credibility around Chile’s 2010 earthquake.

• Dates: from February 27th until March 2nd (spanning 4 days in 2010)

• We kept only Spanish messages and removed messages that were too similar (Levenshtein distance): 2,187 messages left
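The near-duplicate filtering step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the text normalization, and the `max_ratio` threshold are assumptions, since the slides only say that messages that were "too similar" by Levenshtein distance were removed.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def drop_near_duplicates(messages, max_ratio=0.2):
    """Keep a message only if it is not too close to one already kept.

    max_ratio is a hypothetical threshold: edit distance divided by the
    length of the longer of the two normalized messages.
    """
    kept = []       # original messages that survive
    kept_norm = []  # their normalized forms, used for comparison
    for message in messages:
        norm = " ".join(message.lower().split())
        too_close = any(
            levenshtein(norm, seen) <= max_ratio * max(len(norm), len(seen))
            for seen in kept_norm
        )
        if not too_close:
            kept.append(message)
            kept_norm.append(norm)
    return kept


tweets = [
    "Terremoto en Chile",
    "terremoto en chile!",
    "Busco a mi hermano en Concepción",
]
print(drop_near_duplicates(tweets))
```

The pairwise comparison is quadratic in the number of messages, which is workable for a sample of a few thousand tweets but would need blocking or hashing for a larger stream.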

Page 15: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Validating the ground truth

• Fleiss' kappa: κ = 0.645, p < .001

• Intraclass correlation, ICC(2,1): ICC = 0.646, p < .001

• Interpreted following Landis and Koch (1977)
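For readers unfamiliar with the agreement statistic, here is a minimal sketch of how Fleiss' kappa is computed. It assumes every tweet was labeled by the same number of annotators; the function name and the toy count matrix are illustrative and are not the authors' data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix of label counts.

    counts[i][j] is the number of annotators who assigned item i to
    category j. Every row must sum to the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_category = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each item, then averaged over items.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_category)
    return (p_observed - p_expected) / (1 - p_expected)


# Toy example: 4 tweets, 3 annotators, columns = [relevant, non-relevant].
toy_counts = [[3, 0], [2, 1], [0, 3], [3, 0]]
print(round(fleiss_kappa(toy_counts), 3))
```

Landis and Koch (1977) describe values between 0.61 and 0.80 as "substantial" agreement, the band in which the reported κ = 0.645 falls.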

 

• Relevant messages were labeled based on the Imran et al. (2013) classification:
• Caution/Warning
• Casualties and Damage
• People (missing, found, etc.)
• Information source

Page 16: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Workflow of Activities

[Workflow diagram. Labels: Chile’s Earthquake 2010; Castillo et al. (2010); Our ground truth; Non-relevant messages; Realistic dataset; Sampling, cleaning & filtering; Classifiers; Feature selection (LDA); Class imbalance]

Page 17: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Classification Problem

Features
• User network: followers, followees
• Content (4,766 unique words): hashtags, words
• User mentions

Class Imbalance
• The ground truth is not a realistic representation of Twitter

• We added “noise”: introduced tweets non-relevant to the event (20% - 80%)

• Sampled non-relevant tweets from 5 months.

• Removed all tweets posted during days of seismic activity
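The noise-injection idea can be sketched as below. This is an illustration under stated assumptions: the slides do not define the noise proportion precisely, so it is read here as the share of the final dataset made up of added non-relevant tweets, and `add_noise`, its parameters, and the toy data are hypothetical.

```python
import random


def add_noise(ground_truth, noise_pool, noise_proportion, seed=0):
    """Mix labeled ground-truth tweets with sampled non-relevant tweets.

    ground_truth: list of (text, label) pairs, label 1 = relevant, 0 = not.
    noise_pool: list of texts known to be unrelated to the event.
    noise_proportion: assumed to be the fraction of the *final* dataset
        that consists of added noise (0 <= proportion < 1).
    """
    if not 0 <= noise_proportion < 1:
        raise ValueError("noise_proportion must be in [0, 1)")
    # Solve n_noise / (n_noise + n_truth) = noise_proportion for n_noise.
    n_noise = round(len(ground_truth) * noise_proportion / (1 - noise_proportion))
    if n_noise > len(noise_pool):
        raise ValueError("noise pool is too small for this proportion")
    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    noise = [(text, 0) for text in rng.sample(noise_pool, n_noise)]
    mixed = list(ground_truth) + noise
    rng.shuffle(mixed)
    return mixed


# Toy data: 20 labeled tweets and a pool of 200 unrelated tweets.
truth = [(f"tweet sobre el terremoto {i}", 1) for i in range(20)]
pool = [f"tweet cotidiano {i}" for i in range(200)]
for proportion in (0.2, 0.6, 0.8):
    dataset = add_noise(truth, pool, proportion)
    print(proportion, len(dataset))
```

Sampling the noise from days without seismic activity, as the slide describes, is what justifies labeling every added tweet as non-relevant without manual annotation.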

Page 18: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Classification Results

Model                Precision  Recall  F1 score  Accuracy  AUC    Dimensions  Noise proportion
Baseline             0.625      0.545   0.53      0.5       0.568  -           0
Bernoulli NB         0.831      0.226   0.355     0.594     0.605  2000        0
Logistic Regression  0.827      0.641   0.722     0.756     0.834  2000        0.6
Linear SVM           0.687      0.677   0.682     0.687     0.719  1000        0.6
Random Forest        0.807      0.673   0.734     0.758     0.844  1000        0.8
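As a reading aid for the results, the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall of the four trained models; the helper name is illustrative. The baseline row is omitted because the slides do not say how its scores were averaged.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results table.
reported = {
    "Bernoulli NB": (0.831, 0.226, 0.355),
    "Logistic Regression": (0.827, 0.641, 0.722),
    "Linear SVM": (0.687, 0.677, 0.682),
    "Random Forest": (0.807, 0.673, 0.734),
}

for model, (precision, recall, f1_reported) in reported.items():
    recomputed = f1_score(precision, recall)
    print(f"{model}: recomputed F1 = {recomputed:.3f}, reported = {f1_reported}")
```

The Bernoulli NB row shows why F1 is reported alongside precision: a precision of 0.831 with a recall of 0.226 means most relevant tweets are missed, and the harmonic mean drops accordingly.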

Page 19: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Analysis ~ LDA Dimensions and Noise

Page 20: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Analysis ~ LDA Dimensions and Noise

Page 21: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Conclusions & Future Work

• We built and validated a ground truth of tweets in Spanish relevant to disasters

• We implemented a classifier and analyzed its performance across several algorithms while dealing with the class imbalance problem

• Future work: move the application from prototype to production and test online scalability

Page 22: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

That’s all folks!

• Thanks! Questions to the corresponding author, Alfredo Cobo: [email protected], or to Denis Parra: [email protected]

Page 23: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Chile, a small country, but well known for its...

• Length (4,300 km)

[Map comparison. Labels: ~4,300 km; ~8,000 km]

Page 24: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Model Features

• Newman et al. (2007)
• Biro et al. (2008)
• Wei et al. (2006)
• Wang et al. (2012)
• Han (2005)

Features: followers, friends
Corpora features: hashtags, words
User mentions

Page 25: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Results

• Amatriain  et  al.  (2013)  

Page 26: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Architecture

Page 27: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Plots of bootstrap

[Four plots. Panel titles: Agreement Day 1; Agreement Day 2; Agreement Day 3; Agreement Day 4]

Page 28: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Word Frequencies

Page 29: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Just “Terremoto”: AUC

Page 30: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Related Work

Page 31: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Manual classification

• Vieweg et al. (2010)
• Imran et al. (2013)

Page 32: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Post Processing

• Castillo et al. (2011)
• Mendoza et al. (2010)

Page 33: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Feature Generation Approaches

• Gimpel et al. (2011)
• Kouloumpis et al. (2011)
• Liu et al. (2012)
• Wu et al. (2011)
• Lee et al. (2014)

Page 34: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Tools For Disaster Management

• Hiltz et al. (2013)
• Power et al. (2013)
• Caragea et al. (2011)
• Abel et al. (2012)
• Middleton et al. (2014)
• Morstatter et al. (2013)
• Imran et al. (2014)

Page 35: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Building the ground truth

• Mendoza  et  al.  (2010)  

•  Imran  et  al.  (2013)  

Page 36: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations

Algorithms and evaluation procedure

• Castillo et al. (2011)
• Fawcett et al. (2004)
• Manning et al. (2008)
• Wen et al. (2014)

