Bursty Event Detection from Text Streams for Disaster Management
2012-04-17
Sungjun Lee, Sangjin Lee, Kwanho Kim, and Jonghun [email protected]
Information Management Lab., Dept. of Industrial Engineering
Seoul National University
Introduction
Identify disaster-related bursty events from multiple text streams.
Characterize bursty terms by four features:
- Skewness, consistency, periodicity, and variation.
[Figure: real-world states (normal, disaster) are observed through Stream 1, Stream 2, …, Stream K. Observed normal terms: happy, good, nice, fine. Observed disaster-related terms: bad, die, catastrophe, nightmare.]
Score each term to determine whether or not it is a bursty term.
Motivation example
The distribution of term frequencies observed in the AP news stream on Feb. 27, 2010 and Mar. 1, 2010.
On Mar. 1, 2010, the trial of a Bosnian politician, Radovan Karadzic, began.
[Figure: four bar charts of the top stemmed terms in the AP stream. Panels: weighted TF and TF for day "27-Feb-2010", led by terms such as chile, tsunami, and earthquak; weighted TF and TF for day "01-Mar-2010", where terms such as bosnian and karadz appear alongside chile and earthquak.]
On Feb. 27, 2010, an earthquake hit Chile.
Skewness feature
A bursty term appears intensively during the specific time period in which the corresponding event occurs.
[Figure: two histograms of probability against term frequency during L days.]
The change of the term frequency distribution of “tsunami”
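$\mathrm{skew}(t) = \dfrac{E\left[\left(f(t) - \mu(f(t))\right)^3\right]}{\sigma(f(t))^3}$
where $f(t)$ denotes the frequency of term $t$, and $\mu$ and $\sigma$ denote its mean and standard deviation.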
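A minimal sketch of the skewness score, assuming the frequencies of a term over the L days are available as a plain list. The function name `skewness` and the sample series are illustrative and do not come from the slides.

```python
from statistics import fmean, pstdev


def skewness(freqs):
    """Standardized third moment of a term's per-day frequencies."""
    mu = fmean(freqs)
    sigma = pstdev(freqs)  # population standard deviation
    if sigma == 0:
        return 0.0  # a constant series has no skew
    third_moment = fmean([(f - mu) ** 3 for f in freqs])
    return third_moment / sigma ** 3


# Hypothetical 10-day series: one term spikes once, the other is steady.
spiking = [0, 0, 0, 0, 0, 0, 9, 1, 0, 0]
steady = [3, 4, 3, 4, 3, 4, 3, 4, 3, 4]

print(skewness(spiking), skewness(steady))
```

A term that spikes on a single day gets a large positive score, while a term with an even frequency scores close to zero.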
Consistency feature
The frequency of a bursty term soars across multiple streams.
[Figure: the change of the term appearance of “tsunami” across Stream 1, Stream 2, …, Stream K, for example a Twitterer focusing on tsunami research and a Twitterer focusing on travel. Articles are marked as containing or not containing “tsunami”.]
$\mathrm{cons}(t) = \sum_{c \in C} \sqrt{\sum_{l=1}^{L} \left( tf_{c,t,l} - \sum_{c' \in C} tf_{c',t,l} / |C| \right)^2}$
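The consistency score can be sketched as follows, reading the formula as: for each stream, take the distance between its L-day frequency series and the cross-stream average series, then sum over the streams. That reading, the function name `consistency`, and the dictionary layout are assumptions.

```python
from math import sqrt


def consistency(tf):
    """tf maps a stream name to the term's frequencies over the same L days."""
    streams = list(tf)
    n_days = len(tf[streams[0]])
    # Average frequency over all streams, day by day.
    mean_per_day = [
        sum(tf[c][l] for c in streams) / len(streams) for l in range(n_days)
    ]
    # Distance of each stream from the average series, summed over streams.
    return sum(
        sqrt(sum((tf[c][l] - mean_per_day[l]) ** 2 for l in range(n_days)))
        for c in streams
    )


# Hypothetical 3-day frequencies of one term in three streams.
agreeing = {"s1": [0, 5, 0], "s2": [0, 5, 0], "s3": [0, 5, 0]}
one_sided = {"s1": [0, 6, 0], "s2": [0, 0, 0], "s3": [0, 0, 0]}

print(consistency(agreeing), consistency(one_sided))
```

Under this reading a smaller value means the streams agree more closely, so a term that soars in every stream at once scores lower than a term that soars in only one.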
Periodicity feature
Periodic terms are less likely to be bursty terms. Penalize terms that exhibit periodicity.
[Figure: power density against normalized frequency for two terms. Periodicity of “Sunday” (term "sundai"): period=6.8966. Periodicity of “earthquake” (term "earthquak"): period=3.4843.]
$\mathrm{peri}(t) = \begin{cases} s & \text{if term } t \text{ is periodic} \\ 1 & \text{o.w.} \end{cases}$
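The slides do not state how a term is judged to be periodic. One possible sketch, assuming a periodogram test: compute the power at each frequency of the term's daily series and call the term periodic when a single frequency holds most of the power. The names `dominant_period` and `peri_score`, the `penalty` value, and the `min_share` threshold are all hypothetical.

```python
import cmath
import math


def dominant_period(series):
    """Return (period in days, share of power) of the strongest frequency."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    powers = []
    for k in range(1, n // 2 + 1):
        # Discrete Fourier coefficient at frequency k / n.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        powers.append(abs(coeff) ** 2)
    total = sum(powers)
    if total == 0:
        return None, 0.0  # constant series: no periodic component
    best = max(range(len(powers)), key=powers.__getitem__)
    return n / (best + 1), powers[best] / total


def peri_score(series, penalty=0.5, min_share=0.5):
    """Penalty value for periodic terms, 1 otherwise (assumed decision rule)."""
    _, share = dominant_period(series)
    return penalty if share >= min_share else 1.0


# Hypothetical 56-day series: a weekly rhythm and a one-off burst.
weekly = [10 + 8 * math.cos(2 * math.pi * i / 7) for i in range(56)]
one_off = [0] * 20 + [30, 12, 4] + [0] * 33

print(dominant_period(weekly)[0], peri_score(weekly), peri_score(one_off))
```

A weekly term concentrates its power at one frequency and is penalized, while a one-off burst spreads its power over many frequencies and keeps a score of 1.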
Variation feature
Cope with different writing styles among streams.
Reduce the possibility of identifying a term that has high frequency only in a specific stream as a bursty term.
[Equation: vari(c, t) = α + a ratio of two statistics of term t in stream c.]
[Figure: the change of the term appearance of “AP” across Stream 1, Stream 2, …, Stream K. AP news starts to publish articles with a fixed signature “AP news”. Articles are marked as containing or not containing “AP”.]
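The rationale above can be illustrated with a small sketch. It assumes, without confirmation from the slides, that the ratio in vari is the standard deviation of a term's daily frequency in a stream divided by its mean (the coefficient of variation). The function name `variation`, the parameter `alpha`, and its default value are placeholders.

```python
from statistics import fmean, pstdev


def variation(daily_tf, alpha=0.1):
    """Assumed form: alpha + standard deviation / mean of the daily frequency."""
    mean = fmean(daily_tf)
    if mean == 0:
        return alpha  # the term never appears in this stream
    return alpha + pstdev(daily_tf) / mean


# Hypothetical daily frequencies within a single stream.
signature = [50, 52, 49, 51, 50, 48]  # fixed signature in every article
event = [0, 0, 1, 0, 40, 12]          # term tied to a sudden event

print(variation(signature), variation(event))
```

Under this assumption a signature term such as “AP”, which is frequent but nearly constant within its own stream, receives a weight close to `alpha`, while an event-driven term receives a much larger weight.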
Putting them all together to measure burstiness
Combine the four feature scores, which rest on different rationales and scales.
The final term weighting scheme, burst, is as follows:
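$\mathrm{burst}(t, d) = \mathrm{skew}(t)^{\omega_1} \times \mathrm{cons}(t)^{\omega_2} \times \mathrm{peri}(t)^{\omega_3} \times \sum_{c \in C} \left\{ \mathrm{vari}(c, t) \times tf(c, t, d) \right\}$
where $\omega_1$, $\omega_2$, and $\omega_3$ are the exponents applied to the skewness, consistency, and periodicity scores.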
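A minimal sketch of the combination step, assuming the three feature scores are positive numbers and the per-stream variation weights and term frequencies for day d are given as dictionaries. The function name `burst_score`, the `exponents` parameter, and the sample numbers are illustrative; the slides do not give the exponent values.

```python
def burst_score(skew, cons, peri, vari_by_stream, tf_by_stream,
                exponents=(1.0, 1.0, 1.0)):
    """Product of the powered feature scores and the weighted term frequency."""
    e1, e2, e3 = exponents
    # Sum over streams of (variation weight x term frequency on day d).
    weighted_tf = sum(
        vari_by_stream[c] * tf_by_stream[c] for c in tf_by_stream
    )
    return (skew ** e1) * (cons ** e2) * (peri ** e3) * weighted_tf


# Hypothetical scores for one term on one day in two streams.
score = burst_score(
    skew=2.0,
    cons=4.0,
    peri=0.5,
    vari_by_stream={"ap": 0.5, "cnn": 1.5},
    tf_by_stream={"ap": 10, "cnn": 4},
)
print(score)
```

Because the scores are multiplied, a single low factor (for example the periodicity penalty) lowers the whole score. A negative skewness with a fractional exponent would produce a complex number in Python, which is why the sketch assumes positive inputs.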
Experiment setting
Six news channels are collected:
- Sources: CNN, AP, Reuters, Times Online, Wall Street Journal, New York Times
- Category: World news
- Period: 1 Oct. 2009 to 15 Mar. 2010
- Source type: RSS feed
[Figure: data collection with data channels, Google Reader API, Google Reader repository, and experiment DB.]
Example of bursty terms
1 A strong aftershock to Chile's deadly earthquake provoked a brief panic in the city of Concepcion, but no tsunami warning was issued and no injuries or damage have been reported....
2 Tsunami waves of up to 1.5 meters (5 feet) hit far-flung Pacific regions from the Russian far east and Japan to New Zealand's Chatham Islands on Sunday after a powerful earthquake struck Chile, but there were no reports of injuries or serious damage.
3 Former member of the Bosnian wartime presidency Ejup Ganic was arrested at London's Heathrow airport on Monday on behalf of Serbian authorities, British police said.
4 A tsunami generated by a 8.8 magnitude earthquake in Chile hit beaches in eastern Australia on Sunday, witnesses and officials said, but there were no initial reports of damage.
5 British police arrested a former senior Bosnian leader in London Monday on a Serbian warrant alleging he committed war crimes, to the outrage of Bosnian leaders who said the move undermined Bosnian sovereignty....
…
Experiment results
Comparison of bursty term detection results with methods proposed by Whitney et al. (2009), Fung et al. (2005), Chen et al. (2007), and He et al. (2005).
Bold terms: bursty terms assumed to be correct. Underlined terms: topical terms. Starred terms: general terms.
Experiment results
Comparison of the performance of retrieving documents related to bursty events.
Further work
[Figure: Chi-Square, MICV, KL Divergence, Skewness, Self-Similarity, and Chernoff Divergence; union of “statistically sufficient” conditions; bursty terms.]
Conclusion
Focus on identifying bursty terms to detect disaster-related bursty events.
Bursty terms can help people react properly in decision-critical situations.
Bursty terms can be characterized from four perspectives:
- Skewness, consistency, periodicity, and variation.
A final scoring function to detect bursty terms is proposed.
The experiment results showed that the proposed approach detects bursty terms effectively compared to the existing alternatives.