
Date post: 01-Apr-2015
New Event Detection & Tracking ÖZGÜR BAĞLIOĞLU SÜLEYMAN KARDAŞ H. ÇAĞDAŞ ÖCALAN ERKAN UYAR Bilkent Information Retrieval Group Computer Engineering Department Bilkent University

New Event Detection & Tracking

ÖZGÜR BAĞLIOĞLU
SÜLEYMAN KARDAŞ
H. ÇAĞDAŞ ÖCALAN
ERKAN UYAR

Bilkent Information Retrieval Group
Computer Engineering Department

Bilkent University


22/03/07 First Event Detection & Event Tracking 2

Outline

• Introduction
– What is a new event detection and tracking system?
– Motivation
• Related Work
– TDT
– Google News
– NewsInEssence
• Proposed System
– Test Collection Preparation (TTracker)
– Novelty Detection & Event Tracking
– C3M concept
– Design Details
• Future Work
– Named Entities with NED
• Conclusion


Introduction

• Event
– Defined by a time and a place
• Topic
– A seminal event or activity
• The difference
– “Computer virus detected at British Telecom, March 3, 1993” is an event
– “Computer virus outbreaks” is a topic


Introduction

• New event detection is the task of detecting stories about previously unseen events in a stream of news stories.
– Examples: airplane crashes, earthquakes, government elections, etc.
• Properties of a new event
– When the event occurred
– Who was involved
– Where it took place
– How it happened
– Impact, significance, or consequence of the event


Introduction

• An information filtering system
– uses a long-lived profile of a user’s request to identify relevant material in a stream of arriving documents.
– In contrast, new event detection has no knowledge of what events will happen in the news, so it must operate without a pre-specified query.
• NEDT usage areas
– Categorization systems
– People who need to know the latest news, such as government analysts, financial analysts, and stock market traders
– Distinguishing new e-mails from previously seen ones


Related Work

• Topic Detection and Tracking (TDT)
– Research program running since 1997
– Covers broadcast news: written and spoken news stories in multiple languages
• Research areas
– Story Segmentation: detect changes between topically cohesive sections
– Topic Tracking: keep track of stories similar to a set of example stories
– Topic Detection: build clusters of stories that discuss the same topic
– First Story Detection: detect whether a story is the first story of a new, unknown topic
– Link Detection: detect whether or not two stories are topically linked


Related Work

• Google News
– A novel approach to news
– Uses 4,500 English-language news sources worldwide
– Groups similar stories together
– Displays them according to each reader's personalized interests


Related Work

• NewsInEssence
– Running since 2001
– Summarizes clusters of related news articles from multiple sources on the Web
– Developed by the CLAIR group at the University of Michigan
– Partially funded by the NSF under the ITR program, grant number ITR-0082884


Proposed System

• Handling of Test Data (Milliyet, TRT, Zaman, Haber7, Cnnturk)
– Distribution of the data among collections
– Processing the raw data
• Test Collection Preparation (TTracker)
– Profiles and their properties
– Sample profiles from the collection
• Novelty Detection & Event Tracking
– C3M concept
– Algorithm details
• Future Work
– Named entities
– System evaluation
• Conclusion


Handling of Test Data

• Data is collected from five different sources:
– CNN Türk (http://www.cnnturk.com)
– Haber 7 (http://www.haber7.com)
– Milliyet Gazetesi (http://www.milliyet.com.tr)
– TRT (http://www.trt.net.tr)
– Zaman Gazetesi (http://www.zaman.com.tr)
• News stories from 2005 that carry time stamps (date and time) are crawled from these sources.


Handling of Test Data

• Each source represents a different point of view:
– CNN Türk: international, American style
– TRT: governmental, more restrictive
– Milliyet Gazetesi: modern perspective
– Zaman Gazetesi: conservative
– Haber 7: provides variety
• Hence, the different perspectives make tracking the news a nice challenge.


Handling of Test Data

• Statistics about the sources:

News Source          No. of News   % of Total News   Average News Length (no. of words)
CNN Türk                  31,919              14.2                               270.57
Haber 7                   59,304              26.3                               237.85
Milliyet Gazetesi         72,506              32.1                               218.34
TRT                       19,102               8.5                               120.75
Zaman Gazetesi            42,749              19.0                                96.76
All                      225,580             100.0                               199.56

• After crawling the data, HTML tags are removed from the text using the HTMLParser library.
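The slides name only an "HTMLParser library" for this cleaning step and do not say which implementation was used. A minimal sketch of the same idea, assuming Python's standard-library `html.parser.HTMLParser` as a stand-in (the names `TagStripper` and `strip_tags` are hypothetical, chosen for illustration):

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects text nodes and drops tags, scripts, and styles."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse the whitespace runs left behind by the removed markup.
        return " ".join(" ".join(self._chunks).split())


def strip_tags(html):
    """Return the visible text of an HTML page as a single string."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Joining text nodes with a space keeps words from adjacent elements apart, at the cost of splitting a word that is interrupted by inline markup; a real crawler pipeline would likely also need per-site rules to drop menus and advertisements.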


Test Collection Preparation TTracker

• TTracker is a sub-component that collects the test and training data semi-automatically.
• It is based on an information retrieval system.
• It lets annotators define profiles and their tracking news.
• It also provides statistical information about the profiles.
• The success of the system will also be compared with manual tracking.


Test Collection Preparation TTracker

• A profile contains the following fields:
– Topic Title: a one- or two-word definition
– Seminal Event: a definition of at most two or three sentences
– What: what happened during the event
– Who: who was involved in the event
– When: when the event occurred
– Where: where the event occurred
– Topic Size: estimated number of tracking news
– Seed: seed document of the event
– Event Type: category of the event
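The profile fields map naturally onto a small record type. The sketch below is only an illustration of that structure: the class name `EventProfile`, the field types, and the `as_query` helper are assumptions, not TTracker's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventProfile:
    topic_title: str      # one- or two-word definition
    seminal_event: str    # at most two or three sentences
    what: str
    who: str
    when: str
    where: str
    topic_size: int       # estimated number of tracking news
    seed_doc_id: str      # hypothetical identifier of the seed document
    event_type: str       # category of the event
    tracking_doc_ids: List[str] = field(default_factory=list)

    def as_query(self):
        """Join the descriptive fields into one free-text retrieval query."""
        parts = [self.topic_title, self.seminal_event,
                 self.what, self.who, self.when, self.where]
        return " ".join(p for p in parts if p)
```

Keeping the descriptive fields separate makes it easy to use either the seed document or the whole profile as a query, while empty fields are simply skipped.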


Test Collection Preparation TTracker

• The tracking news are defined in five stages:
– Stage 1: Using the seed document as a query
– Stage 2: Using the event profile as a query
– Stage 3: Using tracking news as queries
– Stage 4: Creative query searching
– Stage 5: Quality control of the profile
• After these stages are completed, the quality of the profiles is also checked by administrators.

[Figure: flow diagram with the labels Create, Start, Stage 1 to Stage 5, and Finish]


Test Collection Preparation TTracker

• In each stage, annotators can label a news story as “tracking”, “non-tracking”, “not-sure”, or “not-evaluated”.
• Annotators evaluate:
– 200 documents in the 1st stage
– 300 documents in the 2nd stage
– 400 documents in the 3rd stage
– 200 documents for each query of the 4th stage


Test Collection Preparation TTracker

• So far we have collected nearly 60 completed profiles, with the valuable contribution of our friends.
• We take extra care to avoid bias in the collection: the number of profiles per person, the event types, and the profile lengths are all kept in balance.

Per-profile statistics. Columns in extraction order: Time-Spent, Not-Evaluated, Not-Sure, Non-Tracking, Tracking, Retrieved. The cell boundaries inside each digit run were lost in extraction.

Max.   825614377614541129
Min.   2000142221
Avg.   13077137889546


Test Collection Preparation TTracker

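• Example profiles and their life-time statistics:

Columns in extraction order: No. of Tracking News in n Days (n=10, n=25, n=50, n=100), Life-Time (days), No. of Tracking News. The cell boundaries inside each digit run were lost in extraction.

Pro. No, News Title: extracted values
1. Sahte Rakı (Counterfeit rakı): 185244273298304329
2. Papa 2. Jean Paul, hastalığı ve ölümü (Pope John Paul II, his illness and death): 345276287288291
3. Suriye’yi Lübnan’dan çıkaran suikast (The assassination that drove Syria out of Lebanon): 658166221330318
4. Kırgızistan’da kadifemsi “devrim” (Velvet-like “revolution” in Kyrgyzstan): 110138147166270179
5. Live 8 konserlerinin G-8 zirvesine etkisi (Effect of the Live 8 concerts on the G-8 summit): 013694241110
6. Fransa’nın AB anayasasını referanduma götürmesi (France putting the EU constitution to a referendum): 14265299353329
7. Özbekistan’da kanla bastırılan isyan (Uprising suppressed with blood in Uzbekistan): 33172188206241231
8. 2005 Eurovision Şarkı Yarışması (2005 Eurovision Song Contest): 116325327994
9. Formula 1 Türkiye Grand Prix (Formula 1 Turkish Grand Prix): 481535308141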
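The slides do not define how the life-time columns are computed. One plausible reading, sketched below under stated assumptions, takes the life-time as the number of days between the first and last tracking story and counts a story as falling "in n days" when it appears fewer than n days after the first one. The function name `lifetime_stats` and both definitions are hypothetical.

```python
from datetime import date


def lifetime_stats(dates, horizons=(10, 25, 50, 100)):
    """Summarize one profile from the publication dates of its tracking news."""
    ordered = sorted(dates)
    first = ordered[0]
    # Age of every story in days, relative to the first story of the profile.
    offsets = [(d - first).days for d in ordered]
    in_n_days = {n: sum(1 for off in offsets if off < n) for n in horizons}
    return {
        "tracking_news": len(ordered),
        "life_time_days": offsets[-1],
        "in_n_days": in_n_days,
    }
```

For example, `lifetime_stats([date(2005, 3, 1), date(2005, 3, 5), date(2005, 3, 20), date(2005, 6, 1)])` reports four tracking stories, two of them within the first 10 days.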


Test Collection Preparation TTracker

• Distribution of news over the year for two sample profiles generated with TTracker:

[Figure: two plots, “Sahte Rakı” (y-axis 0 to 80) and “2005 Eurovision Şarkı Yarışması” (y-axis 0 to 8); x-axis: days of 2005; y-axis: news amount]


Test Collection Preparation TTracker

• To prepare the test collection, we used an information retrieval system (semi-automatic).
• TTracker’s recall value will be compared with the recall value of the manual system (= 1).
• The correctness of the system will be measured with a t-test.
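The slides do not specify which t-test is meant. One way to read the comparison is as a paired test over profiles: for each profile, take the shortfall of TTracker's recall below the manual recall of 1 and ask whether the mean shortfall differs from zero. The sketch below computes only the t statistic under that assumption; the function names are hypothetical, and turning the statistic into a p-value needs a t-distribution table or a statistics package.

```python
import math
import statistics


def recall(found_tracking, all_tracking):
    """Fraction of the truly tracking stories that were found."""
    all_tracking = set(all_tracking)
    if not all_tracking:
        raise ValueError("profile has no tracking stories")
    return len(set(found_tracking) & all_tracking) / len(all_tracking)


def paired_t_statistic(system_recalls, reference_recall=1.0):
    """t statistic for the mean shortfall of system recall below the reference."""
    diffs = [reference_recall - r for r in system_recalls]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two profiles")
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(n))
```

The statistic has n - 1 degrees of freedom, where n is the number of profiles.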


Proposed System

Novelty Detection & Event Tracking

• Novelty detection
– The identification of new data that a machine learning system was not aware of during training
– One of the fundamental requirements of a good classification or identification system


Proposed System

• A special case of novelty detection...

[Figure: timeline from 0 to Now along the time axis, with the labels First Event, Tracking Events, Old News, and Window]
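The window idea in the figure can be sketched as a loop that compares each arriving story only with the stories still inside the window. Everything below is an illustrative assumption rather than the system's design: the window is counted in stories instead of time, Jaccard overlap of token sets stands in for the C3M-based measure, and the names `jaccard` and `detect_new_events` are hypothetical.

```python
from collections import deque


def jaccard(a, b):
    """Stand-in similarity between two token collections (not the C3M measure)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def detect_new_events(stories, window_size, threshold, similarity=jaccard):
    """Label each story 'new' or 'tracking' against the last window_size stories.

    stories: iterable of token lists in arrival (time) order.
    """
    window = deque(maxlen=window_size)  # the oldest story falls out automatically
    labels = []
    for tokens in stories:
        best = max((similarity(tokens, old) for old in window), default=0.0)
        labels.append("tracking" if best >= threshold else "new")
        window.append(tokens)
    return labels
```

A story whose best match inside the window stays below the threshold is reported as a first story; anything older than the window can no longer claim it as a follow-up.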


Proposed System

• Cover Coefficient Based Clustering Methodology (C3M) [Can F., Ozkarahan E., 1990]
– A single-pass, seed-based algorithm
• Its working principles are:
– Determining the number of clusters
– Determining the cluster seeds
– Assigning the other documents to the clusters initiated by the seeds
• A two-stage probability experiment is performed


Proposed System

• C3M concept
– Example D (document-term) and C (cover coefficient) matrices
– $C_{ij} = \alpha_i \sum_{k=1}^{m} d_{ik} \, \beta_k \, d_{jk}$
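A small sketch of the cover coefficient formula, following the cited 1990 paper, where α_i is the reciprocal of the i-th row sum of D and β_k is the reciprocal of the k-th column sum (m in the slide's formula is the number of terms). The function names and the example matrix are hypothetical; the cluster-count estimate as the sum of the diagonal entries is also taken from that paper, not from the slides.

```python
def cover_coefficients(D):
    """Return the documents-by-documents matrix C for a documents-by-terms matrix D."""
    num_docs, num_terms = len(D), len(D[0])
    row_sums = [sum(row) for row in D]
    col_sums = [sum(D[i][k] for i in range(num_docs)) for k in range(num_terms)]
    if any(s == 0 for s in row_sums) or any(s == 0 for s in col_sums):
        raise ValueError("every document and every term needs a nonzero entry")
    alpha = [1.0 / s for s in row_sums]  # reciprocal row sums
    beta = [1.0 / s for s in col_sums]   # reciprocal column sums
    return [
        [alpha[i] * sum(D[i][k] * beta[k] * D[j][k] for k in range(num_terms))
         for j in range(num_docs)]
        for i in range(num_docs)
    ]


def estimated_cluster_count(C):
    """Sum of the diagonal entries Cii, used in C3M as the number of clusters."""
    return sum(C[i][i] for i in range(len(C)))


# Hypothetical binary matrix: 3 documents, 3 terms.
D = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
C = cover_coefficients(D)
```

Each row of C sums to 1, which reflects the two-stage probability experiment: Cij is the probability of picking document j after first drawing a term from document i and then a document containing that term.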


Proposed System

• NEDT using the C3M concept
• The threshold value δW (for new event detection) depends on:
– The window size
– Cii of the incoming event
– Cij of the incoming event to the other events in the window
• δG depends on:
– The cluster centroid similarity (CIJ)
– Cii of the incoming event


Proposed System

• Two thresholds should be found:
– In the window
– In the collection
• One possible choice for the in-window threshold, which is complicated and was found intuitively through experimental trials...
• Results are as follows:

[Equation lost in extraction. Surviving symbols: ln, max, j, W, k, Cij, Cii]


Proposed System

• Experiments will be conducted to improve the threshold using pattern recognition techniques such as:
– Mixture of Gaussians
– SVM
– Decision trees
• Further problems with finding the threshold:
– The dataset is not large enough
– Only 2 features are available

[Figure: scatter plot; x-axis: Cii (0 to 0.7), y-axis: Cij (0 to 0.35); blue dots: new events, green dots: tracking events]


Future Work

• Improving NED by using named entities
– Topic-conditioned novelty detection (Yang, ..., 2002)
– A new similarity measure with semantic classes (Makkonen, ..., 2002)
– Modified similarity metrics (Kumaran and Allan, 2004)
– Using names and topics (Kumaran and Allan, 2005)


Future Work

• Intuition behind named entities:
– Who, where, when
– People, organizations, places, dates and times
• How to embed named entities into NED
– A new similarity matrix
– An additional similarity comparison over the extracted named entities


Future Work

• Evaluation of the NED
– Judging documents
– Random documents are selected from different categories
– Annotators judge the documents
– The same documents are processed by our system
– Finally, the system is evaluated by precision and recall against the annotators’ judgements
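The planned comparison against annotator judgements can be sketched as a set comparison. The function name and the convention of returning 0.0 for an empty set are assumptions made for illustration.

```python
def precision_recall(system_new, annotator_new):
    """Compare stories the system flags as new events with the annotators' judgements."""
    system_new, annotator_new = set(system_new), set(annotator_new)
    true_positives = len(system_new & annotator_new)
    precision = true_positives / len(system_new) if system_new else 0.0
    recall = true_positives / len(annotator_new) if annotator_new else 0.0
    return precision, recall
```

Precision penalizes tracking stories wrongly reported as new events, while recall penalizes first stories the system missed, so the two should be reported together.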


Future Work

• Developing an effective, real-time Web application capable of:
– detecting new events
– tracking old ones


Conclusion

• We covered:
– New event detection and tracking concepts
– Test collection preparation
– Details of the designed system
• Goals:
– Carry out leading research for Turkish
– Make our dreams in information retrieval come true
– “Rising like a sun in the science world” (Fazli Can)


References

Can F. and Ozkarahan, E. A. “Concepts and effectiveness of the cover coefficient based clustering methodology for text databases”. 1990.

Kumaran G. and Allan J. “Text classification and named entities for new event detection”. 2004.

Makkonen J., Ahonen-Myka H., and Salmenkivi M. “Applying semantic classes in event detection and tracking”. 2002.

Yang Y., Zhang J., Carbonell J., and Jin C. “Topic-conditioned novelty detection”. 2002.


Questions?

Thanks for your patience...

Any questions?

