Date post: 03-Mar-2021
SMALL TEXTS FOR BIG DATA

Claudia Roncancio & Cyril Labbé
Université Grenoble Alpes, Laboratoire LIG, France
Contributors: L. Petit, S. de Amo, S. Bras

AMW 2014, Alberto Mendelzon International Workshop on Foundations of Data Management

From Cartagena de Indias...

to the French Alps…

Grenoble, capital of the Alps

Ubiquitous Information Systems

• Information Systems (IS) anywhere and anytime
  - Content provided by and/or managed by mobile or embedded devices
  - Deployed in “offices” and in “real-world” environments
  - Users and devices may move
• Large-scale distributed information systems
• Increased maturity of networks, hardware, data management…
• Users expect appropriate information management for “every thing”

Data sources in UIS

• Heterogeneous sources of persistent data
  - DBMS, DFS, …
• Heterogeneous sources of data streams
  - Devices, sensors, applications… Almost every thing
  - A data stream is a (potentially unbounded) sequence of data items
  - “Transactional” data streams: stock exchange, web access, telecommunication, social data
  - Measurement data streams: monitoring the evolution of entity states
• “Data never sleeps”… Big data!

Taking advantage of such data…

• Many information-based services
• Loosely coupled cooperation between services
• User-centered point of view?
  - Not so easy!
• Ever more service personalization

This talk

• Information extraction customization
  - To fit the user’s current profile and interests
  - To adapt content & form
• Personalization and context awareness
  - User preferences and query answer customization
  - Answers on available user devices
• Ad-hoc abstracts of data to facilitate stream data monitoring
  - Contextual preferences
• Short texts which summarize (in natural language) the result of continuous complex data monitoring
  - Shared in social networks
  - Delivered to personal devices in various contexts, e.g. listening to summaries while driving
  - Facilitates monitoring, even for disabled users

Data monitoring

Personalized summaries

Continuous queries & contextual preferences

Summaries in natural language

Running example: model & schema

Running example: continuous queries

[Q1] Every day, a summary concerning the stock Total over the last two days.
[Q2] Every hour, a summary, over the last hour, for the category IT.
[Q3] Every hour, a summary, over the last hour, of the 100 transactions I prefer concerning category 'IT'.

Astral stream algebra

• Continuous and one-shot queries on persistent and real-time data
  - Streams and (temporal) relations
• Streamer Is(R): stream of tuples inserted in R
• Windows (positional, temporal, cross)
  - S[L]: last arrived tuple of a stream S
  - S[N slide d]: sliding window of size N
• Joins and semi-sensitive join

(Petit 2012)
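The window operators above can be illustrated with a minimal Python sketch. It is not Astral's implementation: the function name `sliding_window` is hypothetical, and the sketch assumes a positional reading of S[N slide d] in which the window holds the last N tuples and is emitted every d arrivals.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sliding_window(stream: Iterable[T], size: int, slide: int) -> Iterator[List[T]]:
    """Positional window (assumed semantics for S[N slide d]).

    Keeps the last `size` tuples and emits a snapshot every `slide` arrivals.
    Early snapshots may hold fewer than `size` tuples.
    """
    if size <= 0 or slide <= 0:
        raise ValueError("size and slide must be positive")
    window: Deque[T] = deque(maxlen=size)  # old tuples fall out automatically
    for count, item in enumerate(stream, start=1):
        window.append(item)
        if count % slide == 0:
            yield list(window)


# Six arrivals, window of 3 tuples, evaluated every 2 arrivals.
snapshots = list(sliding_window(range(1, 7), size=3, slide=2))

# Under this reading, S[L] behaves like a window of size 1 sliding by 1.
last_only = list(sliding_window(["a", "b", "c"], size=1, slide=1))
```

A temporal window would follow the same pattern with timestamps in place of arrival counts.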

Astral stream algebra - example

[Q1] Every day, a summary concerning the stock Total over the last two days.

Aggregations for summaries

Customization & preferences

• How to express user preferences?
• What do I prefer?
• User preferences may depend on the context…
• Anything that can have an influence on the choice

Running example: continuous queries

[Q1] Every day, a summary concerning the stock Total over the last two days.
[Q2] Every hour, a summary, over the last hour, for the category IT, among the 100 transactions that best fulfill my preferences.
[Q3] Every hour, a summary, over the last hour, of the 100 transactions I prefer concerning category 'IT'.

Running example: user preferences

• Personalized ranking

[P1] For stock options in category Commodities, Luc prefers a volatility rate < 0.25. But for category IT, Luc prefers a volatility rate > 0.35.
[P2] For stock options with volatility > 0.35, Luc prefers those from Brazil over French ones.
[P3] For stock options with volatility < 0.35, Luc prefers transactions involving at least 1000 shares.

Contextual preferences

CP-rules (De Amo 2011, Wilson 2004):

When condition U is true, I prefer Q1(X) over Q2(X), with attributes W equal (ceteris paribus).

• U = P1(Y1), …, Pn(Yn), where each Pi is a unary predicate and Yi is not in W. E.g. Yi > 5
• Each Qi is a unary predicate, with Q1(X) ∩ Q2(X) = ∅. E.g. (X < 3) > (X > 4)

Contextual preferences to rank data

[P1] For stock options in category Commodities, Luc prefers a volatility rate < 0.25. But for category IT, Luc prefers a volatility rate > 0.35.

[P2] For stock options with volatility rate > 0.35, Luc prefers those from Brazil over French ones.

Formal expression: a cp-theory over the schema T(SOName, Cat, Country, ETime, Rate, Method)

Preferences & the algebra

• One cp-rule φ: a binary relation
• A cp-theory induces a partial order
• Preferences on any data (streams or relations)
• Preference operators
  - Best: non-dominated tuples
  - KBest: top-k

(Petit 2013)

Data sample

D   SOName   Category     Country  ETime  Rate  Method
d1  MS       IT           USA      t1     0.30  M1
d2  AP       IT           India    t2     0.55  M1
d3  USSteel  Commodities  USA      t1     0.20  M2
d4  Petr4    Commodities  Brazil   t2     0.40  M2
d5  Bel5     Investments  France   t3     0.55  M2

[Figure: tuples d1 to d5 related by the rules φ1, φ2, φ3]

Preferences & the algebra

• One cp-rule φ implies a partial order
• A cp-theory implies a partial order
• Preferences on any data (streams or relations)
• Preference operators
  - Best: non-dominated tuples
  - KBest: top-k
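To make the Best and KBest operators concrete, here is a rough Python sketch over the five-tuple data sample. It is a simplification, not the algorithm of the cited work. It ignores the ceteris paribus attribute set W. It assumes the less-preferred side of P1 is the complement of the preferred predicate. It omits P3, because the sample has no share count. It ranks KBest by dominance level with ties broken by input order. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class CpRule:
    condition: Callable[[Row], bool]     # context U
    preferred: Callable[[Row], bool]     # Q1(X)
    dispreferred: Callable[[Row], bool]  # Q2(X), disjoint from Q1(X)


def row(name, cat, country, etime, rate, method) -> Row:
    return {"SOName": name, "Category": cat, "Country": country,
            "ETime": etime, "Rate": rate, "Method": method}


SAMPLE: Dict[str, Row] = {
    "d1": row("MS", "IT", "USA", "t1", 0.30, "M1"),
    "d2": row("AP", "IT", "India", "t2", 0.55, "M1"),
    "d3": row("USSteel", "Commodities", "USA", "t1", 0.20, "M2"),
    "d4": row("Petr4", "Commodities", "Brazil", "t2", 0.40, "M2"),
    "d5": row("Bel5", "Investments", "France", "t3", 0.55, "M2"),
}

RULES = [
    # P1, Commodities part (complement assumed for the dispreferred side)
    CpRule(lambda t: t["Category"] == "Commodities",
           lambda t: t["Rate"] < 0.25, lambda t: t["Rate"] >= 0.25),
    # P1, IT part (complement assumed for the dispreferred side)
    CpRule(lambda t: t["Category"] == "IT",
           lambda t: t["Rate"] > 0.35, lambda t: t["Rate"] <= 0.35),
    # P2
    CpRule(lambda t: t["Rate"] > 0.35,
           lambda t: t["Country"] == "Brazil", lambda t: t["Country"] == "France"),
]


def dominance(rows: Dict[str, Row], rules: List[CpRule]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) such that a is preferred to b, closed under transitivity."""
    edges = {(a, b) for r in rules for a, ta in rows.items() for b, tb in rows.items()
             if a != b and r.condition(ta) and r.condition(tb)
             and r.preferred(ta) and r.dispreferred(tb)}
    changed = True
    while changed:  # naive transitive closure, fine for small inputs
        new = {(a, d) for a, b in edges for c, d in edges if b == c} - edges
        edges |= new
        changed = bool(new)
    return edges


def best(rows: Dict[str, Row], rules: List[CpRule]) -> List[str]:
    """Best: tuples that no other tuple dominates."""
    dominated = {b for _, b in dominance(rows, rules)}
    return [k for k in rows if k not in dominated]


def kbest(rows: Dict[str, Row], rules: List[CpRule], k: int) -> List[str]:
    """KBest: peel off non-dominated levels until k tuples are collected."""
    edges, remaining, ranked = dominance(rows, rules), list(rows), []
    while remaining and len(ranked) < k:
        level = [t for t in remaining if not any((o, t) in edges for o in remaining)]
        if not level:  # only possible if the rules are cyclic
            break
        ranked += level
        remaining = [t for t in remaining if t not in level]
    return ranked[:k]


best_ids = best(SAMPLE, RULES)
top3_ids = kbest(SAMPLE, RULES, 3)
```

Under these assumptions the direct preferences are d2 over d1, d3 over d4, and d4 over d5, so d2 and d3 are the non-dominated tuples.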

Personalized summaries

Summaries in natural language

Stream2text: Text generation

Natural language generation

• SimpleNLG, a library used to generate grammatically correct sentences (Gatt & Reiter, U. Aberdeen)
  - SimpleNLG-EnFr: English & French (Vaudry, U. Montreal)
• Realiser for a simple grammar
  - Orthography
  - Morphology: handling inflected forms, gender, tense, number or person
  - Ensuring grammatical correctness: enforcing noun-verb agreement, creating well-formed verb groups…

(Gatt 2009)
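As a toy illustration of what a realiser does, the following Python sketch applies present-tense subject-verb agreement and basic orthography. It does not use SimpleNLG (a Java library) or reproduce its API. The function `realise` is hypothetical and covers only regular English verbs.

```python
def realise(subject: str, verb: str, complement: str, plural: bool = False) -> str:
    """Toy realiser: regular present-tense agreement plus orthography.

    Irregular verbs such as "be" or "have" are deliberately not handled.
    """
    if plural:
        verb_form = verb                      # "the rates vary"
    elif verb.endswith(("s", "sh", "ch", "x", "z", "o")):
        verb_form = verb + "es"               # "reach" -> "reaches"
    elif verb.endswith("y") and verb[-2:-1] not in "aeiou":
        verb_form = verb[:-1] + "ies"         # "vary" -> "varies"
    else:
        verb_form = verb + "s"                # "stay" -> "stays"
    sentence = f"{subject} {verb_form} {complement}".strip()
    # Orthography: capitalise the first letter and close with a full stop.
    return sentence[0].upper() + sentence[1:] + "."


singular = realise("the rate", "reach", "0.55")
plural = realise("the rates", "vary", "widely", plural=True)
```

A full realiser also handles tense, gender and person, which is why the slides rely on an existing library.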

Text generation - example

Dictionary of concepts

Dictionary of aggregation functions

Transcription operator

Text organization (microplanning)
• Performed by the transcriptionist
• One paragraph per entity
• One sentence per value of the structured summary
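The text organization rule above (one paragraph per entity, one sentence per value) can be sketched in Python as follows. This only illustrates the idea. The dictionary contents, the sentence template, the shape of the structured summary and the numeric values are all assumptions made for the example, not the Stream2text implementation.

```python
from typing import Dict, Tuple

# Hypothetical dictionaries: attribute -> concept phrase, function -> wording.
CONCEPTS: Dict[str, str] = {"Rate": "volatility rate"}
AGGREGATIONS: Dict[str, str] = {"avg": "average", "min": "minimum", "max": "maximum"}

# Assumed shape of a structured summary: entity -> {(function, attribute): value}.
StructuredSummary = Dict[str, Dict[Tuple[str, str], float]]


def transcribe(summary: StructuredSummary) -> str:
    """Build one paragraph per entity and one sentence per aggregated value."""
    paragraphs = []
    for entity, values in summary.items():
        sentences = [
            f"The {AGGREGATIONS[func]} {CONCEPTS[attr]} of {entity} is {value:.2f}."
            for (func, attr), value in values.items()
        ]
        paragraphs.append(" ".join(sentences))
    return "\n\n".join(paragraphs)


# Illustrative values only; they are not results from the talk.
example = {
    "Total": {("avg", "Rate"): 0.40, ("max", "Rate"): 0.55},
    "MS": {("min", "Rate"): 0.30},
}
text = transcribe(example)
```

Keeping the wording in dictionaries rather than in the template is what would let the same summary be rendered in French or English.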

Example of summary

Conclusion and future work

• Personalized summaries for complex data monitoring
  - Streams & persistent data
  - Contextual user preferences
  - Text summary in natural language (French & English)
  - …Text streams!
• Textual summaries facilitate
  - Sharing in social networks
  - Access through mobile devices
  - Using text-to-speech software

Conclusion and future work

• Experimentation (stock exchange, NBA)
  - Assessing the text sentence aggregation
• Text updates
• Social streams - sentiment analysis
  - Analysis of small texts (phrases, tweets, SMS messages)
  - Identify a polarity (positive, negative or neutral)
  - More work to summarize the global sentiments
  - To give a (fair and non-biased) picture of the global sentiments
• Complex events
• Production of texts referring to present and past situations