Harvesting Knowledge from Social Networks: Extracting Typed Relationships among Entities

Post on 21-Jan-2018


Andrea Caielli, Marco Brambilla, Stefano Ceri, Florian Daniel

marco.brambilla@polimi.it

@marcobrambi

SoWeMine Workshop @ ICWE 2017, Rome, Italy

Agenda

(1) Context

(2) Objectives

(3) Method

(4) Experiments and Validation

(5) Visualization and Exploration

(6) Conclusions

(1) Context

Ontology is the philosophical study of the nature of being, becoming, existence or reality, and the basic categories of being and their relations.

Formalizing new knowledge is hard

Only high frequency emerges

The long tail challenge

Sourcing the Long Tail

Famous Emerging

(2) Objective

Objective

Extraction of relationships among entities

Reconstruct a typed graph of entities & relationships

Represent the knowledge contained in social data

No need for a-priori domain knowledge
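A typed graph of entities and relationships can be pictured with a small data structure. This is a minimal sketch, not the authors' implementation: the class names, the type labels ("TVSeason", "Date") and the relationship type "broadcast" are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    type: str  # illustrative type label, e.g. "TVSeason" or "Date"


@dataclass(frozen=True)
class Relationship:
    source: Entity
    label: str     # surface form of the relation, e.g. "air on"
    target: Entity
    rel_type: str  # type assigned by a typing step (hypothetical value)


class TypedGraph:
    """Directed multigraph: entities are nodes, typed relationships are edges."""

    def __init__(self):
        self.edges = defaultdict(list)  # source entity -> outgoing relationships

    def add(self, rel):
        self.edges[rel.source].append(rel)

    def neighbours(self, entity, rel_type=None):
        """Targets reachable from `entity`, optionally restricted to one type."""
        return [r.target for r in self.edges.get(entity, [])
                if rel_type is None or r.rel_type == rel_type]


season = Entity("Season 2", "TVSeason")
date = Entity("May 24th", "Date")
graph = TypedGraph()
graph.add(Relationship(season, "air on", date, rel_type="broadcast"))
print(graph.neighbours(season, rel_type="broadcast"))
```

Keeping the surface label and the assigned type as separate fields lets the same edge be re-typed later without losing the original wording.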

Knowledge Enrichment Setting

[Figure: knowledge enrichment setting. High-frequency entities (HF Entity1 to HF Entity5) and low-frequency entities (LF Entity1 to LF Entity4) are instances linked through <<instanceof>> to types (Type1, Type11, Type111, Type2). Unknown elements are marked "??".]

Legend: Seed Entity, Seed Type, Type of interest, Expert inputs

Enrichment problems

Relations HF - LF entities

Relations LF - LF entities

Typing of LF entities

Extraction of new LF entities

Finding attribute values (Property1, Property2)

A Practical Example

Challenge and Innovation

Highly unstructured social data (tweets and Facebook posts)

No reliable grammar structures

(3) Method

Analysis Pipeline

(0) Preprocessing

(1) Entity Extraction

(2) Relationship Extraction

(3) Relationship Aggregation

(4) Relationship Typing

(1) Evolution of work presented in:

M. Brambilla, S. Ceri, E. Della Valle, R. Volonterio, and F. Acero Salazar.

“Extracting Emerging Knowledge from Social Media”, WWW 2017.

Pipeline Summary

(0) Preprocessing

Text cleaning and enrichment

+ Traditional text preprocessing (stemming, …)
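As an illustration of the text cleaning step, the sketch below strips URLs and unwraps @mentions and #hashtags from a post. The rules and the sample post are assumptions, not taken from the slides; enrichment and stemming are not shown.

```python
import re

URL = re.compile(r"https?://\S+")
MENTION_OR_HASHTAG = re.compile(r"[@#](\w+)")
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def clean_post(text):
    """Strip URLs, unwrap @mentions and #hashtags, and normalise whitespace."""
    text = URL.sub(" ", text)
    # Keep the word behind @ or #, splitting CamelCase tags such as #BlackSails.
    text = MENTION_OR_HASHTAG.sub(
        lambda m: CAMEL_BOUNDARY.sub(" ", m.group(1)), text)
    return " ".join(text.split())


# Made-up post, loosely modelled on the kind of tweet analysed in the talk.
sample = "#BlackSails fans! Season 2 airs on May 24th http://t.co/xyz @History"
print(clean_post(sample))
```

Unwrapping hashtags rather than deleting them keeps entity names that users often write only as tags.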

(1) Entity Extraction

Entity identification and semantic typing

Exploiting:

Stanford CoreNLP NER

Dandelion API

(2) Relationship Extraction

Baseline with Stanford OpenIE for triple extraction:

Several issues:

- Meaningless relations

- Wrong relations

- Multiple relations

(3) Relationship Aggregation

Sails fans. Season 2 airs on May 24th on History on D Stv Jag Comms

Too many answers for the same question!

Empirical rules
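The slides do not list the empirical rules themselves. The sketch below shows what such rules could look like, under two assumptions of ours: a triple is kept only if both arguments are recognised entities, and among triples with the same subject the shortest relation phrase wins. The candidate triples are hypothetical extractor output for the post above.

```python
def aggregate(triples, entities):
    """Keep one triple per subject: both arguments must be known entities,
    and the shortest relation phrase wins (ties keep the earliest triple)."""
    best = {}
    for t in triples:
        if t["entity1"] not in entities or t["entity2"] not in entities:
            continue  # an argument is not a recognised entity: discard
        current = best.get(t["entity1"])
        if current is None or (len(t["relationship"].split())
                               < len(current["relationship"].split())):
            best[t["entity1"]] = t
    return list(best.values())


# Hypothetical candidates for a single sentence.
candidates = [
    {"entity1": "Season 2", "relationship": "air on May 24th on", "entity2": "History"},
    {"entity1": "Season 2", "relationship": "air on", "entity2": "May 24th on History"},
    {"entity1": "Season 2", "relationship": "air on", "entity2": "May 24th"},
]
entities = {"Season 2", "May 24th", "History"}
print(aggregate(candidates, entities))
```

Keeping a single triple per subject is a deliberate simplification: it also drops the valid link between "Season 2" and "History", which a fuller rule set would preserve.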

{"entity1":"Season 2", "relationship":"air on", "entity2":"May 24th"}

(4) Relationship Typing (A): Synonyms

Exploiting synsets based on WordNet 3.1
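Synonym-based typing can be pictured as mapping each relation verb to a class shared with its synonyms, then counting occurrences per class. In the sketch below a small hand-written dictionary stands in for WordNet synsets. Its contents, the class labels and the input list are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical stand-in for WordNet synsets: verb lemma -> class label.
SYNONYM_CLASS = {
    "air": "broadcast", "broadcast": "broadcast", "show": "broadcast",
    "love": "like", "like": "like", "adore": "like",
}


def type_by_synonyms(relationships):
    """Count relationships per synonym class; unknown verbs form their own class."""
    counts = Counter()
    for rel in relationships:
        verb = rel.split()[0]  # assumes the first token is the verb lemma
        counts[SYNONYM_CLASS.get(verb, verb)] += 1
    return counts


counts = type_by_synonyms(
    ["air on", "broadcast on", "love", "adore", "like", "kill"])
print(counts.most_common())
```

In a real setting the dictionary lookup would be replaced by a WordNet query, and verbs with several senses would need disambiguation before being assigned a class.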

(4) Relationship Typing (B): Matching Types

(4) Relationship Typing (C): Linguistics

Based on VerbNet

Groupings of verbs based on syntactic and semantic properties

Pipeline Implementation

(4) Validation

Experiments

TV Series: Black Sails, Teen Wolf, Vikings

Milan Fashion Week

Rugby games

Domains and quality of results: summary

Relationships and Verb Classes

Example: Teen Wolf

[Chart: Teen Wolf synonym classes; y-axis: occurrences (0 to 800)]

[Chart: Teen Wolf VerbNet classes; y-axis: occurrences]

Overall Quality Indexes of Entity and Relationships Extraction

(5) Visualization

Motivation

Resulting semantic models extremely large and hard to interpret

Example:

Black Sails collection, containing 1243 entities and 2025 relations.

Exploration

Visualization

Filtering

Navigation
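Filtering and navigation over a large extracted graph can be sketched on a plain edge list. The functions, the edge format (source, relationship type, target) and the sample edges below are assumptions for illustration and do not describe the tool shown in the talk.

```python
from collections import Counter


def filter_edges(edges, rel_types=None, min_count=1):
    """Keep edges of the selected relationship types seen at least `min_count` times."""
    counts = Counter(edges)
    return {e: n for e, n in counts.items()
            if n >= min_count and (rel_types is None or e[1] in rel_types)}


def neighbourhood(edges, entity):
    """Edges touching `entity`: one navigation step in the graph."""
    return [e for e in edges if entity in (e[0], e[2])]


# Made-up edges: (source, relationship type, target).
edges = [
    ("Season 2", "broadcast", "May 24th"),
    ("Season 2", "broadcast", "May 24th"),
    ("Fans", "like", "Season 2"),
    ("Season 2", "broadcast", "History"),
]
frequent = filter_edges(edges, rel_types={"broadcast"}, min_count=2)
print(frequent)
print(neighbourhood(edges, "History"))
```

Filtering by type and by minimum occurrence count are two simple ways to shrink a graph with thousands of relations to a readable size before drawing it.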

Exploration

Visualization

Relationship Filtering

Navigation

Examples

Milano Fashion Week

Generated graph

(6) Conclusions

Conclusions

Extraction of relevant emerging relationships is feasible even for extremely unstructured and informal content (social media)

Still a long way to perfect extraction:

- N-ary relations

- Time-dependency

- Poor typing of entities in ontologies

THANKS! QUESTIONS?

Marco Brambilla @marcobrambi marco.brambilla@polimi.it

http://datascience.deib.polimi.it http://home.deib.polimi.it/marcobrambi