
Alessio Bosca: Linked Data for Content Analytics in CELI

Date post: 07-Jul-2015
Upload: mbruemmer
Description:
Alessio Bosca (CELI) presented how CELI exploits linked data. The company focuses on speech applications, semantic search, text analytics, opinion mining and social media intelligence. Its core technology covers language processing steps such as language identification, morphological analysis and semantic analysis. CELI exploits the linked data in the LOD cloud a) as a user, for example for NER, and b) as a provider, for internal use and for crafting RDF artifacts. Two projects were presented: a book project for the digital humanities and the Homer project, which provides multilingual interfaces for accessing data from different public administrations. Based on this work with linked open data, the LOD cloud community is advised to put more emphasis on truly linking the datasets. For the public sector, it is suggested that more data be published as linked open data and that international standards be used. The issue of publishing companies’ linked data under an open license was also addressed. The speaker made the point that, besides resistance to sharing because of valid competitive concerns, company data is generally over-fitted to the company’s solutions and clients. In other words, companies need to manage ‘micro-domains’, which are regarded as less useful in general. As a compromise, the audience suggested that companies should not answer the question of why they do not publish their linked data, but rather what they could publish.
Linked Data for content analytics in Celi Semantics 2014 - Leipzig Alessio Bosca
Transcript
Page 1: Alessio Bosca: Linked Data for Content Analytics in CELI

Linked Data for content analytics in Celi Semantics 2014 - Leipzig Alessio Bosca

Page 2: Alessio Bosca: Linked Data for Content Analytics in CELI

Agenda

•  Presentation of Celi
•  Technologies (and what we do with them)
•  Focus on LOD for content analytics in Celi
•  … what we’d like to do

Page 3: Alessio Bosca: Linked Data for Content Analytics in CELI

Timeline

•  1999: CELI srl was born
•  2002: Speech Technology
•  2006: BlogMeter
•  2010: Milan, Rome, Trento
•  2011: Cross Library
•  2013: Korean Market

Page 4: Alessio Bosca: Linked Data for Content Analytics in CELI

•  4 Offices: Torino, Milano, Trento, Roma
•  6 Markets: Italy, Belgium, France, Spain, Korea, Poland
•  50 Employees + Collaborators
•  >100 Active clients
•  4 Business branches: NLP components, Speech technology, Social Media Intelligence, Digital Humanities
•  15 Years of experience

Page 5: Alessio Bosca: Linked Data for Content Analytics in CELI

Relationships with the scientific community

•  >50 Published papers
•  15 Research projects
•  6 Agreements with research centers: Scuola Normale Superiore, Università di Torino, Università di Pisa, Università di Trento, Fondazione Bruno Kessler, Politecnico di Milano

Page 6: Alessio Bosca: Linked Data for Content Analytics in CELI

Core technology

•  opinion mining, mood and sentiment analysis
•  language identification
•  normalization
•  tokenization
•  NSW processing
•  morphological analysis
•  disambiguation
•  chunking and phrasing
•  phonetic transcription with word stress
•  semantic clustering
•  automatic classification
•  named entities

Page 7: Alessio Bosca: Linked Data for Content Analytics in CELI

Techs

Guava

Kestrel

Virtuoso OpenSource


Page 8: Alessio Bosca: Linked Data for Content Analytics in CELI

Clients

Speech Technology, Semantic Solutions, Social Media Monitoring

Page 9: Alessio Bosca: Linked Data for Content Analytics in CELI

Linked (and/or Open) Data

[Diagram labels: Linked Data, Open Data, LOD, ?]

Page 10: Alessio Bosca: Linked Data for Content Analytics in CELI

Private Sector: how Celi exploits L(O)D

•  as user: LODs as linguistic resources for NER, content enrichment, machine linking, discovery search…
•  as provider for the PA (public administration): publishing, data integration
•  internal use (e.g. asset management)
•  crafting of RDF artifacts for custom projects and applications

Page 11: Alessio Bosca: Linked Data for Content Analytics in CELI

LOD for NER

•  GENDER GUESSER
•  LOCATION GUESSER
•  ENTITY LINKER
•  ETC.

[Architecture diagram labels: DUMP, INDEXER, INDEXES, SEARCHER, Linguistic Analysis, CELI TRIPLE STORES, SPARQL QUERIES, CUSTOM RDF, WEBAPPS]
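One plausible reading of this slide is that a LOD dump is indexed offline and that components such as the gender guesser look names up in the resulting index. The sketch below illustrates that idea only; the triples, the `ex:` identifiers, the predicate names and the function names are all hypothetical and are not taken from CELI's implementation.

```python
from collections import Counter, defaultdict

# Hypothetical mini "dump": (subject, predicate, object) triples.
DUMP = [
    ("ex:person1", "foaf:givenName", "Alessandro"),
    ("ex:person1", "foaf:gender", "male"),
    ("ex:person2", "foaf:givenName", "Alessandro"),
    ("ex:person2", "foaf:gender", "male"),
    ("ex:person3", "foaf:givenName", "Rita"),
    ("ex:person3", "foaf:gender", "female"),
]


def build_gender_index(triples):
    """Indexer step: map each lower-cased given name to gender counts."""
    given_name = {}
    gender = {}
    for subject, predicate, obj in triples:
        if predicate == "foaf:givenName":
            given_name[subject] = obj.lower()
        elif predicate == "foaf:gender":
            gender[subject] = obj
    index = defaultdict(Counter)
    for subject, name in given_name.items():
        if subject in gender:
            index[name][gender[subject]] += 1
    return index


def guess_gender(index, first_name):
    """Searcher step: return the most frequent gender, or None if unseen."""
    counts = index.get(first_name.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


index = build_gender_index(DUMP)
print(guess_gender(index, "Alessandro"))
print(guess_gender(index, "Unknown"))
```

A location guesser or entity linker could follow the same pattern with a different predicate and a richer index; in a real deployment the index would likely be built from SPARQL query results rather than an in-memory list.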

Page 12: Alessio Bosca: Linked Data for Content Analytics in CELI

Faceted Semantic Search

Browse through documents and contents

Relations between Facets
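As a rough illustration of faceted browsing and of relations between facets, the sketch below counts facet values and then shows which values of a second facet co-occur with a selected value. The documents, facet names and function names are hypothetical examples, not data or code from the talk.

```python
from collections import Counter

# Hypothetical documents, each annotated with facet values.
DOCS = [
    {"id": "d1", "person": ["Dante"], "place": ["Firenze"]},
    {"id": "d2", "person": ["Dante"], "place": ["Ravenna"]},
    {"id": "d3", "person": ["Petrarca"], "place": ["Firenze"]},
]


def facet_counts(docs, facet):
    """Count how many documents carry each value of the given facet."""
    return Counter(value for doc in docs for value in doc.get(facet, []))


def select(docs, facet, value):
    """Keep only the documents annotated with facet=value."""
    return [doc for doc in docs if value in doc.get(facet, [])]


def related_values(docs, facet, value, other_facet):
    """Values of other_facet that co-occur with facet=value."""
    return facet_counts(select(docs, facet, value), other_facet)


print(facet_counts(DOCS, "person"))
print(related_values(DOCS, "person", "Dante", "place"))
```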


Page 13: Alessio Bosca: Linked Data for Content Analytics in CELI

LOD for CLIR

THE AGROVOC THESAURUS HAS BEEN USED IN THE ORGANIC.LINGUA PROJECT FOR ONTOLOGY-BASED CLIR
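Ontology-based CLIR (cross-language information retrieval) can be sketched as follows: query terms are mapped to thesaurus concepts through their labels in the source language, and the concepts' labels in the target language are then used for retrieval. The tiny thesaurus below is a hypothetical stand-in with made-up `ex:` concept identifiers, not actual AGROVOC data, and the function names are assumptions.

```python
# Hypothetical multilingual thesaurus: concept id -> {language: preferred label}.
THESAURUS = {
    "ex:maize": {"en": "maize", "it": "mais", "fr": "maïs"},
    "ex:soil": {"en": "soil", "it": "suolo", "fr": "sol"},
}


def build_label_index(thesaurus, lang):
    """Map lower-cased labels in one language to their concept ids."""
    return {
        labels[lang].lower(): concept
        for concept, labels in thesaurus.items()
        if lang in labels
    }


def translate_query(query, source_lang, target_lang, thesaurus):
    """Replace each known term by its target-language label; keep the rest."""
    index = build_label_index(thesaurus, source_lang)
    translated = []
    for token in query.lower().split():
        concept = index.get(token)
        if concept is not None and target_lang in thesaurus[concept]:
            translated.append(thesaurus[concept][target_lang])
        else:
            translated.append(token)  # no concept found: pass through unchanged
    return translated


print(translate_query("mais suolo bio", "it", "en", THESAURUS))
```

A fuller version would also handle multi-word labels and could expand the query with broader or narrower concepts; both are left out here for brevity.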


Page 14: Alessio Bosca: Linked Data for Content Analytics in CELI

Sem-web techs for internal models

Information in the CRUNCHED BOOK is represented using combinations of RDF and GRAPH DBS

Page 15: Alessio Bosca: Linked Data for Content Analytics in CELI

Public Sector: clear process …

•  acquire data
•  set open license
•  open formats
•  publish

Celi for the public sector (CSI Piemonte): the Homer project

Page 16: Alessio Bosca: Linked Data for Content Analytics in CELI

(Public sector contd.) … but …

•  LACK OF MONEY
•  LACK OF WILLINGNESS
•  USE OF “STANDARDS”

… hard problems

•  OPAQUE DATASETS
•  POOR RDF/SPARQL SUPPORT

Page 17: Alessio Bosca: Linked Data for Content Analytics in CELI

Why companies’ RDF is not published

•  It reflects customers’ needs
•  It reflects internal data models

HENCE → OVERFITTING:

Provocation: It would not be interesting nor usable

WAYS OUT: having more standard models for particular micro-domains could permit their direct (re)use by the private company (and hence the publication of enhanced versions)

Page 18: Alessio Bosca: Linked Data for Content Analytics in CELI

Recipes

•  Public Sector: use “true” LOD technologies (RDF dumps and SPARQL endpoints)
•  Private companies: use standard data models, internally and for their artifacts
•  OpenData Community: please stress the linked in LOD!

The success of LOD is bound to the use of Linked Data (as a technology). The use of LD in the Private Sector will feed back positively into the spread of the necessary expertise and awareness in the Public Sector too.
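To illustrate why the "linked" part matters, the sketch below merges facts about one entity from two datasets, which only works once an explicit `owl:sameAs` link connects the two identifiers. The `pa:` and `geo:` identifiers, the other predicates and the function name are hypothetical.

```python
SAME_AS = "owl:sameAs"

# Two hypothetical datasets describing the same city under different identifiers.
PA_DATASET = [("pa:torino", "pa:hasOffice", "pa:office42")]
GEO_DATASET = [("geo:Turin", "geo:country", "geo:Italy")]
LINKS = [("pa:torino", SAME_AS, "geo:Turin")]


def merged_view(triples, entity):
    """Collect (predicate, object) pairs for entity and all its sameAs aliases."""
    aliases = {entity}
    changed = True
    while changed:  # follow sameAs links in both directions until stable
        changed = False
        for subject, predicate, obj in triples:
            if predicate != SAME_AS:
                continue
            if subject in aliases and obj not in aliases:
                aliases.add(obj)
                changed = True
            elif obj in aliases and subject not in aliases:
                aliases.add(subject)
                changed = True
    return {
        (predicate, obj)
        for subject, predicate, obj in triples
        if subject in aliases and predicate != SAME_AS
    }


# Without links, only the facts from one dataset are visible.
print(merged_view(PA_DATASET + GEO_DATASET, "pa:torino"))
# With the sameAs link, facts from both datasets are combined.
print(merged_view(PA_DATASET + GEO_DATASET + LINKS, "pa:torino"))
```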


Page 19: Alessio Bosca: Linked Data for Content Analytics in CELI

Thank You!

