
Semantic Illegal Content Hunter - SICH project

Date post: 07-Feb-2020

Co-funded by the Prevention of and Fight against Crime Programme of the European Union

FINAL CONFERENCE

SICH PROJECT: Semantic Illegal Content Hunter

Luigi Laura, Sapienza Università di Roma

Rome, 20 November 2015


Building a search engine optimized for finding illegal content.

Luigi Laura

Sapienza Università di Roma


SUMMARY

Challenges in the development of SICH:

• A comparison with traditional web search and search engines

• A comparison with traditional “Adversarial Web Search”

• The main components of SICH

• Future developments


EVOLUTION OF WEB SEARCH ENGINES

First Generation

Uses mainly textual information

Second Generation

Needs a good global ranking: hyperlink analysis (Google’s PageRank)

Third Generation

Addresses the need behind the query
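As an illustration of the hyperlink analysis mentioned for second-generation engines, here is a minimal sketch of the textbook PageRank power iteration. It is not taken from SICH or from any production engine; the function name `pagerank`, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (illustrative sketch only).

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its score; scores sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped score among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without out-links spreads its score over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical three-page web: a and b link to c, c links back to a.
    scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
    for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

The score depends only on the link structure, not on the text of the pages, which is what separates the second generation from the first.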


OPEN SOURCE SEs

There are several good open source search engines (SEs) and SE components…

Why didn’t we use them?


IN WEB SEARCH…

… everybody wants to be found, and ranked high!

Here we play a different game: the pages we are looking for do not want to be found (by us!).

Adversarial Web Search?


ADVERSARIAL WEB SEARCH?

Adversarial Web Search is by now a mature field of research, but it is still not what we need!


SEARCH ENGINE COMPONENTS


THE CRAWLER

The Crawler is the component in charge of collecting data from the Internet.

There are several problems to be solved:

• one machine is not enough, we need many…

• ... how do we share the load between the machines?

• Which pages do we visit first?

• Freshness of the results
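Two of the problems listed above, sharing the load between machines and deciding which pages to visit first, can be sketched as follows. This is a generic illustration, not the SICH design: the names `assign_machine` and `Frontier`, the choice of SHA-256, and the numeric priority scores are all assumptions. Freshness (deciding when to revisit a page) is not covered.

```python
import hashlib
import heapq
from urllib.parse import urlsplit


def assign_machine(url, num_machines):
    """Map a URL to a machine index by hashing its host name.

    Hashing the host (not the full URL) keeps all pages of one site on
    the same machine, which simplifies per-site politeness limits.
    """
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines


class Frontier:
    """Queue of URLs still to visit, highest priority first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker: equal priorities pop in insertion order

    def add(self, url, priority):
        """Schedule a URL; return False if it was already scheduled."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, self._counter, url))
        self._counter += 1
        return True

    def pop(self):
        """Return the pending URL with the highest priority."""
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    frontier = Frontier()
    frontier.add("http://example.org/a", priority=1.0)
    frontier.add("http://example.com/b", priority=5.0)
    while frontier:
        url = frontier.pop()
        print(assign_machine(url, num_machines=4), url)
```

How the priority is computed is the interesting part and is left open here; a link-based score or a topical relevance estimate would both fit this interface.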


THE SICH CRAWLER

Inside SICH we used ESCrawler, the Expert System Crawler. ESCrawler can:

• download documents from the Web;

• search for and extract parts of documents;

• generate documents composed of different documents;

• filter out non-core parts of documents;

• populate HTML forms and retrieve the results;

• classify documents, or parts of them, by calculating a hash signature.
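The slides do not say how ESCrawler computes its hash signature. As a rough sketch of the general idea, the code below normalizes a text, hashes it, and groups documents that share a signature. The function names, the normalization steps, and the use of SHA-256 are assumptions made for this example, not ESCrawler's actual method.

```python
import hashlib
import re


def signature(text):
    """Hash signature of a text after light normalization (illustrative).

    Whitespace runs are collapsed and case is ignored, so trivially
    reformatted copies of the same content get the same signature.
    """
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_by_signature(documents):
    """Group document ids by signature.

    documents: dict mapping a document id to its text.
    Returns a dict mapping each signature to the list of ids sharing it.
    """
    groups = {}
    for doc_id, text in documents.items():
        groups.setdefault(signature(text), []).append(doc_id)
    return groups


if __name__ == "__main__":
    # Hypothetical documents: the first two differ only in spacing and case.
    docs = {
        "page1": "Buy now!\n  Limited offer",
        "page2": "buy now! limited offer",
        "page3": "Something else entirely",
    }
    for sig, ids in group_by_signature(docs).items():
        print(sig[:12], ids)
```

An exact hash like this only matches content that is identical after normalization; recognizing near-duplicates would need a similarity-preserving signature instead.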


THE INDEXER

The indexer in the SICH engine is developed using Expert System’s COGITO Semantic Technology.


COGITO



THE INTERFACE

In a traditional web search engine the user interface is very simple: both the query... and the results!


THE SICH INTERFACE

It allows:

• linguistic search

• spatial search

• events-based search

• search based on corpora selection


SUMMING UP: THE SICH SYSTEM

PROJECT SICH: SEMANTIC ILLEGAL CONTENT HUNTER

With the financial support of the Prevention of and Fight Against Crime Programme, ISEC 2012

European Commission, Directorate-General Home Affairs (HOME/2012/ISEC/AG/INT/4000003863)

The ESCrawler is in charge of:

• downloading documents from the Web;

• searching for and extracting parts of documents;

• generating documents composed of different documents;

• filtering out non-core parts of documents;

• populating HTML forms and retrieving the results;

• classifying documents, or parts of them, by calculating a hash signature.

Discovery & Categorization by:

• Parser

• Semantic Network

• Lexicon

• Knowledge Base

• Memory

Semantic Index; Conceptual Map

The interface allows:

• linguistic search

• spatial search

• events-based search

• search based on corpora selection

Semantic Search; Text Mining; Automatic Categorization; Data Intelligence.


Contacts

Thank you

Luigi Laura

Sapienza Università di Roma

[email protected]


Q&A

