Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azman from FSTM, UKM

Post on 03-Jul-2015



Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azman from FSTM, UKM. Presentation for MyREN Seminar 2014, Berjaya Hotel, Kuala Lumpur, 27 November 2014.


Application of Ontology in Semantic Information Retrieval

Presentation for MyREN Seminar

Berjaya Hotel, Kuala Lumpur

27 November 2014


Brief speaker’s info


Shahrul Azman Mohd. Noah, Ph.D.

Knowledge Technology Research Group

Center for AI Technology (CAIT)

shahrul@ukm.edu.my

Graduated in BSc(Mathematics) from UKM

Graduated in MSc(IS) from Sheffield U.

Graduated in PhD(IS) from Sheffield U. – knowledge-based systems

From Muar, Johor

ONTOLOGY


What is ontology?

• Ontology may be considered as a kind of method to represent knowledge.

• From a philosophical discipline – the science of “what is”; the kinds and structures of objects, properties, events, processes and relations in every area of reality.

• Aristotle's classification of animals is one of the first ontologies developed.

Ontology in Computing

• An ontology is an engineering artifact:

– It is constituted by a specific vocabulary used to describe a certain reality, plus

– A set of explicit assumptions regarding the intended meaning of the vocabulary.

• Thus, an ontology describes a formal specification of a certain domain:

– Shared understanding of a domain of interest

– Formal and machine manipulable model of a domain of interest
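As a minimal sketch of the idea that an ontology is a vocabulary plus explicit assumptions, the Python below encodes a toy vocabulary of classes and a few subclass axioms, then derives facts from them. The class names and the `ancestors` and `is_a` helpers are illustrative assumptions, not part of any ontology shown in the talk.

```python
# Toy ontology: a vocabulary (class names) plus explicit assumptions
# (here, only subclass axioms). All names are illustrative.
SUBCLASS_OF = {
    "Student": "Person",
    "Supervisor": "Person",
    "Person": "Agent",
}


def ancestors(cls, subclass_of=SUBCLASS_OF):
    """Return every superclass of cls, nearest first."""
    found = []
    # The second condition stops the walk if the axioms contain a cycle.
    while cls in subclass_of and subclass_of[cls] not in found:
        cls = subclass_of[cls]
        found.append(cls)
    return found


def is_a(cls, other, subclass_of=SUBCLASS_OF):
    """True if cls equals other or is a (transitive) subclass of it."""
    return cls == other or other in ancestors(cls, subclass_of)
```

Because the axioms are explicit, a program can conclude that a `Student` is an `Agent` even though that fact is never stated directly; this is the sense in which the model is machine manipulable.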


Ontology Definition

Formal, explicit specification of a shared conceptualization [Gruber93]

• Formal: machine-readability with computational semantics

• Explicit specification: unambiguous terminology definitions

• Shared: commonly accepted understanding

• Conceptualization: conceptual model of a domain (ontological theory)

An ontology is… (arranged along an axis of increasing complexity)

• a catalog

• a set of text files

• a glossary

• a thesaurus

• a collection of taxonomies

• a collection of frames

• a set of general logical constraints

Source: Smith & Welty (2001)

Various approaches to classify ontologies

• Classify ontologies according to the information the ontology needs to express and the richness of its internal structure (Lassila & McGuinness, 2001)

• Classify into 2 orthogonal dimensions: the amount and type of structure and the subject (Van Heijst et al., 1997)

• Classify ontologies according to their level of dependence on a particular task (Guarino, 1998)

Ontology language

• Ontology languages are formal languages used to construct ontologies:

– allow the encoding of knowledge about specific domains, and

– often include reasoning rules that support the processing of that knowledge

• Various languages have been proposed: CycL, KL-One, Ontolingua, F-Logic, OCML, LOOM, Telos, RDF(S), OIL, DAML+OIL, XOL, SHOE, OWL etc.

• Many, including OWL, are based on Description Logic (DL).

• Summarised as (Kalibatiene & Vasilecas, 2011):

Example of ontologies

• Top level ontology - Suggested Upper Merged Ontology (SUMO)

Portion of SUMO ontology with USGS Geo-concepts inserted

Example of ontologies (cont.)

• Lexical ontology - WordNet

Example of ontologies (cont.)

• Domain ontology - Simple News and Press Ontologies (SNaP)

Linked Data…?


Applications of ontology

• Searching & browsing

• Decision support system

• Question answering system

• Recommendation

• Data integration

• Etc.


INFORMATION RETRIEVAL


Concepts

• “Information retrieval (IR) is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information” (Salton, 1968).

• Applications of IR: recommendations, Q&A, filtering… and of course searching.

Issues in IR

• Some issues in IR:

– Relevance

– Evaluation

– Users and information needs

• Context based search

• Semantic search

• Etc.


IR process


ONTOLOGY + INFORMATION RETRIEVAL


Ontology and semantic search

• Various ways to support semantic search:

– Query expansion – the user's query is expanded with related terms

– Disambiguation – resolving terms or concepts when they refer to more than one topic

– Classifying – classify documents such as ads into ontological topics to support semantic search

– Enhanced IR model – embed an ontology into an existing IR model, resulting in a modified IR model
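To make the disambiguation item concrete, here is a hedged sketch in Python that picks the sense of an ambiguous term whose associated words overlap most with the query context (a simplified, Lesk-style heuristic). The `SENSES` table, the sense names and the cue words are invented for illustration; a real system would take them from an ontology or lexical resource.

```python
# Hypothetical sense inventory: term -> {sense name: cue words}.
SENSES = {
    "bus": {
        "Vehicle": {"road", "passenger", "driver", "transport"},
        "ComputerBus": {"data", "cpu", "memory", "hardware"},
    }
}


def disambiguate(term, context_tokens, senses=SENSES):
    """Return the sense whose cue words overlap most with the context.

    Returns None for terms with no known senses. On a tie, the sense
    listed first wins.
    """
    candidates = senses.get(term)
    if not candidates:
        return None
    context = set(context_tokens)
    return max(candidates, key=lambda sense: len(candidates[sense] & context))
```

For example, `disambiguate("bus", ["cpu", "memory", "bus"])` selects the computing sense, while a context mentioning passengers selects the vehicle sense.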


Query Expansion

• Query expansion (QE) is needed due to the ambiguity of natural language.

• Main aim of QE – to add new meaningful terms to the initial query.

Bhogal, J., Macfarlane, A. & Smith, A. 2007. A review of ontology based query expansion. Information Processing and Management, 43: 866-886.
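The aim of QE stated above can be sketched in a few lines of Python. This is only an illustration under assumptions: `RELATED_TERMS` is a hypothetical mini-thesaurus standing in for the related terms an ontology would supply, and `expand_query` is an invented helper name.

```python
# Hypothetical mini-thesaurus standing in for an ontology's related terms.
RELATED_TERMS = {
    "car": ["automobile", "vehicle"],
    "theft": ["robbery", "burglary"],
}


def expand_query(query, related=RELATED_TERMS):
    """Return the query terms followed by any related terms, without duplicates."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for extra in related.get(term, []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded
```

Calling `expand_query("car theft")` keeps the two original terms first and appends the related ones, so a document that only says "automobile robbery" can still match.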

Query Expansion


Semantic index

• Textual documents are indexed according to some ontology model.

• Remember the concept of vocabulary in IR?

[Figure: Extract and Indexing steps turn a computer science collection into the index terms or vocabulary of the collection: architecture, bus, computer, database, …, xml]

Semantic index (cont.)

[Figure: the same computer science collection, Extract and Indexing steps and index terms (architecture, bus, computer, database, …, xml), with the note "Replace the index with ontological-index"]
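The difference between a conventional term index and an ontological index can be sketched as follows. This is a simplified illustration, not the indexing scheme used in the projects described later: `TERM_TO_CONCEPT`, the concept names and the two sample documents are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical mapping from surface terms to ontology concepts.
TERM_TO_CONCEPT = {
    "xml": "MarkupLanguage",
    "html": "MarkupLanguage",
    "database": "DataManagement",
    "sql": "DataManagement",
}


def build_indexes(docs, term_to_concept=TERM_TO_CONCEPT):
    """Build a term -> documents index and a concept -> documents index."""
    term_index = defaultdict(set)
    concept_index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            term_index[term].add(doc_id)
            concept = term_to_concept.get(term)
            if concept is not None:
                concept_index[concept].add(doc_id)
    return term_index, concept_index


docs = {"d1": "xml database", "d2": "html sql"}
term_index, concept_index = build_indexes(docs)
```

In the term index, "xml" leads only to `d1`; in the concept index, the hypothetical concept `MarkupLanguage` leads to both documents, because "xml" and "html" map to the same concept.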

Examples

• Three research projects that illustrate the applications of ontology-based IR:

– Semantic digital library

– Crime news retrieval

– Multi-modality ontology-based image retrieval

Semantic digital library

• Proposed an approach for managing, organizing and populating an ontology for document collections in a digital library.

• The document metadata and content are inserted and populated into a knowledge base, which allows sophisticated querying and searching.

• First, proposed an ontology-based information retrieval model, built on the classic vector space model, that includes document annotation, instance-based weighting and concept-based ranking.
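As a rough sketch of ranking in a vector space model whose dimensions are ontology instances rather than plain words, the Python below scores documents by cosine similarity. The weights, document ids and instance names are made up for illustration, and the slides do not specify the project's actual weighting or ranking formulas, so this should be read as the classic VSM baseline only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_vec, doc_vecs):
    """Return document ids ordered by decreasing similarity to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Illustrative instance weights per document (not taken from the project).
doc_vecs = {
    "Doc1": {"Student1": 0.2, "Supervisor1": 0.5},
    "Doc2": {"Student2": 0.3},
}
ranking = rank({"Supervisor1": 1.0}, doc_vecs)
```

Here a query annotated with the instance `Supervisor1` ranks `Doc1` first, since `Doc2` has no weight on that instance.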


Semantic digital library

• General architecture


Semantic digital library

• Involved three ontologies – ACM Topic hierarchies, Geo ontology and Dublin Core metadata

• Portion of domain ontology focusing on academic thesis

Semantic digital library

• Document annotation

Semantic digital library

• The process


<!-- create Class Person -->

<!-- create instance of Class Student -->
<Student rdf:ID="Student1">
  <rdfs:label>Arifah Alhadi</rdfs:label>
</Student>
<Student rdf:ID="Student2">
  <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Asyraf Arifin</rdfs:label>
</Student>

<!-- create instance of Class Supervisor -->
<Supervisor rdf:ID="Supervisor1">
  <rdfs:label>PM Dr Shahrul Azman</rdfs:label>
  <rdfs:label>Prof. Madya Dr. Shahrul Azman Mohd Noah</rdfs:label>
</Supervisor>
<Supervisor rdf:ID="Supervisor2">
  <rdfs:label>Prof Aziz Deraman</rdfs:label>
</Supervisor>

Concept | Instance | Documents
http://www.ukm.my/thesis/supervisor#, http://www.ukm.my/thesis/person# | Supervisor1 | Doc1
http://ukm.my/thesis/student#, http://ukm.my/thesis/creator#, http://ukm.my/thesis/person# | Student1 | Doc1
http://ukm.my/thesis/student#, http://ukm.my/thesis/creator#, http://ukm.my/thesis/person# | Student2 | Doc1

VSM Index

Id | Term | TFIDF | Frq | Doc Id
1 | Arifah Alhadi | 0.11 | 2 | Doc1
2 | Asyraf Arifin | 0.123 | 1 | Doc1
3 | PM Dr Shahrul Azman | 0.45 | 1 | Doc1
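The TFIDF column of the VSM index can be understood through a sketch of one common TF-IDF variant: term frequency normalised by document length, multiplied by the logarithm of the inverse document frequency. The slides do not say which variant produced the figures in the index, so the numbers this code yields are not expected to reproduce them; the sample documents are invented.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights.

    docs maps a document id to its list of terms (or instance labels).
    Returns {doc_id: {term: weight}} using tf = count / doc length and
    idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))
    weights = {}
    for doc_id, terms in docs.items():
        counts = Counter(terms)
        weights[doc_id] = {
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


# Invented example: one label occurs twice in Doc1, another in both documents.
sample = {
    "Doc1": ["Arifah Alhadi", "Arifah Alhadi", "PM Dr Shahrul Azman"],
    "Doc2": ["PM Dr Shahrul Azman"],
}
sample_weights = tf_idf(sample)
```

A label that occurs in every document receives weight 0 under this variant, which is why practical systems often smooth the IDF term.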

Ontology-based IR for crime news retrieval

• Each crime news item must be classified into one of the categories: Traffic Violation, Theft, Sex Crime, Murder, Kidnap, Fraud, Drugs, Cybercrime, Arson and Gang (Chen et al. 2004).

• Useful entities need to be identified: Person, Location, Organisation, Date/Time, Weapon, Amount, Vehicle, Drug, Personal properties, and Age.

• Clustering of crime news into topics, e.g. Nurin Jazlin murder, Canny Ong, Sosilawati etc.

• Clustering of a specific topic into its various chronological events.

• Mapping of named entities into the news ontology to support semantic querying and retrieval.

Example

[Figure: crime news organised at three levels; the counts (17), (6), (3), (9) and (13) label groups in the figure]

• Classification: Murder, Kidnap, Theft, Gang, …

• Cluster into topics: Nurin Jazlin, Sosilawati, Canny Ong

• Clustering (events within the Canny Ong topic):

– Investigation into Canny Ong case, including medical report and trial

– Evidence/Suspect in Canny Ong case, DNA test

– Family reacts to Canny Ong case and negligence suit

– Court sentence, plead guilty

Required methods

• In order to support the aforementioned requirements:

– Conventional text processing - tokenizing, indexing, stopping, stemming etc.

– Named entity recognition (NER)

– Classification and clustering

– Ontology mapping
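The conventional text processing steps in the list above (tokenizing, stopping, stemming) can be sketched in plain Python. This is a deliberately crude illustration: the stopword list is a tiny invented sample and `crude_stem` only strips a few English suffixes, whereas a real system would use a proper stemmer suited to the language of the news text.

```python
import re

# Tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "was", "were", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crude_stem(token):
    """Strip one common suffix if at least three characters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stopwords, then stem the remaining tokens."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOPWORDS]
```

The output of `preprocess` is the kind of token list that the later indexing, classification and clustering steps would consume.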


PRE-PROCESSING TASK

• Stopword removal

• Stemming

• Parsing

• Indexing

+ DOCUMENT REPRESENTATION

• Bag of words

• Named entity recognition

+ DOCUMENT ORGANIZATION

• Classification - AdaBoost

• Clustering – KNN

• Semantic mapping

Document representation

• Documents are represented in meaningful forms:

– BoW – Bag of Words

– Named Entity Recognition – used GATE ANNIE and JAPE rules

– Adopt the Vector Space Model (VSM), enhanced with an ontological model
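The two representations named above can be illustrated with a short sketch. The bag of words is a plain term count. The entity tagger is a simple gazetteer lookup used here only as a stand-in: it is not GATE, ANNIE or JAPE, and the `GAZETTEER` entries are invented for the example.

```python
from collections import Counter

# Hypothetical gazetteer: lowercase surface string -> entity type.
GAZETTEER = {
    "kuala lumpur": "Location",
    "pistol": "Weapon",
}


def bag_of_words(tokens):
    """Count how often each token occurs, ignoring word order."""
    return Counter(tokens)


def tag_entities(text, gazetteer=GAZETTEER):
    """Return (surface text, entity type) for the first match of each entry.

    Matching is a case-insensitive substring search, so it is far
    cruder than rule-based NER.
    """
    lowered = text.lower()
    found = []
    for surface, entity_type in gazetteer.items():
        start = lowered.find(surface)
        if start != -1:
            found.append((text[start:start + len(surface)], entity_type))
    return found
```

The tagged entities, rather than the raw words, are what would be linked to classes in the ontological model.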


Document representation


Document organization

• Documents need to be organised into categories, topics and events.

– Classification – AdaBoost algorithm

– Clustering – used KNN clustering

– Ontology mapping – we have developed a crime news ontology by extending the existing SNaP ontology. It includes classes/entities that are important to crime, such as classification of crimes, location and weapon.


Asset ontology

Event ontology

Extending the SNaP ontology and mapping to entities in news documents

[Figure: the SNaP ontology extended with a Crime ontology. Classes shown: pne:Event, event:Event, pna:Asset, pns:Stuff, pns:Tangible, pns:Organization, pns:Location, pns:Person, pns:Weapon, pns:Vehicle, pnc:Classification, pnc:Classifiable and pnt:Tag, linked by rdfs:subClassOf. Properties shown: pne:subeventOf and pnc:isClassifiedBy, each with rdfs:domain and rdfs:range links. Instances shown: <Murder>, <Kidnap> and <Event 1>, linked to their classes by rdf:type.]
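Mapping recognised entities onto the extended ontology can be sketched as the generation of subject-predicate-object triples. The class and property names `pns:Person`, `pns:Location`, `pns:Organization`, `pns:Weapon`, `pns:Vehicle` and `pnc:isClassifiedBy` come from the figure; everything else is an assumption made for illustration, including the `ex:mentions` property, the `#entityN` node naming and the choice to attach `pnc:isClassifiedBy` directly to the document.

```python
# Entity types from the NER step -> ontology classes named in the figure.
ENTITY_TYPE_TO_CLASS = {
    "Person": "pns:Person",
    "Location": "pns:Location",
    "Organisation": "pns:Organization",
    "Weapon": "pns:Weapon",
    "Vehicle": "pns:Vehicle",
}


def to_triples(doc_id, entities, category):
    """Turn a classified document and its entities into triples.

    entities is a list of (surface text, entity type) pairs. Entity
    types with no mapped class are skipped. ex:mentions is hypothetical.
    """
    triples = [(doc_id, "pnc:isClassifiedBy", category)]
    for i, (surface, entity_type) in enumerate(entities, start=1):
        ontology_class = ENTITY_TYPE_TO_CLASS.get(entity_type)
        if ontology_class is None:
            continue
        node = f"{doc_id}#entity{i}"
        triples.append((node, "rdf:type", ontology_class))
        triples.append((node, "rdfs:label", surface))
        triples.append((doc_id, "ex:mentions", node))
    return triples
```

Once news items are expressed this way, a query such as "murder cases that mention a given location" becomes a pattern match over triples instead of a keyword search.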

The Application

• What we need/desire.


Ontology-based Image Retrieval

• Rapid growth of visual information (VI) leads to difficulty in finding and accessing VI.

• Inability to capture the semantic content.

• The problem arises from a lack of coincidence between the information extracted from VI and user needs.

• Conventional approaches to image retrieval (IMR), text-based (TBIR) and content-based (CBIR), have reached their limit in attempting to solve this problem.

• As a result, the semantic-based (SBIR) approach: ontology-based retrieval provides explicit, domain-oriented semantics for concepts and relationships.

Ontology-based Image Retrieval

• Illustrate how images are described based on their visual, textual and domain semantic features.

• Proposed a multi-modality ontology: visual ontology, textual ontology and domain ontology.

• Illustrate how such an ontology can be integrated with an open-source knowledge base (DBpedia) to support a more comprehensive search.

Proposed Approach


Example of multi-modality ontology


Example of multi-modality ontology with DBpedia

Conclusion - Practical implementation of ontology-based IR

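[Figure: architecture diagram with the components Ontology (TBox, ABox), Documents, Index and Query Processing, and the labelled steps build, Extraction, Population and Annotation; a query goes in and ranked docs come out]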

Research issues

• Index representation – most still based on the conventional VSM.

• Ranking – weighting and ranking mechanisms

• Automatic population – supervised and unsupervised

• Extraction & annotation

• Multilingual and cross-language


Example - advanced application of ontology

Watson – the science behind an answer


Group members:

1. Shahrul Azman Mohd. Noah

2. Juhana Salim

3. Masnizah Mohd

4. Nazlia Omar

5. Mohd Juzaiddin Ab Aziz

6. Nazlena Mohamad Ali

7. Saidah Saad

8. Shereena Mohd Arif

9. Lailaltulqadri Zakaria

10. Sabrina Tiun

11. Maryati Mohd. Yusof

END
