
Organizing and Implementing on the Thesauri Mapping Project

Dr. Chang Chun
Associate Professor, Agricultural Information Institute, Chinese Academy of Agricultural Sciences (AII/CAAS), Beijing, China

The Seventh Agricultural Ontology Service (AOS) Workshop
AFITA 2006, November 9-11, Bangalore, India

Outline

Introduction
Objectives
Organizing
AGROVOC and CAT
Methods
Mapping rules
Conclusions
Discussions

Brief Introduction to the Mapping Project

[Diagram: CAT (CAAS) is the resource and AGROVOC (FAO) is the target of the mapping. Mapping rules: ExactMatch, InexactMatch, BroadMatch, NarrowMatch; AND, OR, NOT; No mapping.]

Objective 1: Enrich AOS Terminology Domain Knowledge

Keyword searching has problems in finding information;
Thesauri are still at work in information management;
Research on converting thesauri to ontologies is ongoing;
Mapping can add new domain knowledge.

Objective 2: Develop a Cross-Language Search System

[Diagram: Chinese users search Chinese data through CAT, and English users search AGRIS data through AGROVOC. The mapping information (e, b, n, ...) between CAT and AGROVOC connects the two search paths.]

The Time and Tools of the Mapping Project

Duration of the mapping project: September 2005 to September 2006;
Mapping rules: a revised version of the SKOS Mapping Vocabulary Specification;
Mapping direction: from CAT (resource) to AGROVOC (target);
Mapping tools: Protégé, Excel sheets, and the CAT and AGROVOC CD-ROM.

Workflow

From 2005-09-01 to 2005-11-05: plan the mapping methods; prepare and test the mapping data;
From 2005-11-06 to 2006-05-30: train the specialists and do the mapping in Excel sheets;
From 2006-06-01 to 2006-09-30: convert the Excel sheet information to OWL mapping data that Protégé can read.

The Specialists

We organized about 16 agricultural domain specialists from CAAS, many of them PhD students, chosen according to their domains.
The main domains are biological science, agricultural environmental science, agricultural meteorology, fertilizer science, horticulture, forestry practice, plant protection, agronomy, agricultural products processing, storage and comprehensive utilization, veterinary medicine, biological control, industrial technology and equipment, fishery science, and so on.
Some of them have knowledge of thesauri.

AGROVOC and CAT

AGROVOC:
27736 English terms: 16769 descriptors, 10967 non-descriptors
25060 Chinese terms: 16628 descriptors, 8432 non-descriptors
1240 top terms organized in 130 categories (AGRIS/CARIS)
Includes biological taxonomy and geographical names

CAT:
64638 Chinese terms: 51614 descriptors, 13024 non-descriptors
51400 descriptors have at least one translation
2332 top terms organized in 40 categories (e.g. crops, etc.)
Includes biological taxonomy and geographical names

Finishing the Mapping Work in Two Steps

First, Excel sheets:
We split CAT into 36 documents by domain. For each document we used an Excel sheet to record all the mapping information we could find. These sheets are kept as the original data.

Second, conversion to OWL documents:
After finishing all the Excel sheets, we converted and entered the mapping information into OWL documents, which can be read in Protégé after importing CAT and AGROVOC.

Excel sheets

Each sheet has ten columns, A to J (C- columns describe the CAT term, A- columns the AGROVOC term):
A: C-term code
B: C-term
C: Relation
D: A-term code
E: A-term
F: combine relation
G: C-revise suggestion
H: C-comment
I: A-revise suggestion
J: A-comment
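The column layout above can be read programmatically. The following is a minimal sketch, assuming each sheet has been exported to CSV with the ten columns in the order A to J and no header row. The field names, the `MappingRow` class and the `read_rows` helper are hypothetical; the relation codes `e` and `b` follow the abbreviations used elsewhere in these slides.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical field names for columns A..J of one mapping sheet.
FIELDS = [
    "c_code", "c_term", "relation", "a_code", "a_term",
    "combine_relation", "c_revise", "c_comment", "a_revise", "a_comment",
]


@dataclass
class MappingRow:
    c_code: str
    c_term: str
    relation: str
    a_code: str
    a_term: str
    combine_relation: str = ""
    c_revise: str = ""
    c_comment: str = ""
    a_revise: str = ""
    a_comment: str = ""


def read_rows(text):
    """Parse CSV text (no header row) into MappingRow records."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        # Pad short rows so trailing empty columns may be omitted.
        padded = (record + [""] * len(FIELDS))[:len(FIELDS)]
        rows.append(MappingRow(*(cell.strip() for cell in padded)))
    return rows


sample = (
    "17147,禾谷类作物,e,25512,Cereal crops\n"
    "35234,普及教育,b,2488,Education\n"
)
rows = read_rows(sample)
print(rows[0].relation, rows[0].a_code)
```

Keeping every column as a plain string mirrors the sheet and leaves interpretation of the relation code to a later conversion step.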

Mapping Standards and Methods

Exact Match, Inexact Match; Broad Match, Narrow Match; AND; OR; NOT

Mapping relationships

Exact match. SKOS: exactMatch. OWL: equivalentClass.
Broader/narrower match. SKOS: broadMatch, narrowMatch. OWL: subClassOf.
OR, AND, NOT operators. SKOS: OR, AND, NOT. OWL: unionOf, intersectionOf, complementOf.
Partial equivalences. SKOS: minorMatch, majorMatch.

Exact Match

[Diagram: a CAT concept mapped to an AGROVOC concept by Exact Match.]
For example, ‘17147-禾谷类作物’ Exact Match ‘25512-Cereal crops’

<rdf:Description rdf:about="http://www.caas.net.cn/2005/cat#c_17147_禾谷类作物_Cerealcrop">
  <owl:equivalentClass>
    <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_25512_Cerealcrops_禾谷类作物">
      <owl:equivalentClass rdf:resource="http://www.caas.net.cn/2005/cat#c_17147_禾谷类作物_Cerealcrop"/>
    </rdf:Description>
  </owl:equivalentClass>
</rdf:Description>

equivalentClass: one of the two main mapping relations (13 105 mappings)

Inexact Match

[Diagram: a CAT concept mapped to an AGROVOC concept by Inexact Match.]
For example, ‘经济大国’ (major economic powers) Inexact Match ‘Developed countries’

55581_玉米芯_Maizecob ie 16171

<rdf:Description rdf:about="http://www.caas.net.cn/2005/cat#c_55581_玉米芯_Maizecob">
  <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">inexact mapping with 16171</rdfs:comment>
</rdf:Description>

Inexact Match: we seldom used this mapping relation.

Broad Match

[Diagram: a CAT concept mapped to an AGROVOC concept by Broad Match.]
For example, ‘35234-普及教育’ Broad Match ‘2488-Education’

subClassOf: Broad Match (the other main mapping relation, 11 408 mappings)

<rdf:Description rdf:about="http://www.caas.net.cn/2005/cat#c_35234_普及教育_Universaleducation">
  <rdfs:subClassOf rdf:resource="http://www.fao.org/aos/agrovoc/2005#c_2488_Education_教育"/>
</rdf:Description>

Narrow Match

[Diagram: a CAT concept mapped to an AGROVOC concept by Narrow Match.]
For example, ‘8341_岛屿_Islands’ Narrow Match ‘695_Atolls_环礁’

<rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_695_Atolls_环礁">
  <rdfs:subClassOf rdf:resource="http://www.caas.net.cn/2005/cat#c_8341_岛屿_Islands"/>
</rdf:Description>

subClassOf: Narrow Match (173 mappings)
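The exact, broad and narrow examples differ only in the property used and in which concept is the subject of the statement. The sketch below illustrates that rule with Python's standard `xml.etree.ElementTree`. The two thesaurus namespaces are taken from the examples in these slides; the function name `simple_mapping` and the shortened concept identifiers (without the label suffixes) are assumptions made for illustration. For exact matches it emits a single `owl:equivalentClass` statement rather than the nested form shown in the slides.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
CAT = "http://www.caas.net.cn/2005/cat#"
AGROVOC = "http://www.fao.org/aos/agrovoc/2005#"

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def simple_mapping(relation, cat_id, agrovoc_id):
    """Return an rdf:Description element for relation 'e', 'b' or 'n'."""
    cat_uri, agrovoc_uri = CAT + cat_id, AGROVOC + agrovoc_id
    if relation == "e":    # exact: CAT concept equivalentClass AGROVOC concept
        subject, prop, obj = cat_uri, f"{{{OWL}}}equivalentClass", agrovoc_uri
    elif relation == "b":  # broad: CAT concept subClassOf AGROVOC concept
        subject, prop, obj = cat_uri, f"{{{RDFS}}}subClassOf", agrovoc_uri
    elif relation == "n":  # narrow: AGROVOC concept subClassOf CAT concept
        subject, prop, obj = agrovoc_uri, f"{{{RDFS}}}subClassOf", cat_uri
    else:
        raise ValueError(f"unsupported relation: {relation!r}")
    description = ET.Element(f"{{{RDF}}}Description", {f"{{{RDF}}}about": subject})
    ET.SubElement(description, prop, {f"{{{RDF}}}resource": obj})
    return description


# Islands (CAT) Narrow Match Atolls (AGROVOC): the subject is swapped.
narrow = simple_mapping("n", "c_8341", "c_695")
print(ET.tostring(narrow, encoding="unicode"))
```

Treating the narrow match as a `subClassOf` statement in the opposite direction is what lets one OWL property cover both the broad and the narrow relation.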

AND; OR; NOT

AND: ‘59683-自动标引’ Exact Match ‘11729-Indexing of information’ AND ‘15855-Automation’
OR: ‘7536_大麦_Barley’ Exact Match ‘823_Barley_大麦’ OR ‘3662_Hordeum vulgare_大麦植物’
NOT: ‘12114-非传染性病害’ Exact Match ‘5962-Plant diseases’ NOT ‘34024-Infectious diseases’

AND

‘59683_自动标引_Automaticindexing’ Exact Match ‘11729_Indexingofinformation_信息编目’ AND ‘15855_Automation_自动化’

AND: intersectionOf

<owl:Class>
  <owl:intersectionOf rdf:parseType="Collection">
    <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_11729_Indexingofinformation_信息编目"/>
    <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_15855_Automation_自动化"/>
  </owl:intersectionOf>
</owl:Class>

<rdf:Description rdf:about="http://www.caas.net.cn/2005/cat#c_59683_自动标引_Automaticindexing">
  <owl:equivalentClass>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_11729_Indexingofinformation_信息编目"/>
        <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_15855_Automation_自动化"/>
      </owl:intersectionOf>
    </owl:Class>
  </owl:equivalentClass>
</rdf:Description>

OR

‘7536_大麦_Barley’ Exact Match ‘823_Barley_大麦’ OR ‘3662_Hordeum vulgare_大麦植物’

OR: unionOf

<owl:Class>
  <owl:unionOf rdf:parseType="Collection">
    <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_823_Barley_大麦"/>
    <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_3662_Hordeumvulgare_大麦植物"/>
  </owl:unionOf>
</owl:Class>

<rdf:Description rdf:about="http://www.caas.net.cn/2005/cat#c_7536_大麦_Barley">
  <owl:equivalentClass>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_823_Barley_大麦"/>
        <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_3662_Hordeumvulgare_大麦植物"/>
      </owl:unionOf>
    </owl:Class>
  </owl:equivalentClass>
</rdf:Description>

NOT

‘12114_非传染性病害_Non-infectiousdiseases’ Exact Match ‘5962_Plantdiseases_植物病害’ AND NOT ‘34024_Infectiousdiseases_侵染性病害’

NOT: complementOf

<owl:Class>
  <owl:intersectionOf rdf:parseType="Collection">
    <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_5962_Plantdiseases_植物病害"/>
    <owl:Class>
      <owl:complementOf rdf:resource="http://www.fao.org/aos/agrovoc/2005#c_34024_Infectiousdiseases_侵染性病害"/>
    </owl:Class>
  </owl:intersectionOf>
</owl:Class>

<rdf:Description rdf:about="http://www.caas.net.cn/2005/cat#c_12114_非传染性病害_Non-infectiousdiseases">
  <owl:equivalentClass>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_5962_Plantdiseases_植物病害"/>
        <owl:Class>
          <owl:complementOf rdf:resource="http://www.fao.org/aos/agrovoc/2005#c_34024_Infectiousdiseases_侵染性病害"/>
        </owl:Class>
      </owl:intersectionOf>
    </owl:Class>
  </owl:equivalentClass>
</rdf:Description>
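The AND, OR and NOT examples share one shape: the CAT concept is declared equivalent to an anonymous class built from AGROVOC concepts. The following is a minimal sketch of a generator for that shape, assuming the combination is written as nested Python tuples. The tuple notation, the helper names `q`, `class_expression` and `combined_mapping`, and the shortened concept identifiers are all hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
CAT = "http://www.caas.net.cn/2005/cat#"
AGROVOC = "http://www.fao.org/aos/agrovoc/2005#"


def q(ns, name):
    """Qualified name in ElementTree's {namespace}name form."""
    return f"{{{ns}}}{name}"


def class_expression(expr):
    """Build an element for an AGROVOC id or an ('and'|'or'|'not', ...) tuple."""
    if isinstance(expr, str):  # a named AGROVOC concept
        return ET.Element(q(RDF, "Description"), {q(RDF, "about"): AGROVOC + expr})
    operator, *operands = expr
    cls = ET.Element(q(OWL, "Class"))
    if operator == "not":      # NOT: complementOf exactly one named concept
        (operand,) = operands
        ET.SubElement(cls, q(OWL, "complementOf"),
                      {q(RDF, "resource"): AGROVOC + operand})
        return cls
    tag = {"and": "intersectionOf", "or": "unionOf"}[operator]
    collection = ET.SubElement(cls, q(OWL, tag), {q(RDF, "parseType"): "Collection"})
    for operand in operands:
        collection.append(class_expression(operand))
    return cls


def combined_mapping(cat_id, expr):
    """CAT concept equivalentClass the anonymous class described by expr."""
    description = ET.Element(q(RDF, "Description"), {q(RDF, "about"): CAT + cat_id})
    equivalent = ET.SubElement(description, q(OWL, "equivalentClass"))
    equivalent.append(class_expression(expr))
    return description


# 'Plant diseases' AND NOT 'Infectious diseases', as in the NOT example.
axiom = combined_mapping("c_12114", ("and", "c_5962", ("not", "c_34024")))
print(ET.tostring(axiom, encoding="unicode"))
```

Because the expression is recursive, the same function covers the plain AND and OR cases as well as the AND NOT combination.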

No mapping

For example, 13867_干扰_Interference

<rdf:Description rdf:about="http://www.caas.net.cn/2005/cat#c_13867_干扰_Interference">
  <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">AGROVOC hasn't this concept</rdfs:comment>
</rdf:Description>

No mapping: recorded as a comment

How to Get the OWL Documents

Convert the Excel sheet information into Protégé (by machine conversion and human input) to obtain the OWL mapping data;
Use the ‘import ontology’ tool to import one domain of CAT and the whole of AGROVOC, then enter the mapping relations. After saving the work, we get one OWL document per domain.

Combining the OWL Documents

Delete the top and the end of every OWL document, then paste the remaining parts together. This gives the whole middle part of the mapping project;
Create a new OWL document, import the whole of CAT and AGROVOC, and save the document;
Insert the whole middle part into this new document. The result is one complete mapping OWL document that works with the whole of CAT and AGROVOC.
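The cut-and-paste procedure described here could also be scripted. The sketch below shows one possible alternative, not the method used in the project: it parses each per-domain document, drops its own `owl:Ontology` header, and collects the remaining statements under a new root that imports both thesauri. The function name `merge_owl`, the import URIs and the two tiny sample documents (with made-up concept ids `c_1` and `c_2`) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("owl", OWL)


def merge_owl(documents, imports):
    """Merge the bodies of several RDF/XML strings under one new rdf:RDF root."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # New header: one ontology element that imports the full thesauri.
    ontology = ET.SubElement(root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": ""})
    for uri in imports:
        ET.SubElement(ontology, f"{{{OWL}}}imports", {f"{{{RDF}}}resource": uri})
    for text in documents:
        for child in ET.fromstring(text):
            # Skip each document's own header; keep the middle part.
            if child.tag != f"{{{OWL}}}Ontology":
                root.append(child)
    return root


# Two made-up per-domain documents, each with its own header.
HEADER = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:owl="http://www.w3.org/2002/07/owl#">'
    '<owl:Ontology rdf:about=""/>'
)
doc_a = HEADER + '<rdf:Description rdf:about="http://www.caas.net.cn/2005/cat#c_1"/></rdf:RDF>'
doc_b = HEADER + '<rdf:Description rdf:about="http://www.caas.net.cn/2005/cat#c_2"/></rdf:RDF>'

merged = merge_owl(
    [doc_a, doc_b],
    ["http://www.caas.net.cn/2005/cat", "http://www.fao.org/aos/agrovoc/2005"],
)
print(ET.tostring(merged, encoding="unicode"))
```

Parsing instead of pasting text avoids unbalanced tags when the top or end of a file is trimmed by hand.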

1 Candidate and True Mappings

Automatic identification of candidate exact matches

Candidate type | Num. | Taxon. | Geogr. | Total | Action
Match English and Chinese | 2 470 | 1 547 | 143 | 4 160 | Exact match
Match English but different Chinese | 624 | 546 | 15 | 1 187 | Match not ensured
Match Chinese but different English | 3 297 | 405 | 188 | 3 890 | Tentative exact match

Statistics of the true mapping relations

Classification | Exact match | b | n | e-b-n total | Other relation | Classification total
Total | 13 105 | 11 408 | 173 | 24 686 | 1 747 | 25 433
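The three candidate types (match on both labels, on English only, on Chinese only) can be derived by comparing the labels of the two thesauri. The following is a minimal sketch, assuming each thesaurus is available as a dict from term code to an (English, Chinese) label pair. The function name, the data layout and the case-insensitive English comparison are assumptions, and the entries whose codes contain `demo` are invented purely to trigger each branch.

```python
def candidate_matches(cat, agrovoc):
    """Yield (cat_code, agrovoc_code, action) for label-based candidates."""
    by_english = {}
    by_chinese = {}
    for code, (english, chinese) in agrovoc.items():
        by_english.setdefault(english.lower(), []).append(code)
        by_chinese.setdefault(chinese, []).append(code)
    for c_code, (english, chinese) in cat.items():
        same_en = set(by_english.get(english.lower(), []))
        same_zh = set(by_chinese.get(chinese, []))
        for a_code in sorted(same_en & same_zh):   # both labels agree
            yield c_code, a_code, "exact match"
        for a_code in sorted(same_en - same_zh):   # English only
            yield c_code, a_code, "match not ensured"
        for a_code in sorted(same_zh - same_en):   # Chinese only
            yield c_code, a_code, "tentative exact match"


cat = {
    "7536": ("Barley", "大麦"),
    "c-demo-1": ("Maize cob", "玉米芯"),             # invented entry
    "c-demo-2": ("Universal education", "普及教育"),  # invented entry
}
agrovoc = {
    "823": ("Barley", "大麦"),
    "a-demo-1": ("Maize cob", "玉米穗轴"),            # invented entry
    "a-demo-2": ("Popular education", "普及教育"),    # invented entry
}
results = list(candidate_matches(cat, agrovoc))
for row in results:
    print(row)
```

Only the first category can be accepted without review; the other two are hints for the specialists, in line with the Action column.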

2 The Series of Mapping Knowledge Data Files

The contribution includes the following documents:
(a) cat_agrovco_mapping.owl;
(b) ag_20051101.owl;
(c) cat_all_u.owl;
(d) agrovoc-zh-revise.xls;
(e) agrovoc-usefor-comment.xls.

Users can create a new ontology in Protégé with the data of (a). Protégé will ask to import (b) and (c), after which (a) can be opened. Opening is slow: our computer (CPU 3.4, RAM 1 G) needs about 4 minutes.
File (d) records the AGROVOC terms that need to be revised; file (e) holds the comments on AGROVOC terms.

Discussions

No mapping;
InexactMatch;
Beginning from the top term;
The mapping document needs to work together with CAT and AGROVOC;
There are many broadMatch relations;
The comments and the suggestions.

The Heredity of Mapping Relations

About 60% of CAT concepts obtain their mapping relation with AGROVOC by heredity, that is, by inheritance along the hierarchy. They normally follow the ExactMatch or BroadMatch relation (24 513).

[Diagram: CAT concept tree (C1; 21, 22; 31, 32, 33) linked to AGROVOC concept A1 by ExactMatch and BroadMatch.]
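One possible reading of heredity is that a CAT concept without its own mapping inherits a BroadMatch to the AGROVOC concept of its nearest mapped ancestor. The sketch below works under that assumption only; the slides do not spell out the exact rule. The function name, the data layout and the tree shape given to the diagram's labels are hypothetical, and the CAT hierarchy is assumed to be a tree.

```python
def inherit_mappings(children, direct):
    """Propagate mappings down a CAT hierarchy.

    children: dict from a concept code to the list of its narrower codes
    direct:   dict from a concept code to (relation, agrovoc_code) as
              assigned by a specialist
    Returns a new dict in which unmapped descendants of a concept with an
    'e' or 'b' mapping receive ('b', agrovoc_code).
    """
    result = dict(direct)
    stack = list(direct)
    while stack:
        parent = stack.pop()
        relation, target = result[parent]
        if relation not in ("e", "b"):
            continue  # only exact and broad matches are passed down
        for child in children.get(parent, []):
            if child not in result:  # a direct mapping always wins
                result[child] = ("b", target)
                stack.append(child)
    return result


# Labels from the diagram; the parent-child links are an assumption.
children = {"C1": ["21", "22"], "21": ["31", "32"], "22": ["33"]}
direct = {"C1": ("e", "A1")}
mappings = inherit_mappings(children, direct)
for code in ("C1", "21", "33"):
    print(code, mappings[code])
```

With a single exact match at the top, every descendant ends up with a broad match to the same AGROVOC concept, which would explain why broad matches are so numerous.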

Different Thesauri with Different Classifications

A few concepts have different domain trees in the two thesauri, which means that each thesaurus has its own classification.

[Diagram: CAT tree (C1; 21, 22; 31, 32, 33) and AGROVOC tree (A1; 21, 22; 31, 32), linked by ExactMatch.]

The Resource and the Target

ExactMatch: the same concept;
BroadMatch: Chinese users get a broader concept, and so may retrieve some useless information; English users get a more specific concept, and so may not find all the information;
NarrowMatch: the opposite;
CAT has more than 60,000 terms, while AGROVOC has only about 30,000, so taking CAT as the resource is better.

[Diagram: CAT tree (C1; 21, 22; 31, 32, 33) linked to AGROVOC concepts A1 (ExactMatch, BroadMatch) and A4 (NarrowMatch).]

Discussions 2

Different knowledge taxonomies;
Differences between nouns and verbs;
Different social ideas;
Different cultures;
Different translations.

Chinese Academy of Agricultural Sciences (CAAS) and Food and Agriculture Organization (FAO)

Thank you

