Organizing and Implementing the
Thesauri Mapping Project
Dr. Chang Chun
Associate Professor, Agricultural Information Institute, Chinese Academy of Agricultural Sciences (AII/CAAS), Beijing, China
The Seventh Agricultural Ontology Service (AOS) Workshop
AFITA 2006, November 9-11, Bangalore, India
Outline
Introduction
Organizing
AGROVOC and CAT
Conclusions
Objectives
Methods
Mapping rules
Discussions
7th AOS Outline
Brief Introduction to the Mapping Project
[Diagram: CAT (CAAS) is the resource and AGROVOC (FAO) is the target. The mapping rules connecting them are ExactMatch, InexactMatch, BroadMatch, NarrowMatch, AND, OR, NOT, and No mapping.]
Objective 1:
Enrich AOS Terminology Domain Knowledge
Keyword searching has problems in retrieving information;
Thesauri are still in use in information management;
Research on the conversion from thesaurus to ontology is ongoing;
Mapping can add new domain knowledge.
Objective 2:
Develop Cross-Language Search System
[Diagram: Chinese users and English users search across Chinese data and AGRIS data; CAT and AGROVOC are linked by the mapping information (e, b, n, …).]
The Time and Tools of the Mapping Project
Duration of the mapping project: September 2005 to September 2006;
Mapping rules: a revised version of the SKOS Mapping Vocabulary Specification;
Mapping direction: from CAT (resource) to AGROVOC (target);
Mapping tools: Protégé, Excel sheets, CAT and AGROVOC CD-ROM.
Workflow
From 2005-09-01 to 2005-11-05: plan the mapping methods; prepare and test the mapping data;
From 2005-11-06 to 2006-05-30: training, and mapping in Excel sheets;
From 2006-06-01 to 2006-09-30: convert the Excel sheet information to OWL mapping data that Protégé can read.
The specialists
We organized about 16 agricultural domain specialists from CAAS, many of them PhD students, chosen according to their domain.
The main domains are biological science, agricultural environmental science, agricultural meteorology, fertilizer science, horticulture, forestry practice, plant protection, agronomy, agricultural products processing, storage and comprehensive utilization, veterinary medicine, biological control, industrial technology and equipment, fishery science, and so on.
Some of them have knowledge of thesauri.
AGROVOC and CAT
AGROVOC:
27736 English terms: 16769 descriptors, 10967 non-descriptors
25060 Chinese terms: 16628 descriptors, 8432 non-descriptors
1240 top terms organized in 130 categories (AGRIS/CARIS)
includes biological taxonomy and geographical names
CAT:
64638 Chinese terms: 51614 descriptors, 13024 non-descriptors
51400 descriptors have at least one translation
2332 top terms organized in 40 categories (e.g. crops, etc.)
includes biological taxonomy and geographical names
To Finish the Mapping Work in Two Steps
First, Excel sheets:
We split CAT into 36 documents by domain and work in Excel sheets: we try to find all the mapping information and enter it in the sheet; all these sheets are kept as the original data.
Second, convert the information to OWL documents:
After finishing all the Excel sheets, we convert the mapping information and enter it into OWL documents, which can be read in Protégé after importing CAT and AGROVOC.
Excel sheets
Column A: C-termcode
Column B: C-term
Column C: Relation
Column D: A-termcode
Column E: A-term
Column F: combine relation
Column G: C-revise suggestion
Column H: C-comment
Column I: A-revise suggestion
Column J: A-comment
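As a rough illustration of the sheet layout, the ten columns can be modelled as one record per row. This is only a sketch: it assumes the sheet has been exported to CSV, that the C- and A- prefixes stand for CAT and AGROVOC, and that the relation column holds short codes such as e, b, n and ie. The field names and the `read_rows` helper are hypothetical and are not part of the project's tooling.

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass
class MappingRow:
    """One line of a mapping sheet, columns A to J (field names are hypothetical)."""
    c_termcode: str           # A: CAT term code (assumed meaning of "C-")
    c_term: str               # B: CAT term
    relation: str             # C: relation code (assumed: e, b, n, ie)
    a_termcode: str           # D: AGROVOC term code (assumed meaning of "A-")
    a_term: str               # E: AGROVOC term
    combine_relation: str     # F: combining relation, if any
    c_revise_suggestion: str  # G
    c_comment: str            # H
    a_revise_suggestion: str  # I
    a_comment: str            # J


def read_rows(csv_text: str) -> list[MappingRow]:
    """Parse CSV text into MappingRow records, padding short lines with ''."""
    width = len(fields(MappingRow))
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if not any(cell.strip() for cell in record):
            continue  # skip blank lines
        padded = (record + [""] * width)[:width]
        rows.append(MappingRow(*(cell.strip() for cell in padded)))
    return rows


# Two sample rows built from examples that appear in the slides.
SAMPLE = (
    "17147,禾谷类作物,e,25512,Cereal crops,,,,,\n"
    "35234,普及教育,b,2488,Education,,,,,\n"
)
rows = read_rows(SAMPLE)
print(rows[1].relation, rows[1].a_term)
```

Keeping each row as a typed record makes the later conversion step a plain loop over rows, whatever tool actually produces the OWL output.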
Mapping Standards and Methods
Exact Match, Inexact Match; Broad Match, Narrow Match; AND; OR; NOT
Mapping relationships
Exact match: SKOS exactMatch; OWL equivalentClass
Broader/Narrower match: SKOS broadMatch, narrowMatch; OWL subClassOf
OR, AND, NOT operators: SKOS OR, AND, NOT; OWL unionOf, intersectionOf, complementOf
Partial equivalences: SKOS minorMatch, majorMatch
Exact Match
[Diagram: a CAT concept mapped to an AGROVOC concept by Exact Match.]
For example: ‘17147-禾谷类作物’ Exact Match ‘25512-Cereal crops’
<rdf:Description rdf:about="http://www.caas.net.cn/2005/cat#c_17147_禾谷类作物_Cerealcrop">
  <owl:equivalentClass>
    <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_25512_Cerealcrops_禾谷类作物">
      <owl:equivalentClass rdf:resource="http://www.caas.net.cn/2005/cat#c_17147_禾谷类作物_Cerealcrop"/>
    </rdf:Description>
  </owl:equivalentClass>
</rdf:Description>
equivalentClass: one of the main mapping relations (13105)
Inexact Match
[Diagram: a CAT concept mapped to an AGROVOC concept by Inexact Match.]
For example: ‘经济大国’ Inexact Match ‘Developed countries’
55581_玉米芯_Maizecob ie 16171
<rdf:Description rdf:about="http://www.caas.net.cn/2005/cat#c_55581_玉米芯_Maizecob">
  <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">inexact mapping with 16171</rdfs:comment>
</rdf:Description>
Inexact Match: we seldom use this mapping relation
Broad Match
[Diagram: a CAT concept mapped to an AGROVOC concept by Broad Match.]
For example: ‘35234-普及教育’ Broad Match ‘2488-Education’
subClassOf: BroadMatch (another main mapping relation, 11408)
<rdf:Description rdf:about="http://www.caas.net.cn/2005/cat#c_35234_普及教育_Universaleducation">
  <rdfs:subClassOf rdf:resource="http://www.fao.org/aos/agrovoc/2005#c_2488_Education_教育"/>
</rdf:Description>
Narrow Match
[Diagram: a CAT concept mapped to an AGROVOC concept by Narrow Match.]
For example: ‘8341_岛屿_Islands’ Narrow Match ‘695_Atolls_环礁’
<rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_695_Atolls_环礁">
  <rdfs:subClassOf rdf:resource="http://www.caas.net.cn/2005/cat#c_8341_岛屿_Islands"/>
</rdf:Description>
subClassOf: Narrow Match (173)
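The three simple relations differ mainly in which concept becomes the subject of the OWL statement: for Broad Match the CAT concept is the subclass, for Narrow Match the AGROVOC concept is. The following is a minimal sketch of that rule using Python's standard `xml.etree.ElementTree`. The function name `simple_mapping` and the codes e, b, n are assumptions made for illustration; identifiers are taken from the slides with internal whitespace removed; and the exact-match case is emitted in a flat form rather than the nested form shown earlier.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)

CAT = "http://www.caas.net.cn/2005/cat#"
AGROVOC = "http://www.fao.org/aos/agrovoc/2005#"


def simple_mapping(relation, cat_id, agrovoc_id):
    """Return an rdf:Description element for one e/b/n mapping (hypothetical helper)."""
    cat_iri, agrovoc_iri = CAT + cat_id, AGROVOC + agrovoc_id
    if relation == "e":    # exact: CAT concept equivalent to AGROVOC concept
        subject, prop, obj = cat_iri, f"{{{OWL}}}equivalentClass", agrovoc_iri
    elif relation == "b":  # broad: CAT concept is a subclass of the AGROVOC concept
        subject, prop, obj = cat_iri, f"{{{RDFS}}}subClassOf", agrovoc_iri
    elif relation == "n":  # narrow: AGROVOC concept is a subclass of the CAT concept
        subject, prop, obj = agrovoc_iri, f"{{{RDFS}}}subClassOf", cat_iri
    else:
        raise ValueError(f"unsupported relation code: {relation}")
    description = ET.Element(f"{{{RDF}}}Description", {f"{{{RDF}}}about": subject})
    ET.SubElement(description, prop, {f"{{{RDF}}}resource": obj})
    return description


broad = simple_mapping("b", "c_35234_普及教育_Universaleducation", "c_2488_Education_教育")
narrow = simple_mapping("n", "c_8341_岛屿_Islands", "c_695_Atolls_环礁")
print(ET.tostring(broad, encoding="unicode"))
print(ET.tostring(narrow, encoding="unicode"))
```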
AND; OR; NOT
AND: ‘59683-自动标引’ Exact Match ‘11729-Indexing of information’ AND ‘15855-Automation’
OR: ‘7536_大麦_Barley’ Exact Match ‘823_Barley_大麦’ OR ‘3662_Hordeum vulgare_大麦植物’
NOT: ‘12114-非传染性病害’ Exact Match ‘5962-Plant diseases’ NOT ‘34024-Infectious diseases’
AND
‘59683_自动标引_Automaticindexing’ Exact Match ‘11729_Indexingofinformation_信息编目’ AND ‘15855_Automation_自动化’
AND: intersectionOf
<owl:Class>
  <owl:intersectionOf rdf:parseType="Collection">
    <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_11729_Indexingofinformation_信息编目"/>
    <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_15855_Automation_自动化"/>
  </owl:intersectionOf>
</owl:Class>
<rdf:Description rdf:about="http://www.caas.net.cn/2005/cat#c_59683_自动标引_Automaticindexing">
  <owl:equivalentClass>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_11729_Indexingofinformation_信息编目"/>
        <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_15855_Automation_自动化"/>
      </owl:intersectionOf>
    </owl:Class>
  </owl:equivalentClass>
</rdf:Description>
OR: unionOf
<owl:Class>
  <owl:unionOf rdf:parseType="Collection">
    <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_823_Barley_大麦"/>
    <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_3662_Hordeumvulgare_大麦植物"/>
  </owl:unionOf>
</owl:Class>
<rdf:Description rdf:about="http://www.caas.net.cn/2005/cat#c_7536_大麦_Barley">
  <owl:equivalentClass>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_823_Barley_大麦"/>
        <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_3662_Hordeumvulgare_大麦植物"/>
      </owl:unionOf>
    </owl:Class>
  </owl:equivalentClass>
</rdf:Description>
NOT
‘12114_非传染性病害_Non-infectiousdiseases’ Exact Match ‘5962_Plantdiseases_植物病害’ AND NOT ‘34024_Infectiousdiseases_侵染性病害’
NOT: complementOf
<owl:Class>
  <owl:intersectionOf rdf:parseType="Collection">
    <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_5962_Plantdiseases_植物病害"/>
    <owl:Class>
      <owl:complementOf rdf:resource="http://www.fao.org/aos/agrovoc/2005#c_34024_Infectiousdiseases_侵染性病害"/>
    </owl:Class>
  </owl:intersectionOf>
</owl:Class>
<rdf:Description rdf:about="http://www.caas.net.cn/2005/cat#c_12114_非传染性病害_Non-infectiousdiseases">
  <owl:equivalentClass>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_5962_Plantdiseases_植物病害"/>
        <owl:Class>
          <owl:complementOf rdf:resource="http://www.fao.org/aos/agrovoc/2005#c_34024_Infectiousdiseases_侵染性病害"/>
        </owl:Class>
      </owl:intersectionOf>
    </owl:Class>
  </owl:equivalentClass>
</rdf:Description>
<rdf:Description rdf:about="http://www.caas.net.cn/2005/cat#c_13867_干扰_Interference">
  <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">AGROVOC hasn't this concept</rdfs:comment>
</rdf:Description>
No mapping: comment
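The AND, OR and NOT mappings in this section share one pattern: an anonymous owl:Class built from AGROVOC concepts, wrapped in owl:equivalentClass on the CAT concept. The sketch below generates that pattern with Python's standard `xml.etree.ElementTree`. The helper names `class_expression` and `equivalent_to` are hypothetical, and identifiers are taken from the slides with internal whitespace removed; it is an illustration of the structure, not the project's conversion tool.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("owl", OWL)

CAT = "http://www.caas.net.cn/2005/cat#"
AGROVOC = "http://www.fao.org/aos/agrovoc/2005#"


def _description(iri):
    """rdf:Description element that refers to a named concept."""
    return ET.Element(f"{{{RDF}}}Description", {f"{{{RDF}}}about": iri})


def class_expression(operator, first_id, second_id):
    """Anonymous owl:Class for 'first AND second', 'first OR second' or 'first NOT second'."""
    if operator not in ("AND", "OR", "NOT"):
        raise ValueError(f"unknown operator: {operator}")
    first, second = AGROVOC + first_id, AGROVOC + second_id
    cls = ET.Element(f"{{{OWL}}}Class")
    # NOT is written as an intersection with a complement, as in the slide example.
    tag = "unionOf" if operator == "OR" else "intersectionOf"
    members = ET.SubElement(cls, f"{{{OWL}}}{tag}", {f"{{{RDF}}}parseType": "Collection"})
    members.append(_description(first))
    if operator == "NOT":
        negated = ET.SubElement(members, f"{{{OWL}}}Class")
        ET.SubElement(negated, f"{{{OWL}}}complementOf", {f"{{{RDF}}}resource": second})
    else:
        members.append(_description(second))
    return cls


def equivalent_to(cat_id, expression):
    """Wrap a class expression in owl:equivalentClass on the CAT concept."""
    description = _description(CAT + cat_id)
    ET.SubElement(description, f"{{{OWL}}}equivalentClass").append(expression)
    return description


mapping = equivalent_to(
    "c_12114_非传染性病害_Non-infectiousdiseases",
    class_expression("NOT", "c_5962_Plantdiseases_植物病害", "c_34024_Infectiousdiseases_侵染性病害"),
)
print(ET.tostring(mapping, encoding="unicode"))
```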
How to get OWL documents
Convert the Excel sheet information into Protégé (by machine conversion and manual input) to obtain the OWL mapping data;
Use the ‘import ontology’ tool to import one domain of CAT and the whole of AGROVOC, then enter the mapping relations; after saving the work, we obtain one OWL document per domain.
Combine the OWL documents
Delete the top and the end of every OWL document, then paste the remainders together; this gives the whole middle part of the mapping project;
Create a new OWL document, import the whole of CAT and AGROVOC, and save the document;
Insert the whole middle part of the mapping project into that document; the result is a complete mapping OWL document that works with the whole of CAT and AGROVOC.
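The combination step can also be sketched programmatically. The version below swaps the manual cut-and-paste for XML parsing: it copies every child of each domain document's rdf:RDF root, except the owl:Ontology header, into a base document. The `document` and `merge_mappings` helpers are hypothetical, the sample documents are tiny stand-ins built from slide examples, and the owl:imports statements for CAT and AGROVOC that the real base document carries are left out.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def document(body=""):
    """Wrap mapping statements in a minimal OWL document (header + body + end)."""
    return (
        f'<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">'
        f'<owl:Ontology rdf:about=""/>{body}</rdf:RDF>'
    )


def merge_mappings(base_xml, domain_xmls):
    """Append the 'middle part' of every domain document to the base document."""
    base_root = ET.fromstring(base_xml)
    for xml_text in domain_xmls:
        for child in ET.fromstring(xml_text):
            if child.tag == f"{{{OWL}}}Ontology":
                continue  # keep only the header of the base document
            base_root.append(child)
    return base_root


education = document(
    '<rdf:Description rdf:about="http://www.caas.net.cn/2005/cat#c_35234_普及教育_Universaleducation">'
    '<rdfs:subClassOf rdf:resource="http://www.fao.org/aos/agrovoc/2005#c_2488_Education_教育"/>'
    '</rdf:Description>'
)
islands = document(
    '<rdf:Description rdf:about="http://www.fao.org/aos/agrovoc/2005#c_695_Atolls_环礁">'
    '<rdfs:subClassOf rdf:resource="http://www.caas.net.cn/2005/cat#c_8341_岛屿_Islands"/>'
    '</rdf:Description>'
)
merged = merge_mappings(document(), [education, islands])
print(ET.tostring(merged, encoding="unicode"))
```

Parsing rather than pasting text avoids leaving a stray header or closing tag in the combined file, at the cost of re-serializing the XML.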
1 Candidate and the True Mapping
Automatic identification of candidate exact matches
                                      Num.    Taxon.   Geogr.   Total   Action
Match English and Chinese             2 470   1 547    143      4 160   Exact match
Match English but different Chinese   624     546      15       1 187   Match not ensured
Match Chinese but different English   3 297   405      188      3 890   Tentative exact match
The statistics of true mapping relations
Classification   Exact match   b        n     e-b-n total   Other relation   Classification total
Total            13 105        11 408   173   24 686        1 747            25 433
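The three candidate classes in the first table can be read as a simple comparison of the English and Chinese labels of a CAT term and an AGROVOC term. The sketch below shows one way to express that rule. The function name, the case-insensitive comparison of English labels, and the whitespace stripping are assumptions for illustration, not a description of the matching actually used in the project.

```python
from typing import Optional


def classify_candidate(cat_en: str, cat_zh: str, agrovoc_en: str, agrovoc_zh: str) -> Optional[str]:
    """Return the action for a candidate pair, or None if neither label matches."""
    # Assumption: English labels are compared ignoring case and outer whitespace.
    same_english = cat_en.strip().lower() == agrovoc_en.strip().lower()
    same_chinese = cat_zh.strip() == agrovoc_zh.strip()
    if same_english and same_chinese:
        return "Exact match"
    if same_english:
        return "Match not ensured"
    if same_chinese:
        return "Tentative exact match"
    return None


# CAT 7536 (大麦 / Barley) against two AGROVOC concepts from the slides.
print(classify_candidate("Barley", "大麦", "Barley", "大麦"))
print(classify_candidate("Barley", "大麦", "Hordeum vulgare", "大麦植物"))
```

Pairs that come back as "Match not ensured" or "Tentative exact match" would still need a specialist's check before being recorded as true mappings.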
2 The Series of Mapping Knowledge Data Files
The contribution includes the following documents:
(a) cat_agrovco_mapping.owl;
(b) ag_20051101.owl;
(c) cat_all_u.owl;
(d) agrovoc-zh-revise.xls;
(e) agrovoc-usefor-comment.xls.
Users can create a new ontology in Protégé from the data of (a); Protégé will ask to import (b) and (c), after which (a) can be opened. Opening is rather slow: our computer (CPU 3.4, RAM 1 G) needs about 4 minutes.
(d) records the information about AGROVOC terms that needs to be revised; (e) contains the comments on AGROVOC terms.
Discussions
No mapping;
InexactMatch;
Begin from the top term;
The mapping document needs to work together with CAT and AGROVOC;
There are many broadMatch relations;
Comments and suggestions.
The Heredity of Mapping Relation
About 60% of CAT concepts obtain their mapping relation with AGROVOC by heredity (inheritance through the concept hierarchy). They normally follow ExactMatch or BroadMatch (24 513 in total).
[Diagram: concept trees under CAT and AGROVOC (nodes C1, A1, 21, 22, 31, 32, 33) connected by ExactMatch and BroadMatch links.]
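One possible reading of heredity is that a CAT concept without a direct mapping takes a BroadMatch to the AGROVOC concept that its nearest mapped ancestor points to through ExactMatch or BroadMatch. The sketch below implements that reading only. The rule itself, the function name `inherit_mappings`, and the tree shape (which node is whose parent) are assumptions; the slides do not spell them out.

```python
def inherit_mappings(parent_of, direct):
    """Propagate mappings down the CAT hierarchy (assumed rule, see lead-in).

    parent_of: CAT concept -> its parent CAT concept
    direct:    CAT concept -> (relation code, AGROVOC concept)
    """
    result = dict(direct)
    for concept in parent_of:
        if concept in direct:
            continue  # a direct mapping always wins
        ancestor, seen = parent_of.get(concept), set()
        while ancestor is not None and ancestor not in seen:
            seen.add(ancestor)  # guard against cycles in the input
            if ancestor in direct:
                relation, target = direct[ancestor]
                if relation in ("e", "b"):
                    # Anything below an exact or broad match is narrower
                    # than the AGROVOC target, hence a BroadMatch.
                    result[concept] = ("b", target)
                break
            ancestor = parent_of.get(ancestor)
    return result


# Hypothetical tree using the node labels from the diagram.
parent_of = {"21": "C1", "22": "C1", "31": "21", "32": "21", "33": "22"}
direct = {"C1": ("e", "A1")}
for concept, mapping in sorted(inherit_mappings(parent_of, direct).items()):
    print(concept, mapping)
```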
Different Thesauri with Different Classification
A few concepts sit in different domain trees in the two thesauri, which means that each thesaurus has its own classification.
[Diagram: CAT and AGROVOC trees (nodes C1, A1, 21, 22, 31, 32, 33) in which concepts linked by ExactMatch occupy different positions in the two hierarchies.]
The Resource and Target
ExactMatch: same concepts;
BroadMatch: Chinese users get a broader concept, and possibly some useless information; English users get a more specific concept, and may not find all the information.
NarrowMatch: the opposite.
CAT has more than 60,000 terms and AGROVOC only about 30,000, so taking CAT as the resource is better.
[Diagram: CAT and AGROVOC trees (nodes C1, A1, A4, 21, 22, 31, 32, 33) connected by ExactMatch, BroadMatch and NarrowMatch links.]
Discussions 2
Different knowledge taxonomies;
Differences between nouns and verbs;
Different social ideas;
Different cultures;
Different translations.
Chinese Academy of Agricultural Sciences (CAAS)
and Food and Agriculture Organization (FAO)
Thank you