An Intelligent Retrieval System for Chinese Agricultural Scientific Literature
Ping Qian, Xiaolu Su
Scientech Documentation and Information Center,
Chinese Academy of Agricultural Sciences, China
{pingq, suxiaolu}@mail.caas.net.cn
Introduction
• Finding the desired information quickly and accurately in huge information resources has become a serious obstacle for people who develop and use network information resources.
• This project aims to use new theory and technology to explore a solution to this problem.
• Ontology-based knowledge engineering, currently under active research, is an important theoretical foundation and applied technology for knowledge discovery and acquisition.
Information Retrieval Based on Ontology
• Build the domain ontology
• Create the database with reference to the ontology
• Conduct retrieval with the help of the ontology
• Process and display the results
• Adopt the classification method based on ontology theory
• Create the agricultural navigation information database
• Create the index database (Agricultural Scientific Literature Database)
• Create the Web information retrieval system
• Display the results
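The step "conduct retrieval with the help of the ontology" can be illustrated with a minimal sketch. It assumes a toy concept hierarchy and records indexed under single concepts (all names and data are hypothetical, not from the project): a query for a concept also returns records filed under its narrower concepts.

```python
# Hypothetical toy ontology: each concept maps to its narrower concepts.
NARROWER = {
    "crops": ["cereals", "vegetables"],
    "cereals": ["rice", "wheat"],
    "vegetables": [],
    "rice": [],
    "wheat": [],
}

# Hypothetical records: (record id, concept the record is indexed under).
RECORDS = [(1, "rice"), (2, "wheat"), (3, "vegetables"), (4, "cereals")]


def expand(concept):
    """Return the concept followed by all of its narrower concepts (depth-first)."""
    result = [concept]
    for child in NARROWER.get(concept, []):
        result.extend(expand(child))
    return result


def retrieve(concept):
    """Match records indexed under the concept or any narrower concept."""
    wanted = set(expand(concept))
    return [rid for rid, indexed in RECORDS if indexed in wanted]
```

Here `retrieve("cereals")` returns `[1, 2, 4]`: the record filed under "cereals" plus those filed under "rice" and "wheat", which plain keyword matching on "cereals" would miss.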
Process of Establishing the System
Foundation of Building the Agricultural Scientech Navigation Information Database
• Theory: Ontology
• Data Source: Agricultural Scientech Literature Database (more than 560,000 records)
• Tool: Statistical Analysis
• Standard: Chinese Library Classification (CLC)
Stages of Building the Agricultural Navigation Information Database
1. Agricultural Theoretical Classification Tree
2. Agricultural Actual Classification Tree
3. Class-Keyword Cross Table
4. Keyword-Class Cross Table
5. Agricultural Navigation Information Database
Agricultural Theoretical Classification Tree
– Content
• All classes of the Chinese Library Classification
– Purpose
• Solve two problems in creating the actual classification tree:
– the relation between each class number and its name
– the hierarchical relation of some class numbers
– Data Amount
• Classes and subclasses: 42,948
• First-layer classes: 17
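The hierarchical relation of class numbers can be sketched as follows, assuming the simplified rule that a CLC class number's parent is obtained by dropping its last digit, with dots treated only as separators written after every third digit. This is an illustrative assumption: real CLC notation has exceptions (two-letter main classes such as TP, hyphenated and special numbers) that the sketch rejects or ignores.

```python
import re


def clc_parent(class_no):
    """Parent of a CLC class number under a simplified drop-the-last-digit rule."""
    match = re.fullmatch(r"([A-Z])(\d*)", class_no.replace(".", ""))
    if match is None:
        # Two-letter classes such as "TP311" are outside this simplified rule.
        raise ValueError(f"unsupported class number: {class_no}")
    letter, digits = match.groups()
    if not digits:
        return None  # a first-layer class such as "S" has no parent
    digits = digits[:-1]
    # Re-insert a dot after every third digit, as in "S763.3".
    groups = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return letter + ".".join(groups)


def ancestors(class_no):
    """All ancestors of a class number, nearest first."""
    chain = []
    parent = clc_parent(class_no)
    while parent is not None:
        chain.append(parent)
        parent = clc_parent(parent)
    return chain
```

For example, `ancestors("S763.31")` yields `["S763.3", "S763", "S76", "S7", "S"]`, which is the kind of layer information the theoretical tree supplies when the actual tree is built.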
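No. | Class No. | Class Name | Records
1 | S | Agriculture, Agricultural Sciences | 470,213
2 | F | Economics | 47,503
3 | T | Industrial Technology | 23,555
4 | Q | Biological Sciences | 10,440
5 | X | Environmental Science, Labor Protection Science (Safety Science) | 6,252
6 | P | Astronomy, Earth Sciences | 1,109
7 | G | Culture, Science, Education, Sports | 1,106
8 | O | Mathematical and Physical Sciences, Chemistry | 433
9 | U | Transportation | 398
10 | R | Medicine, Health | 391
11 | C | Social Sciences (General) | 209
12 | D | Politics, Law | 102
13 | Z | General Works | 22
14 | N | Natural Sciences (General) | 21
15 | K | History, Geography | 19
16 | H | Languages, Writing | 5
17 | V | Aeronautics, Astronautics | 2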
First-Order Class Names in the Theoretical Tree
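The record counts in the first-order table can be summarized with a short calculation. The counts below are copied from the table; reading the sum as a number of records assumes each record is counted under one first-layer class, which the slides do not state.

```python
# Record counts of the 17 first-layer classes, copied from the table above.
FIRST_LAYER_RECORDS = {
    "S": 470213, "F": 47503, "T": 23555, "Q": 10440, "X": 6252,
    "P": 1109, "G": 1106, "O": 433, "U": 398, "R": 391,
    "C": 209, "D": 102, "Z": 22, "N": 21, "K": 19, "H": 5, "V": 2,
}


def shares(counts):
    """Return each class's percentage of the column total."""
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}


total = sum(FIRST_LAYER_RECORDS.values())
pct = shares(FIRST_LAYER_RECORDS)
```

The column sums to 561,780, in line with the "more than 560,000 records" of the data source, and class S alone accounts for about 83.7% of it.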
Agricultural Actual Classification Tree
– Content:
• All classes actually used in indexing
– Purpose:
• Building the navigation information database
• Revealing the actual distribution of agricultural information, to find new growth points in the development of agricultural sciences
– Data amount:
• Classes: 21,391, of which:
• Coordinated classes: 10,748
• Non-coordinated classes: 10,643
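Counting the classes of the actual tree can be sketched as below. Two assumptions are not stated in the slides: the input lists one class number per indexed record, and a coordinated class joins several CLC numbers with ":" (the CLC combination sign); the example class numbers are hypothetical.

```python
from collections import Counter


def actual_tree_stats(indexed_class_numbers):
    """Summarize the classes actually used when indexing records.

    Assumption: a coordinated class number joins several CLC numbers with ":".
    """
    records = Counter(indexed_class_numbers)  # class number -> number of records
    coordinated = sum(1 for class_no in records if ":" in class_no)
    return {
        "classes": len(records),
        "coordinated": coordinated,
        "non_coordinated": len(records) - coordinated,
        "records": records,
    }
```

Applied to the project's data, the two counts should add up to the total, as the slide's figures do: 10,748 + 10,643 = 21,391.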
Agricultural Actual Classification Tree Key Point:
More than 100,000 class numbers and their corresponding class names
Solution:
• Create professional modeled class tables (9)
• Create compound class tables (6), among them:
– General compound class tables (2)
– Professional compound class tables (4)
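Modeled Class Table
Table Name | Modeled Range | Range Name | Model Class No.
f401_406 | F407.1/.9 | Economics of various industrial sectors | F401/406
s220 | S221/229 | Various agricultural machinery and implements | S220
s50 | S51/59 | Various crops | S50
s60 | S63/68 | Various horticultural crops | S60
S763_30 | S763.31/.49 | Various insect pests and their control | S763.30
s821 | S822/829.9 | Various livestock | S821
s831 | S823/839 | Various poultry | S831
s881_884_9 | S885.1/.9 | Other kinds of silkworms | S881/884.9
s965 | S943 | Diseases and natural enemies of various fishes and their control | S965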
General Compound Class Table
Table Name | Name | Records | Fields
fb2 | World Region Subdivision Table | … | 5
fb3 | China Region Subdivision Table | … | 4
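Professional Compound Class Table
Table Name | Compound Range | Range Class Name | Records
F33_37 | F33/37 | Agricultural economy of various countries | 21
F43_47 | F43/47 | Industrial economy of various countries | 19
S727_728 | S727/728 | Afforestation of various forest types and special areas | 5
S79 | S791/796 | Various forest tree species | 8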
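The modeled and compound tables let a class name be generated instead of entered by hand for each of the 100,000+ class numbers. A minimal sketch, assuming (not confirmed by the slides) that a modeled number takes the digits after its base class and looks them up under the model class, while a compound number looks its extra digits up in an auxiliary table. All table entries are hypothetical placeholders, except that the F33 name borrows the range name F33/37 from the table above.

```python
# Hypothetical excerpts; the real tables are not reproduced in the slides.
BASE_NAMES = {"S511": "Rice", "F33": "Agricultural economy of various countries"}
S50_SUBDIVISIONS = {"S501": "Topic A", "S502": "Topic B"}  # placeholder topics
REGION_CODES = {"1": "Region A", "2": "Region B"}           # placeholder regions


def strip_dots(class_no):
    return class_no.replace(".", "")


def suffix_after(class_no, base_no):
    """Digits of class_no that follow base_no (dots ignored)."""
    full, base = strip_dots(class_no), strip_dots(base_no)
    if not full.startswith(base) or len(full) == len(base):
        raise ValueError(f"{class_no} does not extend {base_no}")
    return full[len(base):]


def modeled_name(class_no, base_no, model_no, base_names, model_names):
    """Modeled subdivision: reuse the name of the model class's matching subdivision."""
    suffix = suffix_after(class_no, base_no)
    model_by_digits = {strip_dots(k): v for k, v in model_names.items()}
    topic = model_by_digits[strip_dots(model_no) + suffix]
    return f"{base_names[base_no]}: {topic}"


def compound_name(class_no, base_no, base_names, aux_table):
    """Compound subdivision: append the name of an auxiliary-table code."""
    code = suffix_after(class_no, base_no)
    return f"{base_names[base_no]}: {aux_table[code]}"
```

With these placeholders, `modeled_name("S511.1", "S511", "S50", BASE_NAMES, S50_SUBDIVISIONS)` gives `"Rice: Topic A"`, so one model table serves every class in a range such as S51/59.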
Examples of Modeled Class Table
Examples of General Compound Class Table
Examples of Professional Compound Class Table
Class-Keyword Cross Table (17,582)
Keyword-Class Cross Table
Before removing duplicates: about 1,210,000 words
After removing duplicates: about 320,000 words
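Both cross tables can be derived from the indexed records in one pass. This sketch assumes each record supplies a class number and a keyword list (a hypothetical input layout), and it reads the "before" figure as every keyword occurrence and the "after" figure as distinct keywords, which is one plausible interpretation of the slide.

```python
from collections import Counter, defaultdict


def build_cross_tables(records):
    """Build class->keyword and keyword->class frequency tables.

    records: iterable of (class_no, keywords) pairs, one per indexed record.
    Repeated pairs collapse into one entry whose count says how often it occurred.
    """
    class_keyword = defaultdict(Counter)
    keyword_class = defaultdict(Counter)
    raw_occurrences = 0  # keyword occurrences before duplicates are removed
    for class_no, keywords in records:
        for kw in keywords:
            raw_occurrences += 1
            class_keyword[class_no][kw] += 1
            keyword_class[kw][class_no] += 1
    return class_keyword, keyword_class, raw_occurrences
```

`len(keyword_class)` then gives the deduplicated keyword count, while the per-pair counts are kept for ranking classes later.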
Agricultural Navigation Information Database
• Determine the rules for organizing the information
• Make XML files for the navigation information
• Choose the database management system
• Define the database structure
The Rules for Organizing the Information
• Never lose any class or subclass that has records
• Display order: classes with more records are listed first, then classes are listed from higher layers to lower ones
• If a node has no records and only one subnode, delete the node and move its subnode up one layer
• Subclasses below the third layer are merged into their third-layer class
• Subclasses with fewer than 30 records are temporarily ignored
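The organizing rules can be applied in one recursive pass over the tree. This is a sketch under assumptions the slides leave open: "records" for the threshold and the ordering means the subtree total, the rules run in the order shown, and the secondary ordering by layer is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    class_no: str
    records: int = 0  # records indexed directly under this class
    children: list = field(default_factory=list)


def total(node):
    """Records in the node's whole subtree."""
    return node.records + sum(total(c) for c in node.children)


def organize(node, layer=1, max_layer=3, min_records=30):
    """Apply the organizing rules to the subtree rooted at node (node is at `layer`)."""
    if layer == max_layer:
        # Merge every deeper subclass into this third-layer class.
        node.records = total(node)
        node.children = []
        return node
    kept = []
    for child in node.children:
        child = organize(child, layer + 1, max_layer, min_records)
        if total(child) < min_records:
            continue  # temporarily ignore small subclasses
        # A record-less node with a single subnode is replaced by that subnode.
        while child.records == 0 and len(child.children) == 1:
            child = child.children[0]
        kept.append(child)
    kept.sort(key=total, reverse=True)  # classes with more records first
    node.children = kept
    return node
```

For instance, a record-less chain S → S5 → S51 collapses so that S51 hangs directly under S, carrying the records merged up from its deeper subclasses.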
XML Files for Navigation Information (33 MB)
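Serializing the organized tree to XML can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names (`navigation`, `class`, `no`, `records`, `name`) and the dictionary layout are hypothetical; the slides do not show the project's actual XML schema.

```python
import xml.etree.ElementTree as ET


def node_to_element(node):
    """node: dict with keys 'no', 'name', 'records', and optional 'children'."""
    elem = ET.Element("class", no=node["no"], records=str(node["records"]))
    ET.SubElement(elem, "name").text = node["name"]
    for child in node.get("children", []):
        elem.append(node_to_element(child))  # nesting mirrors the class hierarchy
    return elem


def navigation_xml(roots):
    """Wrap the first-layer classes in one root element and return the XML text."""
    top = ET.Element("navigation")
    for root in roots:
        top.append(node_to_element(root))
    return ET.tostring(top, encoding="unicode")
```

Because the class hierarchy maps directly onto element nesting, a native XML database can store and query such files without conversion.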
Data Check and Display Menu
Database Management System
• Relational database: XML-enabled database
– Requires data conversion, low efficiency
• Native XML database: Software AG Tamino
– Reads XML data directly
– Saves data in XML format
Define Database Structure
System Framework
XMLDBMS/RDBMS + XML + Java/JSP
Browser/Server three-tier system structure
Environment for Running JSP and XML
• Java SDK 1.3.1
• Xalan 2.2.0
• Tomcat 3.2
Demo of the Retrieval System
Registration
Login
Browse Retrieval
Enter Keyword
Display the Results
Second-Order Retrieval
Retrieval from the Tree Directly
Intelligent Retrieval
Refined Retrieval
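One plausible way to connect the demo features to the cross tables is sketched below: an entered keyword is mapped to candidate classes ranked by co-occurrence in the keyword-class table, and second-order retrieval searches again inside a previous result set. The ranking rule and the title-matching filter are assumptions for illustration, not the system's documented behavior.

```python
from collections import Counter


def suggest_classes(keyword, keyword_class, top_n=3):
    """Rank classes for a keyword by how often they co-occur (assumed ranking)."""
    return [c for c, _ in keyword_class.get(keyword, Counter()).most_common(top_n)]


def second_order(results, keyword):
    """Second-order retrieval: keep only previous results whose title contains keyword."""
    return [r for r in results if keyword in r["title"]]
```

A user who enters "yield" could thus be offered the most closely associated classes to browse in the navigation tree, then narrow the hits with a further keyword.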
Conclusion
• The establishment of the agricultural scientific navigation information database and the development of its Web search system shift retrieval from the traditional keyword-based method to one based on the structure of knowledge organization.
• It is also foundational work: the actual classification table and the class-keyword cross tables built in the project are valuable Chinese agricultural semantic resources.
• These resources are useful for further studies on the automatic recognition and classification of agricultural information, and for constructing a rigorous agricultural domain ontology.
• This work is only the beginning of the study of ontology and its application in agriculture.
The End
Thank you