Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

Post on 12-Jan-2016

Intelligent Database Systems Lab

國立雲林科技大學 (National Yunlin University of Science and Technology)

Multilayer SOM With Tree-Structured Data for Efficient Document Retrieval and Plagiarism Detection

Presenter: Cheng-Feng Weng

Authors: Tommy W. S. Chow, M. K. M. Rahman

2009/10/12

TNN.18 (2009)

N.Y.U.S.T.

I. M.

Outline

Motivation
Objective
Method
Experiments
Conclusion
Comments

Motivation

Document Retrieval: Term-Frequency Problem

Two documents containing similar term frequencies may differ contextually when the spatial distribution of their terms is very different.

Plagiarism Detection: Paraphrasing Problem

SOM…project……..

SOM…be mapped into……..

Science…….Computer…….School……..

School of Computer Science……..
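The term-frequency problem can be illustrated with a small sketch. The two strings below are made-up examples (not the slide's elided text), and the helper names `term_frequencies` and `term_positions` are assumptions introduced here. Both strings have identical bag-of-words counts, yet their terms sit at different positions.

```python
from collections import Counter


def term_frequencies(text):
    """Bag-of-words term counts; word order is discarded."""
    return Counter(text.lower().split())


def term_positions(text):
    """Map each term to the word positions where it occurs."""
    positions = {}
    for index, term in enumerate(text.lower().split()):
        positions.setdefault(term, []).append(index)
    return positions


# Hypothetical sample strings: same words, different order.
doc_a = "school of computer science"
doc_b = "science of school computer"

same_counts = term_frequencies(doc_a) == term_frequencies(doc_b)
same_layout = term_positions(doc_a) == term_positions(doc_b)
print(same_counts, same_layout)  # → True False
```

A purely global term-frequency comparison treats the two strings as identical, which is why a representation that also keeps local (positional) information is of interest.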

Objective

The paper proposes a tree-structured document model with MLSOM for document retrieval (DR) and plagiarism detection (PD).

[Diagram labels: Document, Tree-Structured Model, MLSOM, Global View, Local View, DR, PD]

Structured Representation of DF

A document is partitioned into pages that are further partitioned into paragraphs.

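I am a web page First line Second line The wordless third line

<HTML> <HEAD></HEAD> <BODY> I am a web page <br> <p> First line </p> <p> Second line </p> The wordless third line </BODY></HTML>

I am a web page

First line

Second line

The wordless third line

Page

I am a web page

First line

Paragraph

I am a web page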
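The document, page, paragraph partition can be sketched as follows. This is only an illustration under stated assumptions: splitting text blocks at `<p>` and `<br>` tags, grouping a fixed number of blocks into a page (`paragraphs_per_page`), and the names `ParagraphSplitter` and `build_tree` are all hypothetical choices made here, not rules taken from the paper. The sample HTML is an English rendering of the slide's example page.

```python
from html.parser import HTMLParser


class ParagraphSplitter(HTMLParser):
    """Collect text blocks, starting a new block at every <p> or <br> tag."""

    BREAK_TAGS = {"p", "br"}  # assumed block boundaries

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []

    def _flush(self):
        # Join buffered text, normalise whitespace, keep non-empty blocks.
        text = " ".join(" ".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BREAK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in self.BREAK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def build_tree(html, paragraphs_per_page=2):
    """Return a document -> pages -> paragraphs tree as nested dicts."""
    splitter = ParagraphSplitter()
    splitter.feed(html)
    splitter.close()
    paragraphs = splitter.blocks
    pages = [
        paragraphs[i:i + paragraphs_per_page]
        for i in range(0, len(paragraphs), paragraphs_per_page)
    ]
    return {
        "document": " ".join(paragraphs),
        "pages": [{"page": " ".join(p), "paragraphs": p} for p in pages],
    }


html = ("<HTML> <HEAD></HEAD> <BODY> I am a web page <br> <p> First line </p> "
        "<p> Second line </p> The wordless third line </BODY></HTML>")
tree = build_tree(html)
for page in tree["pages"]:
    print(page["paragraphs"])
```

With two blocks per page, the first page holds the first two text blocks, which matches the page shown on the slide; a real system would more likely size pages by word count.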

Structured Representation of DF (cont.)

Multilayer SOM

MLSOM was developed for handling tree-structured data.
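As background, the building block of an MLSOM is an ordinary self-organizing map. The sketch below is a plain single-layer SOM in pure Python, not the authors' multilayer algorithm; how the layers are coupled for tree nodes is not described in these slides and is not reproduced here. The function names, grid size, learning-rate schedule and toy data are all assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the neuron whose weight vector is closest to x."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a rows x cols SOM with a Gaussian neighbourhood (standard SOM)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # linear decay, an assumed schedule
        lr = lr0 * frac
        sigma = max(sigma0 * frac, 0.1)      # keep the neighbourhood width positive
        for x in data:
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_dist2 / (2 * sigma ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        # Move the neuron toward the input, scaled by lr and h.
                        w[i] += lr * h * (x[i] - w[i])
    return weights


# Hypothetical toy feature vectors in [0, 1].
data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
som = train_som(data, rows=2, cols=2)
print(best_matching_unit(som, [0.0, 0.0]))
```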

Multilayer SOM (cont.)

Similarity:

MLSOM Retrieval

A query document is extracted into its tree structure and projected with the PCA matrix; the trained MLSOM then returns the related documents.
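The retrieval flow (project the query, look it up on the trained map, return the attached documents) might look like the sketch below. It is a flat lookup on a single map under stated assumptions, not the tree matching of the actual MLSOM: the PCA matrix is given as a list of component rows, mean-centering is omitted, and the names `project`, `retrieve`, the neuron layout and all toy values are hypothetical.

```python
def project(vector, pca_matrix):
    """Project a feature vector with a PCA matrix given as component rows."""
    return [sum(c * v for c, v in zip(component, vector))
            for component in pca_matrix]


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def retrieve(query_vector, pca_matrix, neurons, top_k=1):
    """Return documents attached to the top_k neurons nearest the projected query."""
    q = project(query_vector, pca_matrix)
    ranked = sorted(neurons, key=lambda n: squared_distance(n["weight"], q))
    related = []
    for neuron in ranked[:top_k]:
        for doc in neuron["docs"]:
            if doc not in related:   # keep order, drop duplicates
                related.append(doc)
    return related


# Hypothetical toy data: a 3-D feature space reduced to 2-D.
pca_matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
neurons = [
    {"weight": [0.0, 0.0], "docs": ["D1", "D2"]},
    {"weight": [1.0, 1.0], "docs": ["D3"]},
]
print(retrieve([0.9, 0.8, 0.5], pca_matrix, neurons))  # → ['D3']
```

Storing document identifiers on neurons means a query is compared against map units rather than against every document, which is one plausible reading of the efficiency claim.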

Plagiarism Detection

Plagiarism Detection using Local Association (PDLA)

Layer 3 SOM

D1, D2, …

D3, D4, ….

D2, D6, …

Related Docs.
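One way to read the diagram above is as a voting scheme: each paragraph of the suspect document is mapped to a neuron on the paragraph-level map, and stored documents that share those neurons become candidates. The sketch below follows that reading only; it is an assumption, not the authors' PDLA procedure. The function names, the `min_shared` threshold, the neuron weights and the document lists are hypothetical toy values.

```python
from collections import Counter


def nearest_neuron(neurons, x):
    """Index of the neuron whose weight vector is closest to x."""
    return min(
        range(len(neurons)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(neurons[i]["weight"], x)),
    )


def local_association_candidates(query_paragraphs, neurons, min_shared=1):
    """Count, per stored document, how many query paragraphs share a neuron with it."""
    votes = Counter()
    for paragraph in query_paragraphs:
        hit = neurons[nearest_neuron(neurons, paragraph)]
        votes.update(set(hit["docs"]))
    # Sort by vote count (descending), then by name for a deterministic order.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [(doc, n) for doc, n in ranked if n >= min_shared]


# Hypothetical paragraph-level map with document lists attached to neurons.
neurons = [
    {"weight": [0.0, 0.0], "docs": ["D1", "D2"]},
    {"weight": [1.0, 0.0], "docs": ["D3", "D4"]},
    {"weight": [0.0, 1.0], "docs": ["D2", "D6"]},
]
query_paragraphs = [[0.1, 0.0], [0.0, 0.9]]  # toy paragraph feature vectors
print(local_association_candidates(query_paragraphs, neurons))
# → [('D2', 2), ('D1', 1), ('D6', 1)]
```

Raising `min_shared` keeps only documents that match several paragraphs, which trades missed detections against false alarms.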

Experiments

Document Retrieval:

Experiments (cont.)

Plagiarism Detection:

Conclusions

A new approach to DR and PD using a tree-structured document representation and MLSOM is proposed.

The tree-structured representation is shown to enhance retrieval accuracy by combining local characteristics with traditional global characteristics.

Computational issue: the MLSOM serves as an efficient computational solution for practical implementation.

Comments

Advantage: practical; simple but efficient and effective.

Drawback: the failure rate of plagiarism detection is still high.

Application: …