
Annotating Search Results from Web Databases

Date post: 06-Nov-2015
Upload: swamishailu
Description:
An increasing number of databases have become web accessible through HTML form-based search interfaces. The data units returned from the underlying database are usually encoded into the result pages dynamically for human browsing. For the encoded data units to be machine processable, which is essential for many applications such as deep web data collection and Internet comparison shopping, they need to be extracted out and assigned meaningful labels. In this paper, we present an automatic annotation approach that first aligns the data units on a result page into different groups such that the data in the same group have the same semantic. Then, for each group we annotate it from different aspects and aggregate the different annotations to predict a final annotation label for it. An annotation wrapper for the search site is automatically constructed and can be used to annotate new result pages from the same web database. Our experiments indicate that the proposed approach is highly effective.
CONTENT Introduction Existing System Proposed System Phases of system System Architecture System workflow Modules Advantages of Proposed System Algorithm used in system User classes Activity diagram Applications Software & Hardware requirement References
Transcript



Introduction
A growing number of databases are accessible through HTML forms, and the data they return may be encoded with different formatting in HTML tags.

Annotation is performed at the data unit level.

The goal is to automatically assign labels to the data units of the search result records (SRRs) returned from web databases (WDBs).

Target applications include deep web data collection and Internet comparison shopping.

EXISTING SYSTEM
In the existing system, a data unit is a piece of text that semantically represents one concept of an entity.
It describes the relationship between text nodes and data units.
Early applications require tremendous human effort to annotate data units manually, which severely limits their scalability.
There is high demand for collecting data of interest from multiple WDBs.
In the proposed system, we consider how to automatically assign labels to the data units within the SRRs returned from WDBs.

PROPOSED SYSTEM
OUR APPROACH

Align the data units on a result page into different groups such that the data units in the same group have the same semantics.

For each group, annotate it from different aspects and aggregate the annotations into a final label.

We consider how to automatically assign labels to the data units within the SRRs returned from WDBs.

PHASES OF SYSTEM
Our solution consists of three phases.

a) Alignment phase.
b) Annotation phase.
c) Annotation wrapper generation phase.

A) ALIGNMENT PHASE

Identify all data units in SRRs.

Organize them into different groups, with each group corresponding to a different concept.

B) ANNOTATION PHASE

Introduce multiple basic annotators.

Each annotator exploits one type of feature.
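The idea of several basic annotators whose outputs are aggregated into one label can be sketched in Java as follows. This is only an illustration under assumptions: the `BasicAnnotator` interface, the two toy annotators, their confidence values, and the combination rule (one minus the product of the miss probabilities of the annotators that agree on a label) are hypothetical and are not taken from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: all names, confidences, and the combination rule are assumptions.
class AnnotatorCombinerSketch {

    /** A basic annotator proposes a label for one group of aligned data units, or null. */
    interface BasicAnnotator {
        String annotate(List<String> group);
        double confidence(); // assumed success rate of this annotator, in (0, 1)
    }

    /** Toy annotator: labels a group "Price" when every unit looks like a dollar amount. */
    static class PricePatternAnnotator implements BasicAnnotator {
        public String annotate(List<String> group) {
            for (String unit : group) {
                if (!unit.matches("\\$\\d+(\\.\\d{2})?")) return null;
            }
            return "Price";
        }
        public double confidence() { return 0.8; }
    }

    /** Toy annotator: uses a shared in-text prefix such as "Author:" as the label. */
    static class PrefixAnnotator implements BasicAnnotator {
        public String annotate(List<String> group) {
            String prefix = null;
            for (String unit : group) {
                int colon = unit.indexOf(':');
                if (colon <= 0) return null;
                String candidate = unit.substring(0, colon).trim();
                if (prefix == null) prefix = candidate;
                else if (!prefix.equals(candidate)) return null;
            }
            return prefix;
        }
        public double confidence() { return 0.6; }
    }

    /** score(label) = 1 - product of (1 - confidence) over the annotators proposing that label. */
    static String combine(List<BasicAnnotator> annotators, List<String> group) {
        Map<String, Double> missProbability = new HashMap<String, Double>();
        for (BasicAnnotator annotator : annotators) {
            String label = annotator.annotate(group);
            if (label == null) continue;
            Double miss = missProbability.get(label);
            missProbability.put(label, (miss == null ? 1.0 : miss) * (1.0 - annotator.confidence()));
        }
        String best = null;
        double bestScore = 0.0;
        for (Map.Entry<String, Double> entry : missProbability.entrySet()) {
            double score = 1.0 - entry.getValue();
            if (score > bestScore) { bestScore = score; best = entry.getKey(); }
        }
        return best; // null when no annotator could label the group
    }

    static List<BasicAnnotator> defaultAnnotators() {
        List<BasicAnnotator> annotators = new ArrayList<BasicAnnotator>();
        annotators.add(new PricePatternAnnotator());
        annotators.add(new PrefixAnnotator());
        return annotators;
    }

    public static void main(String[] args) {
        System.out.println(combine(defaultAnnotators(), Arrays.asList("$12.99", "$5")));
    }
}
```

Keeping each annotator behind one small interface means a new feature type can be added without touching the combination step.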

C) ANNOTATION WRAPPER GENERATION PHASE

Generate the annotation rules.

Each rule describes how to extract, from a result page, the data units of a concept identified in the annotation phase.

It also describes what the appropriate semantic label should be.
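A rule that pairs an extraction pattern with a semantic label could look roughly like the Java sketch below. It is a simplified illustration, not the rule format used by the system: the `AnnotationRule` class, the use of regular expressions over plain SRR text, and the sample book record are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: rule format, class names, and sample data are assumptions.
class AnnotationWrapperSketch {

    /** One rule: how to extract a data unit (capturing group 1) and which label it gets. */
    static class AnnotationRule {
        final String label;
        final Pattern extractor;

        AnnotationRule(String label, String regex) {
            this.label = label;
            this.extractor = Pattern.compile(regex);
        }
    }

    private final List<AnnotationRule> rules = new ArrayList<AnnotationRule>();

    AnnotationWrapperSketch addRule(String label, String regex) {
        rules.add(new AnnotationRule(label, regex));
        return this;
    }

    /** Applies every rule to one SRR and returns label -> extracted data unit. */
    Map<String, String> annotate(String srrText) {
        Map<String, String> labeled = new LinkedHashMap<String, String>();
        for (AnnotationRule rule : rules) {
            Matcher matcher = rule.extractor.matcher(srrText);
            if (matcher.find()) {
                labeled.put(rule.label, matcher.group(1).trim());
            }
        }
        return labeled;
    }

    /** Example wrapper for records shaped like "<title> by <author>, <publisher>, <price>". */
    static AnnotationWrapperSketch bookWrapper() {
        return new AnnotationWrapperSketch()
                .addRule("Title", "^(.*?) by ")
                .addRule("Author", " by (.*?),")
                .addRule("Price", "(\\$\\d+(?:\\.\\d{2})?)");
    }

    public static void main(String[] args) {
        System.out.println(bookWrapper().annotate("Data Mining by J. Han, Morgan Kaufmann, $59.95"));
    }
}
```

Once built, the same wrapper object can be reused on new result pages from the same site without running the alignment and annotation phases again.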

SYSTEM ARCHITECTURE
[Architecture diagram. Labels recovered from the diagram: data unit and text node features (content, presentation style, data type, path, adjacency); data unit similarity; alignment algorithm; local schema and integrated interface schema; table annotator, query-based annotator, schema value annotator, frequency-based annotator, in-text prefix/suffix annotator, common knowledge annotator; combining annotators -> build wrapper; data alignment; assigning labels.]

SYSTEM WORKFLOW
[Workflow diagram.]

MODULES
Data Unit and Tag Node Extraction:

Identify the relationship between text nodes and tag nodes.

Data Unit and Text Node Features

Data Alignment Algorithm

Label Assignment

Data Unit and Text Node
One-to-One Relationship.
One-to-Many Relationship.
Many-to-One Relationship.
One-to-Nothing Relationship.

Data Unit and Text Node Features
Data Content (DC)
Presentation Style (PS)
Data Type (DT)
Tag Path (TP)
Adjacency (AD)

Data Unit Similarity
Data content similarity.
Presentation style similarity.
Data type similarity.
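One way to turn the per-feature similarities listed above into a single data unit similarity is a weighted sum, sketched below in Java. The weights, the `DataUnit` class, the token-overlap (Jaccard) content measure, and the exact-match measures for the other features are assumptions chosen for illustration; adjacency is left out to keep the example short.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: weights, class names, and per-feature measures are assumptions.
class DataUnitSimilaritySketch {

    /** A data unit described by four of the features named on the slide. */
    static class DataUnit {
        final String content;   // Data Content (DC)
        final String style;     // Presentation Style (PS), e.g. "bold"
        final String dataType;  // Data Type (DT), e.g. "currency"
        final String tagPath;   // Tag Path (TP), e.g. "tr/td"

        DataUnit(String content, String style, String dataType, String tagPath) {
            this.content = content;
            this.style = style;
            this.dataType = dataType;
            this.tagPath = tagPath;
        }
    }

    /** Lower-cased word tokens of a string. */
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<String>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) out.add(token);
        }
        return out;
    }

    /** Jaccard overlap of the token sets: |A and B| / |A or B|. */
    static double contentSimilarity(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) return 1.0;
        Set<String> intersection = new HashSet<String>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<String>(ta);
        union.addAll(tb);
        return (double) intersection.size() / union.size();
    }

    static double exactMatch(String a, String b) {
        return a.equals(b) ? 1.0 : 0.0;
    }

    /** Weighted sum of the feature similarities; the weights are illustrative and sum to 1. */
    static double similarity(DataUnit a, DataUnit b) {
        return 0.4 * contentSimilarity(a.content, b.content)
             + 0.2 * exactMatch(a.style, b.style)
             + 0.2 * exactMatch(a.dataType, b.dataType)
             + 0.2 * exactMatch(a.tagPath, b.tagPath);
    }

    public static void main(String[] args) {
        DataUnit first = new DataUnit("$12.99", "bold", "currency", "tr/td");
        DataUnit second = new DataUnit("$8.50", "bold", "currency", "tr/td");
        System.out.println(similarity(first, second));
    }
}
```

Two prices with different values share no content tokens but still score well on style, data type, and tag path, which is why combining several features is more robust than comparing content alone.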

DATA ALIGNMENT
Our data alignment method consists of the following four steps.

Merge text nodes.
Align text nodes.
Split (composite) text nodes.
Align data units.

Alignment Algorithm

Apply semantic labels to each data unit obtained from the SRRs.
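The last of the four data alignment steps, aligning data units into groups, can be illustrated with the greedy Java sketch below. It is not the alignment algorithm from the slides: the crude regex-based `dataType` feature, the greedy assignment, and the constraint that a group takes at most one unit from each SRR are assumptions made to keep the example small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the feature, the greedy strategy, and the sample data are assumptions.
class DataAlignmentSketch {

    /** Very rough data type inference, used here as the only alignment feature. */
    static String dataType(String unit) {
        if (unit.matches("\\$\\d+(\\.\\d+)?")) return "price";
        if (unit.matches("\\d{4}")) return "year";
        if (unit.matches("\\d+")) return "integer";
        return "text";
    }

    /**
     * Greedy grouping: each data unit joins the first group whose first member has
     * the same inferred data type and that has not yet received a unit from the
     * current SRR; otherwise the unit starts a new group.
     */
    static List<List<String>> align(List<List<String>> srrs) {
        List<List<String>> groups = new ArrayList<List<String>>();
        for (List<String> srr : srrs) {
            Set<Integer> usedGroups = new HashSet<Integer>();
            for (String unit : srr) {
                int target = -1;
                for (int g = 0; g < groups.size(); g++) {
                    boolean sameType = dataType(groups.get(g).get(0)).equals(dataType(unit));
                    if (sameType && !usedGroups.contains(g)) {
                        target = g;
                        break;
                    }
                }
                if (target < 0) {
                    groups.add(new ArrayList<String>());
                    target = groups.size() - 1;
                }
                groups.get(target).add(unit);
                usedGroups.add(target);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<String>> srrs = new ArrayList<List<String>>();
        srrs.add(Arrays.asList("Data Mining", "2006", "$59.95"));
        srrs.add(Arrays.asList("$20.00", "Java Basics"));
        for (List<String> group : align(srrs)) {
            System.out.println(group);
        }
    }
}
```

Note that the units of the second record arrive in a different order but still land in the matching groups, which is the situation alignment is meant to handle.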

ASSIGNING LABELS

ADVANTAGES OF PROPOSED SYSTEM
We use data unit level annotation.

We propose a clustering-based shifting technique (data units inside the same group have the same semantics).

An annotation wrapper is constructed for any given WDB. The wrapper can be applied to efficiently annotate the SRRs retrieved from the same WDB with new queries.

USER CLASSES
The main classes used in annotating search results from web databases are:
Wrapper: an annotation wrapper for the search site is automatically constructed and can be used to annotate new result pages from the same web database.
Search engine: reads the data from the web database and provides it for comparison shopping.
Wrapper builder: combines the annotators to produce a result.

ACTIVITY DIAGRAM
[Activity diagram.]

APPLICATIONS
Web data collection.

Internet comparison shopping.

SOFTWARE REQUIREMENTS
Operating system: Windows XP, 7
Coding language: Java
Development kit: JDK 1.6 and above
Front end: Java Swing

HARDWARE REQUIREMENTS
Processor: Pentium IV
Speed: 1.1 GHz
RAM: 256 MB (min)
Hard disk: 20 GB
Motherboard: Intel 945 GLX

REFERENCES
[1] A. Arasu and H. Garcia-Molina, "Extracting Structured Data from Web Pages," Proc. SIGMOD Int'l Conf. Management of Data, 2003.
[2] L. Arlotta, V. Crescenzi, G. Mecca, and P. Merialdo, "Automatic Annotation of Data Extracted from Large Web Sites," Proc. Sixth Int'l Workshop the Web and Databases (WebDB), 2003.
[3] P. Chan and S. Stolfo, "Experiments on Multistrategy Learning by Meta-Learning," Proc. Second Int'l Conf. Information and Knowledge Management (CIKM), 1993.
[4] W. Bruce Croft, "Combining Approaches for Information Retrieval," Advances in Information Retrieval: Recent Research from the Center for Intelligent Information Retrieval, Kluwer Academic, 2000.
[5] V. Crescenzi, G. Mecca, and P. Merialdo, "RoadRUNNER: Towards Automatic Data Extraction from Large Web Sites," Proc. Very Large Data Bases (VLDB) Conf., 2001.

THANK YOU !!!!

