Understanding the Hidden Web

Pierre Senellart

Journées GEMO, 2nd June 2005

Outline: Introduction, Process description, Discovery, Wrappers, Semantic Analysis, Indexing and Querying, Summary

The Hidden Web

Definition (Hidden Web): The set of webpages (which may or may not be dynamically generated) not accessible from the hyperlinked structure of the World Wide Web.

Size estimate (2001): 500 times larger than the surface Web.

How to understand it and benefit from its content?

Web Service Semantic Interpretation Process

[Diagram, built up step by step over several slides: World Wide Web → discovery → HTML form, WSDL, UDDI → wrappers → WSDL → analysis → Analyzed Web Services → indexing → Web Services Index → querying, which takes Queries and returns Results.]

Web Service Discovery

Crawling the World Wide Web for:

- HTML forms implementing a Web Service
- UDDI registries
- WSDL descriptions
- Other resources (XML, HTML, Web as a full-text index...)

Only interested in Web Services with no side effects:

Ok:
- Yellow Pages
- Publication databases
- ...

Not Ok:
- Booking services
- Mailing list management
- ...
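
As a rough illustration of the discovery step for HTML forms, the sketch below scans one page for forms and keeps those that look free of side effects. The rule used here (treat GET forms as read-only lookups, set POST forms aside) is an assumption made for this example, not a criterion stated in the talk; `FormFinder`, `candidate_services` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collect the method and action of every <form> in an HTML page."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            attributes = dict(attrs)
            self.forms.append({
                # HTML forms default to GET when no method is given.
                "method": (attributes.get("method") or "get").lower(),
                "action": attributes.get("action") or "",
            })


def candidate_services(html):
    """Return the forms that pass the (assumed) no-side-effect heuristic."""
    finder = FormFinder()
    finder.feed(html)
    # Assumption: GET forms are lookups; POST forms may change state.
    return [form for form in finder.forms if form["method"] == "get"]


# Hypothetical page with one lookup form and one booking form.
page = """
<form action="/search" method="get"><input name="title"></form>
<form action="/book-room" method="POST"><input name="date"></form>
"""
print(candidate_services(page))
```

A real crawler would need stronger evidence than the HTTP method, since many read-only search forms use POST.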

Wrapping Web Service Descriptions

Analyzing the structure of:

- HTML forms
- Result webpages
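
One possible way to analyze the structure of an HTML form is sketched below: it lists the named input fields and pairs each with the text of the `<label>` that points at it. This covers only the form side, not result webpages, and the names `FormStructure` and `describe_form` as well as the sample form are assumptions for illustration.

```python
from html.parser import HTMLParser


class FormStructure(HTMLParser):
    """Record named form fields and the <label> text attached to them."""

    def __init__(self):
        super().__init__()
        self.fields = []        # (name, id) pairs in document order
        self.labels = {}        # field id -> label text
        self._label_for = None  # id targeted by the <label> currently open

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "label":
            self._label_for = attributes.get("for")
        elif tag in ("input", "select", "textarea") and attributes.get("name"):
            self.fields.append((attributes["name"], attributes.get("id")))

    def handle_data(self, data):
        # Text inside an open <label for="..."> is that field's label.
        if self._label_for and data.strip():
            self.labels[self._label_for] = data.strip()

    def handle_endtag(self, tag):
        if tag == "label":
            self._label_for = None


def describe_form(html):
    """Return (field name, label text or None) for each named field."""
    parser = FormStructure()
    parser.feed(html)
    return [(name, parser.labels.get(field_id)) for name, field_id in parser.fields]


# Hypothetical search form.
form = """
<form action="/search">
  <label for="t">Publication title</label>
  <input id="t" name="title" type="text">
  <input name="max" type="hidden" value="10">
</form>
"""
print(describe_form(form))
```

The resulting list of field names and labels is the kind of raw material a wrapper could turn into a WSDL-like description.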

Conceptual Model

IsA ontology of concepts (simple DAG):

Thing
  Person
    Man
    Woman
  Publication
    Proceedings
    Article
    Book

n-ary typed roles:

- AuthorOf(Person, Publication)
- HasName(Person, Name)
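
The conceptual model can be written down as plain data, as in the minimal sketch below. It assumes that Thing is the root above Person and Publication and that Name sits directly under Thing; neither is stated explicitly on the slide. The helper names `is_a` and `well_typed` are hypothetical.

```python
# Each concept maps to its direct parents; a DAG allows several parents.
IS_A = {
    "Thing": [],
    "Person": ["Thing"],
    "Man": ["Person"],
    "Woman": ["Person"],
    "Publication": ["Thing"],
    "Proceedings": ["Publication"],
    "Article": ["Publication"],
    "Book": ["Publication"],
    "Name": ["Thing"],  # assumption: the slide does not place Name
}

# n-ary typed roles: role name -> tuple of expected argument concepts.
ROLES = {
    "AuthorOf": ("Person", "Publication"),
    "HasName": ("Person", "Name"),
}


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or reaches it through IsA edges."""
    if concept == ancestor:
        return True
    return any(is_a(parent, ancestor) for parent in IS_A[concept])


def well_typed(role, *argument_concepts):
    """Check the arguments of a role against its typed signature."""
    signature = ROLES[role]
    return len(signature) == len(argument_concepts) and all(
        is_a(arg, expected) for arg, expected in zip(argument_concepts, signature)
    )


print(well_typed("AuthorOf", "Woman", "Article"))
print(well_typed("AuthorOf", "Book", "Article"))
```

The first call is accepted because Woman is a Person and Article is a Publication; the second is rejected because Book is not a Person.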

Services and queries

Example: Service giving authors from publication titles

A* ← WrittenBy(P,A), HasTitle(P,T), Input(T)

Query: Service with no input

Example: <A,T*>* ← WrittenBy(P,A), Article(P), HasTitle(P,T), KeywordOf(“xml”,P)
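
To make the rule notation concrete, here is a small sketch that stores a service as a head plus a list of atoms and reads off its required inputs from the Input(...) atoms. The classes `Atom` and `Service` are hypothetical, and the head is kept as an opaque string because the slides do not spell out the meaning of the star.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    predicate: str
    arguments: tuple


@dataclass(frozen=True)
class Service:
    output: str  # head of the rule, kept exactly as written, e.g. "A*"
    body: tuple  # Atom instances

    def inputs(self):
        """Variables the caller must supply: arguments of Input(...) atoms."""
        return [arg for atom in self.body if atom.predicate == "Input"
                for arg in atom.arguments]

    def __str__(self):
        atoms = ", ".join(f"{a.predicate}({','.join(a.arguments)})" for a in self.body)
        return f"{self.output} ← {atoms}"


# Service giving authors from publication titles.
authors_by_title = Service("A*", (
    Atom("WrittenBy", ("P", "A")),
    Atom("HasTitle", ("P", "T")),
    Atom("Input", ("T",)),
))

# Query: a service with no input.
xml_articles = Service("<A,T*>*", (
    Atom("WrittenBy", ("P", "A")),
    Atom("Article", ("P",)),
    Atom("HasTitle", ("P", "T")),
    Atom("KeywordOf", ('"xml"', "P")),
))

print(authors_by_title)
print(authors_by_title.inputs(), xml_articles.inputs())
```

Under this encoding a query is simply a service whose body contains no Input atom.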

Semantic Interpretation of a Service

How to analyze a Web Service into this formalism?

- Field labels and variable names
- Example requests
- Concrete type descriptions
- Linguistic analysis of plain text descriptions
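
The first of these sources, field labels and variable names, could be exploited with something as simple as the keyword lookup sketched below. The vocabulary `HINTS` and the scoring rule are invented for this example; the talk does not describe how the matching is actually done.

```python
import re

# Hypothetical vocabulary: words that hint at a concept of the ontology.
HINTS = {
    "Person": {"author", "writer", "name"},
    "Publication": {"title", "paper", "article", "book"},
}


def tokens(label):
    """Split 'authorName' or 'pub_title' into a set of lowercase words."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)
    return set(re.findall(r"[a-z]+", spaced.lower()))


def guess_concepts(label):
    """Return concepts with at least one hint word in the label, best first."""
    words = tokens(label)
    scores = {concept: len(words & hints) for concept, hints in HINTS.items()}
    return sorted((c for c, s in scores.items() if s > 0),
                  key=lambda c: -scores[c])


print(guess_concepts("authorName"))
print(guess_concepts("pub_title"))
print(guess_concepts("qty"))
```

Labels that match nothing, such as the last one, would have to be resolved from the other sources listed above.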

Web Service Indexing and Querying

Given a query, represented as an Analyzed Web Service, how to know which known web services to query?

Issues:

- Subsumption of input/output parameters
- Missing input parameters
- Composition of web services
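
The first two issues can be illustrated with a toy matching rule, sketched below under strong assumptions: services and queries are reduced to lists of input and output concepts, the IsA edges are a guess based on the conceptual model, and composition of services is not handled at all. The names `subsumes` and `can_answer` are hypothetical.

```python
# Assumed IsA edges (child -> direct parents), following the conceptual model.
IS_A = {
    "Thing": [],
    "Person": ["Thing"],
    "Publication": ["Thing"],
    "Article": ["Publication"],
    "Book": ["Publication"],
}


def subsumes(general, specific):
    """True if `specific` is `general` or one of its descendants."""
    return general == specific or any(
        subsumes(general, parent) for parent in IS_A[specific])


def can_answer(service, query):
    """Candidate test: every input the service needs is supplied by the query,
    and every output the query wants is produced by the service."""
    inputs_ok = all(
        any(subsumes(accepted, given) for given in query["inputs"])
        for accepted in service["inputs"])
    outputs_ok = all(
        any(subsumes(wanted, produced) for produced in service["outputs"])
        for wanted in query["outputs"])
    return inputs_ok and outputs_ok


# A known service: from a Publication, return Persons (its authors).
service = {"inputs": ["Publication"], "outputs": ["Person"]}

print(can_answer(service, {"inputs": ["Article"], "outputs": ["Person"]}))
print(can_answer(service, {"inputs": [], "outputs": ["Person"]}))
```

The first query matches because an Article is a Publication; the second fails because the service's input is missing, which is the case where composition with another service might supply it.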

Summary

Web Service Semantic Interpretation Process

[Diagram: recap of the full process, from the World Wide Web through discovery, wrappers, analysis and indexing to querying.]