An Improved Discovery Engine for Efficient and Intelligent Discovery of Web Services with Publication Facility

Vandan Tewari (1), N. Dagdee (2), Inderjeet Singh (1), Nipur Garg (1), Preeti Soni (1)

(1) Shri G.S. Institute of Technology & Science, Indore

(2) Shri S.D. Bansal Institute of Tech. & Science, Indore

Contents

Background

Related work on Web Service Discovery

Addressed Issues

Our Proposal

    Proposed Architecture

    Proposed Algorithm

    Modules Implemented

Test Case and Results

Conclusion & Future Enhancements

Background

SOA (Service Oriented Architecture)

Service Oriented Architecture is an evolution of distributed computing in which software components are exposed as services.

Web Service

A web service is a stand-alone software component designed to support interoperable machine-to-machine interaction over a network.

Web Service & SOA

[Figure: an example web service scenario, showing the Find and Publish operations]

Related work on Web Service Discovery

Available sources for service discovery and their respective drawbacks:

A. Centralized Service Broker (UBRs) *

Single point of failure.

Performance bottlenecks.

B. Federated Registries *

Inconsistent policies are employed across registries, so real-time search is inefficient.

No advanced search facility is available.

C. Search Engine **

Inability to distinguish between a web page and a web service document (WSDL) leads to irrelevant results.

D. Web Crawler Engine ***

The problem of service overload still exists.

* Ref. [E. Al-Masri, WWW 2008, pp. 795-804]

** Ref. [K. Sivashanmugam et al., ISWC, pp. 270-278, 2004]

*** Ref. [E. Al-Masri, Q. H. Mahmoud, IEEE ICWS 2007, pp. 1104-1111]

Technical limitations of UDDI

Passivity of UDDI: because service revocation is voluntary, entries for services that are no longer available (passive data) remain in UDDI.

Absence of QoS parameters for web services.

Absence of web service life cycle management.

Ref. [K. Sivashanmugam et al., IEEE, ISWC, 2004, pp. 270-278]

Addressed Issues

How to deal with passivity of UBRs to increase service availability.

How to discover appropriate services when UBRs are overloaded with services.

How to suggest appropriate services to the service requester based on service feedback and frequency of usage.

Our Proposal

A “Discovery cum Publishing Engine” has been designed. It increases service availability by removing passive web services from the UBRs, improves service search time by applying data mining techniques to the contents of the UBRs, and uses past user feedback and usage frequency to suggest appropriate services to the service consumer.

Assumptions

• A domain of trust among the UBRs is already established.

• The test case is developed on a small set of experimental data.

• A predefined classification scheme is used, based on the “Location” parameter of the Travel service.

Proposed Architecture of Our System

[Figure: the Discovery cum Publishing Engine, consisting of the Validation Module, Publish Manager, Search Manager and Add Review module, sits between the service consumer (Discover) and the service provider (Publish) and crawls UBR1, UBR2, UBR3, ..., UBRn.]

Modules Implemented

A. Publish Manager

B. Search Manager

    UBRs Crawl Module

    Search Module

    Dynamic IP Module

    Cluster Module

C. Validate Module

    WSDL Parser Module

    Delete Module

D. Add Review Module

Working of Proposed System

Mechanism of Dynamic IP Module

The engine updates its IP table dynamically:

Step 1: Start crawling from the initial seeds.

Step 2: From each initial seed, find the IP addresses of the service providers.

Step 3: From each provider, fetch the IP addresses of the UBRs in which it has published its other web services.

Step 4: Compare the fetched IP addresses with the initial seeds. Any new IP address is stored in the local IP table; the rest are ignored to avoid redundancy.
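
The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the two lookup callables (`providers_of`, `ubrs_of`) and all IP addresses are hypothetical stand-ins for the real crawl requests, which the slides do not describe.

```python
def update_ip_table(initial_seeds, providers_of, ubrs_of):
    """Return the IP table after one crawl over the initial seeds.

    providers_of(ubr_ip)  -> provider IPs found on that UBR (hypothetical lookup)
    ubrs_of(provider_ip)  -> UBR IPs where that provider has published (hypothetical lookup)
    """
    ip_table = list(initial_seeds)            # Step 1: start from the initial seeds
    known = set(ip_table)
    for seed in initial_seeds:
        for provider in providers_of(seed):   # Step 2: providers listed on this seed
            for ubr in ubrs_of(provider):     # Step 3: other UBRs used by the provider
                if ubr not in known:          # Step 4: keep only new addresses
                    known.add(ubr)
                    ip_table.append(ubr)
    return ip_table


# Made-up data standing in for real crawl responses.
providers = {
    "10.0.0.1": ["192.168.1.10", "192.168.1.11"],
    "10.0.0.2": ["192.168.1.11"],
}
published = {
    "192.168.1.10": ["10.0.0.1", "10.0.0.3"],
    "192.168.1.11": ["10.0.0.2", "10.0.0.3", "10.0.0.4"],
}

table = update_ip_table(
    ["10.0.0.1", "10.0.0.2"],
    lambda ip: providers.get(ip, []),
    lambda ip: published.get(ip, []),
)
print(table)
```

Keeping a set next to the ordered list makes the redundancy check in Step 4 a constant-time lookup while preserving the order in which new UBRs were discovered.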

Proposed Algorithm for Publishing

Publish Manager

Step 1: Start

Step 2: Select the IP address of the UBR where publishing is required.

Step 3: Accept from the service provider the details of the web service along with its location, which acts as a predefined class.

Step 4: Classify the web service based on its location.

Step 5: Store the web service details in the selected UBR, under the class to which the service belongs.

Step 6: Stop
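
A minimal sketch of the publishing steps, assuming each UBR can be modelled as an in-memory mapping from location class to a list of service records. The registry layout, the field names and the sample service are illustrative assumptions, not the engine's actual data model.

```python
from collections import defaultdict

# A few of the predefined location classes (assumed subset, for illustration).
KNOWN_LOCATIONS = {"Annapurna Road", "Rajwada", "Vijay Nagar"}


def publish(registries, ubr_ip, service):
    """Store a service in the chosen UBR under its location class."""
    registry = registries[ubr_ip]              # Step 2: the selected UBR
    location = service["location"]             # Step 3: details include the location
    if location not in KNOWN_LOCATIONS:        # Step 4: location must be a known class
        raise ValueError(f"unknown location class: {location}")
    registry[location].append(service)         # Step 5: store under that class
    return location


# One hypothetical UBR, keyed by a made-up IP address.
registries = {"10.0.0.1": defaultdict(list)}
published_class = publish(
    registries,
    "10.0.0.1",
    {
        "name": "CityCabs",  # hypothetical service
        "location": "Rajwada",
        "access_point": "http://example.org/citycabs?wsdl",
    },
)
print(published_class, len(registries["10.0.0.1"]["Rajwada"]))
```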

Classification Scheme followed

Location/Class      No. of Published WS
Annapurna Road      2
Hukumchand Marg     2
Indore Ho           2
Khajrana            3
Khatipura           2
Laxmibai Nagar      1
Malwa Mill          1
M G Road            4
Navlakha            2
Old Palasia         3
Pardesi Pura        2
Rajendra Nagar      2
Rajwada             2
Shivji Nagar        1
Southtokoganj       3
Vallabh Nagar       3
Vijay Nagar         3
Y N Road            2
Yashwant N Road     2
Total               42

Table 1.1 List of classes along with the number of published web services (an example scenario).

Proposed Algorithm for Searching

Step 1: Start.
Step 2: Enter the keyword for which services are to be searched (e.g. Travel, i.e. choose the super class).
Step 3: Select the location of the service (it denotes a class and serves as the centroid for selecting a cluster).
Step 4: Initialize the IP table with the initial seeds of the UBRs.
Do
    Step 4a: Call the Dynamic IP Module.
    Step 4b: Call the Cluster Module to create clusters based on the location attribute.
    Step 4c: If (location is not chosen)
                 Treat all classes as a single cluster.
                 Go to Step 4d.
             Else
                 If (maximal distance <= minimum threshold)
                     Put the location classes in the same cluster.
                 Select the cluster to which the centroid belongs.
    Step 4d: For each location class in the selected cluster, fetch all services belonging to that class along with their usage frequency data.
    Step 4e: Call the Cluster Module to create clusters based on service usage frequency.
    Step 4f: Parse the WSDL document at the access point URL of each discovered web service, i.e. validate the web service.
    Step 4g: If (web service is active)
                 Store it locally.
             Else
                 Fetch the service key for that access point URL from the UBR and pass it to the Delete Module, which stores it locally for future use and deletes the web service from the respective UBR.
Until all IP seeds in the UBR crawl queue have been visited.
Step 5: Add service reviews to each service in the active service list, which has been stored locally from the virtual UBR on which the engine resides.
Step 6: Display the list of web services to the end user.
Step 7: If (user binds the service)
            Ask the user to write feedback on the service used. Accept the user's details along with a comment and a rating for the service, and store them in the extended service registry structure.
Step 8: End.
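
The validation in Steps 4f and 4g could look roughly like the sketch below. It assumes that a service counts as active when the document behind its access point URL parses as XML with a WSDL 1.1 `definitions` root element; the slides do not state the exact test used by the WSDL Parser Module. The `fetch` callable, the service keys and the URLs are hypothetical, and a fake fetcher replaces real HTTP calls so that the example is self-contained.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace


def is_active(access_point_url, fetch):
    """True if the fetched document parses as a WSDL 1.1 <definitions> document."""
    try:
        root = ET.fromstring(fetch(access_point_url))
    except Exception:
        # Unreachable URL or malformed XML: treat the service as passive.
        return False
    return root.tag == f"{{{WSDL_NS}}}definitions"


def validate(services, fetch):
    """Split services into an active list and a passive list (Step 4g)."""
    active, passive = [], []
    for service in services:
        target = active if is_active(service["access_point"], fetch) else passive
        target.append(service)
    return active, passive


# Fake responses: one valid WSDL stub, one HTML error page, one missing URL.
pages = {
    "http://example.org/a?wsdl": '<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"/>',
    "http://example.org/b?wsdl": "<html><body>Not Found</body></html>",
}
services = [
    {"key": "k1", "access_point": "http://example.org/a?wsdl"},
    {"key": "k2", "access_point": "http://example.org/b?wsdl"},
    {"key": "k3", "access_point": "http://example.org/c?wsdl"},
]

active, passive = validate(services, pages.__getitem__)
print([s["key"] for s in active], [s["key"] for s in passive])
```

In this reading, the keys of the passive services are what would be handed to the Delete Module for removal from the respective UBR.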

Agglomerative Algorithm for Complete-Link Clustering

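The algorithm looks for cliques, i.e. groups in which every pair of members lies within the distance threshold.

The distance between two clusters is taken as the maximal distance between any pair of their members; two clusters are merged if this maximal distance is less than or equal to the distance threshold.

The Euclidean distance between points p and q can be calculated as

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

where p_i and q_i are the i-th coordinates of p and q.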

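Adjacency Matrix for Maximal Distance (based on the location attribute)

Row and column labels abbreviate the location classes of Table 1.1, in the same order: AR = Annapurna Road, HM = Hukumchand Marg, IH = Indore Ho, KH = Khajrana, KP = Khatipura, LN = Laxmibai Nagar, MM = Malwa Mill, MG = M G Road, NK = Navlakha, OP = Old Palasia, PP = Pardesi Pura, RN = Rajendra Nagar, RW = Rajwada, SN = Shivji Nagar, ST = Southtokoganj, VN = Vallabh Nagar, VJ = Vijay Nagar, YN = Y N Road, YW = Yashwant N Road.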

AR HM IH KH KP LN MM MG NK OP PP RN RW SN ST VN VJ YN YW

AR 0 2.5 4 10 9 7 6.5 6.8 4.5 7.2 7.8 0.5 6.7 7.2 5.2 6 11 7.5 7

HM 2.5 0 3.2 9.2 6.5 2 5 3.5 4.8 6.2 5.3 3 3 4 6 4.5 9 6.8 7.2

IH 4 3.2 0 6.2 5.8 2.3 3 1.8 2.5 2.8 4 5.5 2.9 2 1.5 1.8 6.5 3.2 2.9

KH 10 9.2 6.2 0 7 4 2.9 3.5 8.8 4.5 2 9.5 3.9 3.3 5.5 4.2 2 4.5 6.5

KP 9 6.5 5.8 7 0 4.5 3 3.2 7.5 5.1 2.1 11 3.5 3.8 5.5 4.9 2.8 3.1 6.9

LN 7 2 2.3 4 4.5 0 3.9 2.8 4.8 4.5 2.5 7.8 1.5 3.2 5 3.8 5.5 4 5.8

MM 6.5 5 3 2.9 3 3.9 0 1 5.2 2 0.8 7.8 2 0.8 3.2 1.2 3 0.5 4.5

MG 6.8 3.5 1.8 3.5 3.2 2.8 1 0 4 2 2.3 7.5 1.5 0.8 2.9 1 5.5 1.8 4.5

NK 4.5 4.8 2.5 8.8 7.5 4.8 5.2 4 0 4.5 6 6 5 4.5 2 3 7 5.5 1.5

OP 7.2 6.2 2.8 4.5 5.1 4.5 2 2 4.5 0 3.8 8.5 3.5 2.2 2 1 4.8 2.5 3.2

PP 7.8 5.3 4 2 2.1 2.5 0.8 2.3 6 3.8 0 9 2.5 1.5 4.8 2.8 3 2.5 6

RN 0.5 3 5.5 9.5 11 7.8 7.8 7.5 6 8.5 9 0 7.2 8.3 6.2 7 12 8.5 8

RW 6.7 3 2.9 3.9 3.5 1.5 2 1.5 5 3.5 2.5 7.2 0 2.1 3.8 3 4 2.5 5.2

SN 7.2 4 2 3.3 3.8 3.2 0.8 0.8 4.5 2.2 1.5 8.3 2.1 0 4 1 4.2 1.5 5

ST 5.2 6 1.5 5.5 5.5 5 3.2 2.9 2 2 4.8 6.2 3.8 4 0 1.8 6 3.5 1.2

VN 6 4.5 1.8 4.2 4.9 3.8 1.2 1 3 1 2.8 7 3 1 1.8 0 4.7 0.8 2.5

VJ 11 9 6.5 2 2.8 5.5 3 5.5 7 4.8 3 12 4 4.2 6 4.7 0 5 5.8

YN 7.5 6.8 3.2 4.5 3.1 4 0.5 1.8 5.5 2.5 2.5 8.5 2.5 1.5 3.5 0.8 5 0 3

YW 7 7.2 2.9 6.5 6.9 5.8 4.5 4.5 1.5 3.2 6 8 5.2 5 1.2 2.5 5.8 3 0
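
To illustrate complete-link merging on this data, the sketch below takes four classes from the matrix above (AR, HM, IH, RN) with their pairwise distances and merges clusters greedily while the maximal inter-cluster distance stays within a threshold. The threshold value of 3.0 is an assumption chosen for illustration; the slides do not state the threshold used for locations.

```python
from itertools import combinations


def complete_link_clusters(labels, dist, threshold):
    """Agglomerative complete-link clustering, stopped by a distance threshold."""
    clusters = [frozenset([label]) for label in labels]

    def linkage(a, b):
        # Complete link: distance between the two farthest members.
        return max(dist[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest complete-link distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        if linkage(a, b) > threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


# Pairwise distances copied from the matrix above.
dist = {
    frozenset(("AR", "HM")): 2.5,
    frozenset(("AR", "IH")): 4.0,
    frozenset(("AR", "RN")): 0.5,
    frozenset(("HM", "IH")): 3.2,
    frozenset(("HM", "RN")): 3.0,
    frozenset(("IH", "RN")): 5.5,
}

result = complete_link_clusters(["AR", "HM", "IH", "RN"], dist, threshold=3.0)
print(sorted(sorted(cluster) for cluster in result))
# → [['AR', 'HM', 'RN'], ['IH']]
```

With this assumed threshold, a search centred on AR would be answered from the cluster containing AR, HM and RN, while IH stays in its own cluster because its distance to RN (5.5) exceeds the threshold.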

Mechanism for service rating

An extended service registry design is proposed. The schema of this template table has the following fields:

Sname | Person Name | e-id | Review | Rating

Data on the frequency of invocation of services is also kept in the virtual root registry of the proposed architecture and is to be published periodically by the service provider.

The average rating for a service is calculated by giving equal weight to user reviews and to the frequency of usage of the service.
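
The slides do not give the exact formula for combining reviews and usage frequency, so the following is only one possible reading of "equal share". It assumes ratings on a 0 to 5 scale and usage counts normalised by the highest count among the candidate services; both normalisations, and the function name, are assumptions made for illustration.

```python
def combined_score(ratings, usage_count, max_usage, max_rating=5.0):
    """Weight normalised mean rating and normalised usage frequency equally."""
    mean_rating = sum(ratings) / len(ratings) if ratings else 0.0
    rating_part = mean_rating / max_rating                    # scaled to 0..1 (assumed scale)
    usage_part = usage_count / max_usage if max_usage else 0.0  # scaled to 0..1
    return 0.5 * rating_part + 0.5 * usage_part


# Hypothetical service with two reviews (4 and 5) and the highest usage count.
score = combined_score([4, 5], usage_count=25, max_usage=25)
print(round(score, 2))
# → 0.95
```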

How service usage frequency is used for clustering the services: An Example

Number of invocations of each service by each user:

          Ts1   Ts2   Ts3   Ts4   Ts5
User A      5     2     3     0     1
User B      1     3     2     0     2
User C      6     1     5     1     1
User D      8     2     4     0     2
User E      5     3     2     1     0
Total      25    11    16     2     6

Euclidean distances between the usage vectors of the services:

          Ts1    Ts2    Ts3    Ts4    Ts5
Ts1      0      8.83   5.56   11.4   10.1
Ts2      8.83   0      4.79   4.58   3.3
Ts3      5.56   4.79   0      6.78   5.29
Ts4      11.4   4.58   6.78   0      3.16
Ts5      10.1   3.3    5.29   3.16   0

Choosing threshold t = 6, the clusters formed are (Ts1, Ts2, Ts3), Ts4 and Ts5.

The user is presented with the first cluster, since its average invocation frequency is the highest.

Here Ts1, Ts2, Ts3, Ts4 and Ts5 represent the travel services used by the various users.
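
The distance values in this example can be reproduced by treating each service's column of per-user invocation counts as a vector and taking Euclidean distances between the vectors. The sketch below does exactly that; the variable names are illustrative.

```python
import math

# Per-user invocation counts (Users A to E) for each travel service.
usage = {
    "Ts1": [5, 1, 6, 8, 5],
    "Ts2": [2, 3, 1, 2, 3],
    "Ts3": [3, 2, 5, 4, 2],
    "Ts4": [0, 0, 1, 0, 1],
    "Ts5": [1, 2, 1, 2, 0],
}


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


names = list(usage)
matrix = {(a, b): euclidean(usage[a], usage[b]) for a in names for b in names}

# Print the matrix row by row, rounded to two decimals.
for a in names:
    print(a, " ".join(f"{matrix[(a, b)]:6.2f}" for b in names))
```

The computed values agree with the table to within rounding in the last digit. One observation from these numbers: Ts1 and Ts2 are 8.83 apart, which exceeds t = 6, so a strict complete-link merge on distances alone would not place them in the same cluster; the grouping listed above may therefore rely on an additional criterion, such as the invocation totals, that the slides do not spell out.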

Test case and Results

If the user searches for Travel web services at a location such as Annapurna Road:

[Screenshot: list of search results for Annapurna Road]

If the user does not choose any location:

[Screenshot: list of search results without a location filter]

To execute a service, the user clicks on the service name.

The user can then rate and review the service they used.

The proposed Engine can…

Reduce population of passive web services from UBR using validation mechanism.

Crawl over an IP list which can grow dynamically.

Narrow down the search space of UBR.

Suggest services to the user based on user feedback and service usage frequency.

Provide a web service publication facility to the user.

Conclusion

A Discovery cum Publishing Engine for searching web services has been proposed, which uses service ranking techniques for efficient and effective web service discovery. Data mining techniques have been used to narrow down the search space in the UBRs. In addition, an extended service registry design has been proposed which stores service feedback and service usage frequency along with the service information; these are used to rank services within a selected cluster.

This work may be generalized further if, instead of a single service attribute, non-functional parameters or the semantics of services are considered when applying the data mining techniques.

References

• E. Al-Masri and Q. H. Mahmoud, “WSCE: A crawler engine for large-scale discovery of web services”, in Proceedings of IEEE ICWS, pp. 1104-1111, 2007.

• K. Sivashanmugam, K. Verma and A. Sheth, “Discovery of web services in a federated environment”, in Proceedings of ISWC, pp. 270-278, 2004.

• Yan Li, Yao Liu, Liangjie Zhang, Ge Li, Bing Xie and Jiasu Sun, “An exploratory study of Web Services on the Internet”, in IEEE ICWS, 2007.

• E. Al-Masri and Q. H. Mahmoud, “Crawling multiple UDDI business registries”, in Proc. 16th Int’l World Wide Web Conf., ACM.

• E. Al-Masri and Q. H. Mahmoud, “Discovering Web Services in Search Engine”, WWW 2007, May 8-12, 2007, Banff, Alberta, Canada.

• Margaret H. Dunham and S. Sridhar, “Data Mining: Introductory and Advanced Topics”.
