Semantics-Based Automated Service Discovery

Aabhas V. Paliwal, Student Member, IEEE, Basit Shafiq, Member, IEEE, Jaideep Vaidya, Member, IEEE, Hui Xiong, Senior Member, IEEE, and Nabil Adam, Senior Member, IEEE

Abstract: A vast majority of web services exist without explicit associated semantic descriptions. As a result, many services that are relevant to a specific user service request may not be considered during service discovery. In this paper, we address the issue of discovering web services that match a specific service request when the service descriptions carry no explicit semantics. Our approach to semantic-based web service discovery involves semantic-based service categorization and semantic enhancement of the service request. We propose a solution for achieving functional-level service categorization based on an ontology framework. Additionally, we utilize clustering for accurately classifying the web services based on service functionality. The semantic-based categorization is performed offline at the universal description, discovery, and integration (UDDI) registry. The semantic enhancement of the service request achieves a better matching with relevant services. The service request enhancement involves expanding the request with additional terms (retrieved from the ontology) that are deemed relevant for the requested functionality. An efficient matching of the enhanced service request with the retrieved service descriptions is achieved utilizing Latent Semantic Indexing (LSI). Our experimental results validate the effectiveness and feasibility of the proposed approach.

Index Terms: Web services publishing, web services discovery, services discovery process and methodology.

1 INTRODUCTION

A large number of web services structure a service-oriented architecture and facilitate the creation of distributed applications over the web. These web services offer various functionalities in the areas of communications, data enhancement, e-commerce, marketing, and utilities, among others. Some of the web services are published and invoked in-house by various organizations. These web services may be used for business applications, or in government and military settings. However, this requires careful selection and composition of appropriate web services. The web services within the service registry (UDDI) [16] have predefined categories that are specified by the service providers. As a result, similar services may be listed under different categories. Given the large number of web services and the distribution of similar services across multiple categories in the existing UDDI infrastructure, it is difficult to find services that satisfy the desired functionality. Such service discovery may involve searching a large number of categories to find appropriate services. Therefore, there is a need to categorize web services based on their functional semantics rather than on the classifications of service providers.

Semantic categorization of web services will facilitate service discovery by organizing similar services together. However, this is not sufficient to improve the selection and matching process. Most service descriptions that exist to date are syntactic in nature. Existing service discovery approaches often adopt keyword-matching technologies to locate the published web services. This syntax-based matchmaking returns discovery results that may not accurately match the given service request. As a result, only a few services that are an exact syntactical match of the service request may be considered for selection. Thus, the discovery process is also constrained by its dependence on human intervention for choosing the appropriate service based on its semantics.

Semantic web technology is a promising approach for automated service discovery and selection [23]. A majority of the current approaches for web service discovery call for semantic web services that have semantically tagged descriptions produced through various approaches, e.g., OWL-S and Web Services Description Language (WSDL)-S [24], [22]. However, these approaches have several limitations. First, it is impractical to expect all new services to have semantically tagged descriptions. Second, descriptions of the vast majority of already existing web services are specified using WSDL and do not have associated semantics. Also, from the service requestor's perspective, the requestor may not be aware of all the knowledge that constitutes the domain. Specifically, the service requestor may not be aware of all the terms related to the service request. As a result, many services relevant to the request may not be considered in the service discovery process.

In order to address the limitations of existing approaches, an integrated approach needs to be developed for addressing the two major issues related to automated service discovery: 1) semantic-based categorization of web services; and 2) selection of services based on semantic service description rather than syntactic keyword matching. Moreover, the approach needs to be generic and should not be tied to a specific description language. Thus, any given web service could be described using WSDL, OWL-S, or through other means.

IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 5, NO. 2, APRIL-JUNE 2012

• A.V. Paliwal is with CIMIC, Rutgers University, 1110 Stony Brook Way, North Brunswick, NJ 08902. E-mail: [email protected].

• B. Shafiq, J. Vaidya, H. Xiong, and N. Adam are with the MSIS Department and CIMIC, Rutgers University, 1 Washington Park, Newark, NJ 07102. E-mail: [email protected], [email protected], [email protected], [email protected].

Manuscript received 26 July 2008; revised 4 Nov. 2008; accepted 11 Feb. 2010; published online 3 Aug. 2011. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TSC-2008-07-0068. Digital Object Identifier no. 10.1109/TSC.2011.19.

1939-1374/12/$31.00 © 2012 IEEE. Published by the IEEE Computer Society.

Furthermore, the approach should make no assumptions about the kinds of web services. Specifically, we do not make any assumption about whether the web services are developed in-house or offered to users by third-party service providers.

In this paper, we present a novel approach for semantic-based automated service discovery. Specifically, the proposed approach focuses on semantic-based service categorization and selection, as depicted in Fig. 1. In our proposed approach, semantic-based categorization of web services is performed at the UDDI and involves semantics-augmented classification of web services into functional categories. The semantically related web services are grouped together even though they may be published under different categories within the UDDI. Service selection then consists of two key steps: 1) parameter-based service refinement; and 2) semantic similarity-based matching. The web service input and output parameters contain the underlying functional knowledge that is extracted for improving service discovery. Parameter-based service refinement exploits a combination of service descriptions and input and output parameters to narrow the set of appropriate services matching the service request, by combining semantics with the syntactic characteristics of a WSDL document. The refined set of web services is then matched against an enhanced service request as part of semantic similarity-based matching. The service request is enhanced by adding relevant ontology concepts, which improves the matching of the service request with the web services. We now present a brief running example that is used throughout the paper to better explain the proposed approach.

Example 1. Consider a user who requires information about the amount of rainfall in a particular region to estimate groundwater recharge for planning sustainable groundwater development. The user considers searching for an appropriate web service by specifying a keyword-based service request. Within the UDDI, service providers may use different terminology for the specification of web service categories. For example, the user-requested web service (WS1) may be published by a service provider within the UDDI (public/organizational) under a "weather" category that provides weather information, while another service provider publishes a city information web service (WS2), listed under the "utilities" category, that outputs information about the amount of rainfall received. Thus, a standard text-based service discovery of the requested service will include WS1 within the predefined "weather" category; however, it will not include a potentially appropriate service WS2 within the "utilities" category. The service description for WS1 is "This web service returns historical weather information for a given US postal code, date, and time." with inputs PostalCode, Date, Time, and outputs Temperature, Humidity, Pressure, Precipitation. WS2 is described as "Describes city information for a specific US city and state." with input parameters City, State, and output parameters Population, Temperature, Wind, Precipitation. In addition, the user-formed service request may not include all the relevant keywords for discovering all the appropriate services within the UDDI. For example, the user may search for a web service stating "Find the temperature and rainfall based on zip code." However, there may be services published that provide relevant information based on regions, city names, or addresses. These services could be combined with other locator services to yield better results. Also, some of the published web services may provide relevant information grouped under the term "weather", or the user may not be aware of other parameters, e.g., "precipitation", utilized by other web services providing the same resultant information.

Based on the above example, it is evident that for efficient web service discovery 1) the user must be able to discover all appropriate web services within the UDDI irrespective of the predefined categories, and 2) all appropriate web services must be successfully discovered even if the user is not aware of all the relevant terms that describe them.

The rest of the paper is organized as follows: Section 2 provides background material and Section 3 presents an overview of the proposed approach. Section 4 provides a detailed discussion on semantic categorization of web services in UDDI. The detailed description of parameter-based service refinement is presented in Section 5. Section 6 includes a discussion on semantic similarity-based matching. The implementation details of the proposed approach and our evaluations are presented in Section 7. We present the related work in Section 8. Finally, conclusions and future work are presented in Section 9.

2 BACKGROUND

In this section, we provide a brief background of the methodologies utilized for semantic categorization of web services, parameter-based service refinement, and semantic similarity-based matching. We briefly discuss the parameters for ranking semantic relationships in the context of semantic-based service categorization. We also briefly discuss the hyperclique pattern discovery technique used for service refinement. Finally, we provide an overview of LSI in the context of semantic similarity-based matching.

Fig. 1. Semantics-based automated service discovery.

2.1 Ranking of Semantic Relationships

Semantic relationships among ontology concepts are generally ranked based on three parameters: relevance, specificity, and the span of the relationship [5]. Below, we describe these parameters.

Relevance (Rel). Concepts may be associated with each other with reference to multiple domains that are specific to user applications. The associated domain for a particular concept may be expressed as a high-level concept in an upper ontology. For example, the concepts "temperature" and "pressure" are associated in the atmospheric domain as well as in the chemical reactivity domain. These domains may be represented by the "weather" and "chemical" concepts in an upper ontology, respectively. Relevance comprises the associated domain concept specified by the user and is indicative of the contextual relationship between the concepts.

We use the predicate $Rel$ to specify the relevance between any two concepts $t_i$ and $t_j$. The predicate $Rel(t_i, t_j)$ evaluates to true if the concepts $t_i$ and $t_j$ are linked to a common concept in the upper ontology.

Specificity (Sp). The concepts are classified based on their position in the concept hierarchy. Concepts in the lower levels of the hierarchy are specific concepts, whereas the higher-level concepts are termed generic concepts. For example, the entity "location" may be conveyed through the concepts "address" and "postal code". "Address" is a generic concept whereas "postal code" is a specific concept.

We use the predicate $Sp$ to specify the specificity relationship between any two concepts $t_i$ and $t_j$. The predicate $Sp(t_i, t_j)$ evaluates to true if there is a downward path (indicating specialization) from $t_i$ to $t_j$ in the ontology.

Span (S). The span of the relationships expressing the semantic association conveys the strength of linkage among concepts. The span, specified to restrict the scope of the user request, includes the coverage and the depth of the associated concepts. Coverage includes the concepts at the peer level of the considered concept, whereas depth is the number of levels of descendants to be included. If the concepts are linked within the specified span, the value of $S(t_i, t_j)$ is equal to 1; otherwise it is set to 0.

Ranking of the semantic association includes relevance, specificity, and span. For a given web service that includes the concepts $\{t_1, t_2, \ldots, t_n\}$ describing the service, the overall rank $R(t_i, t_j)$ is expressed as

$$R(t_i, t_j) = k_1 \, Rel(t_i, t_j) + k_2 \, Sp(t_i, t_j) + k_3 \, S(t_i, t_j),$$

where $0 < k_1, k_2, k_3 < 1$ and $k_1 + k_2 + k_3 = 1$. Here $k_1, k_2, k_3$ are user-specified weights associated with relevance, specificity, and span, respectively, used to obtain the overall rank of the semantic association.
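The weighted rank can be illustrated with a minimal sketch. The function name `semantic_rank` and its argument names are hypothetical, and the three predicates are assumed to have been evaluated already (as true/false values) against some ontology; the sketch only shows how the weights combine them.

```python
def semantic_rank(rel, sp, span, k1=1/3, k2=1/3, k3=1/3):
    """Weighted rank R = k1*Rel + k2*Sp + k3*S for one concept pair.

    rel, sp, span are the already-evaluated binary predicates
    (True/False or 1/0); k1, k2, k3 are the user-specified weights.
    """
    if not all(0 < k < 1 for k in (k1, k2, k3)):
        raise ValueError("each weight must lie strictly between 0 and 1")
    if abs(k1 + k2 + k3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return k1 * int(rel) + k2 * int(sp) + k3 * int(span)


# A pair that is relevant and within span, but not a specialization.
partial = semantic_rank(True, False, True, k1=0.5, k2=0.3, k3=0.2)
```

With equal weights, a pair that satisfies all three predicates receives the maximum rank of 1, and a pair that satisfies none receives 0.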

2.2 Hyperclique Patterns Discovery

In this paper, we apply hyperclique patterns [34] for web service discovery. Hyperclique patterns are based on the concepts of frequent item sets [2]. We first briefly review the concepts of frequent item sets and then describe the basic concepts of hyperclique patterns.

Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of distinct items and let $T$ represent the set of vectors with elements corresponding to input/output parameters and service description terms. Each vector in $T$ is a subset of $I$. We call $X \subseteq I$ an item set. An item set with $k$ items is called a $k$-item set. The support of $X$, $supp(X)$, is the fraction of vectors containing $X$. If $supp(X)$ is no less than a user-specified minimum support, $X$ is called a frequent item set. The confidence of the association rule $X_1 \rightarrow X_2$ is defined as $conf(X_1 \rightarrow X_2) = supp(X_1 \cup X_2) / supp(X_1)$. It estimates the likelihood that the presence of a subset $X_1 \subseteq X$ implies the presence of the other item set $X_2 = X \setminus X_1$ [10].

A hyperclique pattern [34] is a new type of association pattern that contains items that are highly affiliated with each other. Specifically, the presence of an item in one service description vector strongly implies the presence of every other item that belongs to the same hyperclique pattern. The h-confidence measure captures the strength of this association and, for an item set $P = \{i_1, i_2, \ldots, i_m\}$, is defined as the minimum confidence of all association rules of the item set with a left-hand side of one item, i.e.,

$$hconf(P) = \min\{conf(\{i_1\} \rightarrow \{i_2, \ldots, i_m\}),\; conf(\{i_2\} \rightarrow \{i_1, i_3, \ldots, i_m\}),\; \ldots,\; conf(\{i_m\} \rightarrow \{i_1, \ldots, i_{m-1}\})\},$$

where $conf$ follows the classic definition of association rule confidence [2]. An item set $P$ is a hyperclique pattern if $hconf(P) \geq h_c$, where $h_c$ is the minimum h-confidence threshold.

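For example, consider an item set $P = \{A, B, C\}$. Assume that $supp(\{A\}) = 0.1$, $supp(\{B\}) = 0.1$, $supp(\{C\}) = 0.06$, and $supp(\{A, B, C\}) = 0.06$, where $supp$ is the item set support. Then, $conf(\{A\} \rightarrow \{B, C\}) = supp(\{A, B, C\}) / supp(\{A\}) = 0.6$, $conf(\{B\} \rightarrow \{A, C\}) = 0.6$, and $conf(\{C\} \rightarrow \{A, B\}) = 1$. Hence,

$$hconf(P) = \min\{conf(\{A\} \rightarrow \{B, C\}),\; conf(\{B\} \rightarrow \{A, C\}),\; conf(\{C\} \rightarrow \{A, B\})\} = 0.6.$$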
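The definition and the worked example above can be checked with a short sketch. The helper names `support_count` and `h_confidence` are illustrative, and the toy `vectors` list is constructed only so that its supports equal the values assumed in the example (0.1, 0.1, 0.06, and 0.06 over 100 vectors); it is not data from the paper.

```python
def support_count(itemset, vectors):
    """Number of vectors (sets of terms) containing every item in itemset."""
    items = set(itemset)
    return sum(1 for v in vectors if items <= v)


def h_confidence(pattern, vectors):
    """Minimum confidence over all rules {i} -> (pattern minus {i})."""
    if not pattern:
        raise ValueError("pattern must contain at least one item")
    pattern_count = support_count(pattern, vectors)
    confidences = []
    for item in pattern:
        item_count = support_count([item], vectors)
        if item_count == 0:
            return 0.0  # an item that never occurs cannot support a rule
        # conf({item} -> rest) = supp(pattern) / supp({item})
        confidences.append(pattern_count / item_count)
    return min(confidences)


# 100 toy vectors: supp(A) = supp(B) = 0.1, supp(C) = supp(A,B,C) = 0.06.
vectors = [{"A", "B", "C"}] * 6 + [{"A", "B"}] * 4 + [{"D"}] * 90
hconf_abc = h_confidence(["A", "B", "C"], vectors)
is_hyperclique = hconf_abc >= 0.5  # 0.5 is an arbitrary example threshold
```

Because every rule has the full pattern on its combined side, the minimum confidence is reached for the item with the largest individual support.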

Table 1 shows some example hyperclique patterns identified from a real-world web services data set, which includes web service descriptions from various service categories, e.g., "weather", "financial", "graphics", "business", "communication", and "location". For example, the hyperclique pattern {mapurl, distanceunits, time, routeoptions} is from the "location" category.

2.3 LSI

As part of our approach, we utilize LSI over a set of WSDL documents and the terms in the service description and parameters. LSI, after analyzing a base set of web service documents, finds relations between web service terms, including service description and parameters. Given a term query, LSI translates it into concepts, and finds matching documents and corresponding web services.

TABLE 1. Example Hyperclique Patterns

LSI is a statistical approach used to capture term relationships and underlying domain semantics [9]. LSI extends the Vector Space Model (VSM) by accounting for the order of and association between terms. The associations between terms and documents are calculated and utilized in LSI to reveal an underlying structure or pattern of word usage across service descriptions. LSI relies on Singular Value Decomposition (SVD), an important factorization of a rectangular real or complex matrix. The original matrix is approximated by a linear combination of a decomposed set of term-to-text-object association data. For example, a $t \times o$ matrix $X$ of terms and objects can be decomposed into the product of three matrices,

$$X = T_0 S_0 O_0^{\top},$$

such that $T_0$ and $O_0$ have orthonormal columns and $S_0$ is a diagonal matrix. This is an SVD of $X$. Keeping only the $k$ largest singular values of $S_0$, together with their corresponding columns in the matrices $T_0$ and $O_0$, results in the matrix $X'$, the unique matrix of rank $k$ that is closest to $X$:

$$X \approx X' = T S O^{\top}.$$
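The truncation step can be sketched with NumPy's `numpy.linalg.svd`. The term-by-service matrix below is a hypothetical toy example (two weather-like services sharing terms and two unrelated services), and `rank_k_approximation` is an illustrative helper, not part of any LSI library.

```python
import numpy as np


def rank_k_approximation(X, k):
    """Return (T, S, Ot) for the k largest singular values of X."""
    T0, s0, O0t = np.linalg.svd(X, full_matrices=False)
    return T0[:, :k], np.diag(s0[:k]), O0t[:k, :]


# Hypothetical term-by-service counts; columns are four services.
X = np.array([
    [1, 1, 0, 0],   # temperature
    [1, 1, 0, 0],   # precipitation
    [1, 0, 0, 0],   # postal
    [0, 1, 0, 0],   # city
    [0, 0, 1, 1],   # stock
    [0, 0, 1, 1],   # quote
], dtype=float)

T, S, Ot = rank_k_approximation(X, k=2)
X_k = T @ S @ Ot                     # rank-2 approximation X'

# Coordinates of each service in the k-dimensional latent space.
coords = (S @ Ot).T


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


sim_weather_pair = cosine(coords[0], coords[1])
sim_unrelated = cosine(coords[0], coords[2])
```

In this toy case the first two services differ in their location terms ("postal" versus "city") yet coincide in the reduced space, while remaining orthogonal to the unrelated services. This is the kind of latent association that keyword matching alone would miss.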

One of the main challenges associated with LSI is the cost of computing and storing the SVD. Local LSI is an approach for dealing with this computationally intensive task that considers only the top-ranked service descriptions relevant to the service request. The results obtained are applicable to our domain as it shares several commonalities with their domain, as follows: 1) the set of services relevant to the service request, which in our case is the set of web service descriptions based on a common category that provide functionality for service requests; 2) low LSI dimensions, in their case one or two, which is important since in our case web service descriptions are short passages that result in low-dimension vectors.

3 OVERVIEW OF THE PROPOSED APPROACH

Fig. 1 illustrates the key steps of the proposed approach for semantic-based service discovery. The first step of the proposed approach involves semantic categorization of the web services published in the UDDI. The next step deals with selection of web services for a given service request. This step involves two tasks: 1) refinement of the set of web services based on the input, output, and description parameters of the service, the purpose of which is to select a set of services from the service categorization module representing the desired functionality in terms of the input and output service parameters; and 2) enhancement of the web service request with relevant ontology terms, and the matching of this enhanced service request with the set of candidate web services for selecting the appropriate service. Fig. 2 illustrates the main components of the overall system that performs the two key steps related to automated service discovery. The Service Categorization module serves as the back end of the system and is executed once, independently of individual service requests. On the other hand, the Service Selection process is executed for each service request and serves as the front end of the overall system.

The ontology-guided web services categorization, as illustrated in Steps 1 to 5 of Fig. 2, takes advantage of clustering. In our approach, as illustrated in Steps 1 and 2, each individual web service is represented as a vector that comprises the terms of the service description and of the service's input and output parameters. We refer to this vector as the Service Description Vector (SDV). The initial task for service categorization involves improving the semantic content of the SDV. We achieve this by extending the service description vector with relevant ontology concepts and terms. The improvement of the semantic content is followed by the grouping of services that have similar service functionality but are published under different service categories. For the grouping of web services, we apply clustering to this web service data set, as illustrated in Step 3. The next step of our approach, i.e., Step 4, involves the proper labeling of each group of the clustered web services. The labeling of web service groups involves 1) determining the semantic category to which the member services belong based on the service functionality, and 2) the actual semantic categorization of the web services within the UDDI. We achieve this by associating an ontology concept with each cluster. Following this, we retrieve the web service entries to represent the semantic information in the UDDI by creating tModels in the registry. The tModel corresponds to concepts from the upper ontology, SUMO [15], representing the functionality of the service in a relevant domain. The ontology is linked with the respective tModel using the overviewURL tag of the tModels.

The categorization of web services is followed by service selection from the relevant group of services. This is achieved by parameter-based service refinement, as illustrated in Steps 5 to 7 of Fig. 2. Parameter-based service refinement includes narrowing the set of appropriate services matching the service request based on service parameters, i.e., input, output, and description. Parameter-based refinement of web services begins with a representation of the web service parameters as a vector in which each entry records the terms of the operation's input and output. The set of related web services is represented by a collection of such vectors, as illustrated in Step 5. Next, we mine this web service collection to find the frequent patterns that satisfy a given support level and confidence level [34], as illustrated in Step 6. The frequent patterns represent the combination of input and output service parameters related to the service request. The terms in the patterns discovered by mining the input/output term vectors may not be semantically related. Therefore, the set of discovered patterns is pruned based on the ranking of semantic relationships among the terms. Then, for each remaining pattern, we retrieve the web services that have the pattern expressed as part of the service description. The pruning of discovered patterns followed by retrieval of associated web services is illustrated in Step 7.

Fig. 2. Automated service discovery components. The service categorization is performed offline on a regular basis and is independent of the service request. Service selection is executed online and in real time on a per-request basis.

The refined set of web services is then matched against an enhanced service request as part of semantic similarity-based matching, as illustrated in Steps 8 to 13 of Fig. 2. A key part of this process involves enhancing the service request. Step 9 demonstrates the initial processing of the service request and its transformation to a service request vector. Our approach for semantic similarity-based matching utilizes ontology linking to enhance the service request with relevant ontology terms. In Steps 10 and 11, we compute the semantic rank of the relevant terms from the ontology and utilize the semantic ranking to determine the inclusion of the ontology concept in the original service request. Step 12 indicates the formation of the enhanced service request based on techniques described in the latter part of this paper. For matching the enhanced service request with the refined set of web service description vectors, we employ the Latent Semantic Indexing (LSI) technique. Step 8 involves the conversion of the refined set of web services into the term-document matrix for LSI.

In the proposed approach, both semantic-based service categorization and parameter-based service refinement depend on the service description in the WSDL file. Additionally, we consider keyword-based search for service discovery. The brief textual descriptions of web service functionality and the limited documentation on how to invoke the services make keyword-based searches prone to returning irrelevant search results, so keyword search serves only as a primitive means of discovering web services. Semantic annotation and matching of web services has been proposed to address the drawbacks of syntactic web service descriptions. However, existing web services on the web usually are not equipped with semantic descriptions [21]. A focused search for semantic service descriptions conducted by [20] with a specialized metasearch engine, Sousuo, found not more than about 100 semantic service descriptions in prominent formats like OWL-S, WSML, WSDL-S, and SAWSDL on the web. Klusch and Zhing [20] state that this quantity appears tiny compared to the more than half a million RDF sources indexed by the semantic web search engine Swoogle, and the several hundreds of validated web service descriptions in WSDL found by Sousuo on the web. Semantic annotations aim to provide richer specifications of web services. As a result, supplementing web services with a semantic description of their functionality will further improve their discovery and integration based on the proposed approach. With the goal of supporting efficient service discovery and with a view to simplifying our approach, we currently validate our approach utilizing web services described with WSDL. However, our approach is not specific to a single way of describing web services and can be applied to syntactic web services, semantic web services, and combined sets of semantic and syntactic web services.

4 SEMANTIC CATEGORIZATION OF WEB SERVICES

In our approach, we begin with the semantic categorization of UDDI, wherein we combine ontologies with an established hierarchical clustering methodology, following the service description vector building process. For each term in the service description vector, a corresponding concept is located in the relevant ontology. If there is a match, the concept is added to the description vector. Additional concepts are added and irrelevant terms are deleted based on semantic relationships between the concepts. The resulting set of service descriptions is clustered based on the relationship between the ontology concepts and service description terms. Finally, the relevant semantic information is added to the UDDI for effective service categorization. With respect to our running example, additional concepts from the "weather" ontology are added to the description vectors for WS1 and WS2. Following this, both WS1 and WS2 are grouped together utilizing hierarchical clustering. All the services within this cluster (including WS1 and WS2) are then associated with an upper ontology concept, "weather", as a category. Below is an outline of the key steps of our approach, as illustrated in Steps 1 to 4 of Fig. 2:

1. Build the web service description vectors.

2. Append relevant ontology concepts and delete irrelevant terms based on the ranking of semantic relationships among the terms.

3. Mine the web service collection utilizing hierarchical clustering, and associate an upper ontology concept with each cluster and the relevant ontology concept with the corresponding subcluster.

The details of each of the steps for the semantic categorization of UDDI are given in Sections 4.1 to 4.4.

4.1 Web Service Vector Formation

The WSDL file forms part of the initial WSDL set, and its corresponding description and associated parameters are parsed as follows: the WSDL document processing includes the extraction of the associated operation parameters by extracting all terms under the [...] and [...] tags. The next step in the WSDL processing involves removal of markup and index entries, removal of punctuation, and the use of white space as a term delimiter. A collection of individual web service vectors represents the entire data set, denoted as $WS = \{(ws_1, t'_1), \ldots, (ws_i, t'_i)\}$, where $t' = \{t_1, \ldots, t_k\}$ for all $ws \in WS$, the set of web services, and $t \in T$, the set of all distinct terms in $WS$. Web service vector formation is included in Step 1 of Fig. 2.
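A minimal sketch of this vector formation step is given below, using Python's standard `xml.etree.ElementTree`. Several choices here are assumptions rather than details from the paper: terms are taken from the WSDL 1.1 `documentation` element and from the `name` attribute of `part` elements, camelCase names are split into separate words, and the embedded sample document is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

WSDL_NS = "{http://schemas.xmlsoap.org/wsdl/}"  # WSDL 1.1 namespace


def split_terms(text):
    """Split camelCase, replace punctuation, and split on white space."""
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return re.sub(r"[^A-Za-z0-9\s]", " ", text).lower().split()


def service_description_vector(wsdl_text):
    """Collect description terms and parameter-name terms, in order."""
    root = ET.fromstring(wsdl_text)
    terms = []
    for doc in root.iter(WSDL_NS + "documentation"):
        terms.extend(split_terms(doc.text or ""))
    for part in root.iter(WSDL_NS + "part"):  # assumed source of parameters
        terms.extend(split_terms(part.get("name", "")))
    return terms


# Hypothetical, heavily abbreviated WSDL for a WS1-like service.
SAMPLE = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="WS1">
  <documentation>This web service returns historical weather information.</documentation>
  <message name="Request">
    <part name="PostalCode" type="xsd:string"/>
    <part name="Date" type="xsd:string"/>
  </message>
  <message name="Response">
    <part name="Temperature" type="xsd:string"/>
    <part name="Precipitation" type="xsd:string"/>
  </message>
</definitions>"""

sdv = service_description_vector(SAMPLE)
```

Stop-word removal and stemming are deliberately left out, since the text above does not describe them for this step.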

4.2 Web Service Vector Modification

Enhancing the service vectors with concepts from the core ontology resolves issues related to synonyms and introduces domain-related concepts that provide the context. To achieve semantic enhancement, we utilize an approach that augments the WordNet noun database with SUMO mappings [25], [15]. These mappings provide a natural language index to the ontology concepts, mediate between the structured concepts and free text, and validate ontology content. This facilitates modeling of domain elements with relevant ontology concepts by associating SUMO concepts with input nouns via WordNet synsets. Thus, for our running example, the service vector for WS1 is formed as "weather information US postal code date time temperature pressure humidity hour minute seconds zipcode rain address street city state month year snow wind precipitation". The WS2 service vector is "city information US state address latitude population male female longitude region description temperature wind precipitation weather pressure humidity rain snow wind".

The first step in this phase of our approach involves adding relevant ontology concepts to the initial service vector. Our approach considers all concepts for enhancing the web service description. The add step extends each service vector with additional WordNet elements. This is followed by the retrieval of the corresponding mapped SUMO concepts, represented by the set $C$. The modified web service vector is the union of $t'_i$ and $c_i$, where $c_i \in C$.

The next step of this phase involves deleting irrelevant terms based on the ranking of semantic relationships among the terms. The complex relationships are based on property sequences that link the two concepts in the semantic association. Two concepts $e_i$ and $e_j$ are semantically associated with each other if there exists one or more relationships $Rel_{ij}$ between the concepts $e_i$ and $e_j$, where $1 \leq i < n$ and $1 \leq j < n$, and $n$ is the number of terms in a web service. Next, for each of these concepts, we find the relevance, specificity, and the user-specified span. The user assigns weights ($k_1$, $k_2$, and $k_3$) for each of the parameters ($Rel$, $Sp$, $S$) as a threshold for concept selection. This also makes the ranking process more flexible. Our current approach assigns binary values to the ranking parameters. Assigning a range of specific values to these parameters is part of our future work.

To illustrate our approach, consider the following terms within the web service vector: {temperature, pressure, postal code}. As explained in Section 2.1, k1, k2, and k3 are user-specified weights associated with relevance, specificity, and span, respectively, which are combined to obtain the overall rank of the semantic association between the web service vector terms. Selecting the weight coefficients is a key challenge; at present it is heuristics based and, to some extent, subjective. The aim is to select the weights used in the equation so as to minimize the variation bias. To this end, a range of weights was computed under reasonable assumptions about the observed and analyzed results, and the objective weight coefficients were taken as those giving the minimum variance of the difference between the analyzed field and the ideal field. For this phase of our approach we consider equal user-specified weights, i.e., k1 = 0.33, k2 = 0.33, and k3 = 0.33, with coverage = 2 and depth = 2. The three concepts are linked to the concept weather specified in the upper ontology; thus Rel = 1. The concepts are located in the lower part of the concept hierarchy, which indicates greater specificity; as a result, Sp = 1. Since the concepts fall within the specified span of the weather domain in the SUMO ontology, S = 1. The semantic rank score of the association pattern, rounded to an integer, is 0.33 × 1 + 0.33 × 1 + 0.33 × 1 ≈ 1. The associated semantic rank is used to decide whether the concept is included in or deleted from the service description vector. The modifyServiceVector algorithm (Fig. 3) gives the details. Step 2 of Fig. 2 executes the modification of the web service vector.
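The weighted combination used in this example can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names and the inclusion threshold of 0.5 are assumptions made for the sketch and are not taken from the modifyServiceVector algorithm.

```python
def semantic_rank(rel, sp, span, k1=0.33, k2=0.33, k3=0.33):
    """Weighted sum of the three binary ranking parameters (Rel, Sp, S)."""
    return k1 * rel + k2 * sp + k3 * span


def keep_concept(rel, sp, span, threshold=0.5, **weights):
    """Hypothetical inclusion rule: keep a concept whose score reaches the threshold."""
    return semantic_rank(rel, sp, span, **weights) >= threshold


# Example from the text: Rel = Sp = S = 1 with equal weights.
score = semantic_rank(1, 1, 1)
print(round(score))            # integer-rounded rank
print(keep_concept(1, 1, 1))   # concept is retained
print(keep_concept(0, 0, 1))   # concept would be deleted under the assumed threshold
```

With binary parameter values, the score is simply the sum of the weights of the parameters that hold, so the threshold decides how many of the three criteria a concept must satisfy.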

4.3 Clustering and Ontology Concept Association

After enhancing the service description vector with relevant ontology concepts, clustering of service vectors is performed to group functionally similar services together. Hierarchical clustering facilitates classification of all the services, such that each subcluster and the combinations of subclusters create a hierarchy, a structure that is more informative than an unstructured set of clusters. This is the primary reason that we adopt hierarchical, group-average agglomerative clustering to group web services: we want informative clusters of the web service descriptions. In addition, the approach of Heß and Kushmerick [11], which uses the information contained in the service description to dynamically create the categories for service classification, illustrates that hierarchical clustering is the best clustering approach for service classification.
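A minimal sketch of agglomerative clustering over service term vectors is given below. It assumes cosine similarity between sparse term vectors and reads "group-average" as the mean pairwise similarity between the members of two clusters (average linkage); the service vectors are toy data, and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def group_average(a, b, vectors):
    """Mean pairwise similarity between members of clusters a and b."""
    total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def cluster(vectors, num_clusters):
    """Merge the two most similar clusters until num_clusters remain."""
    clusters = [[name] for name in vectors]
    merges = []  # merge history, which encodes the hierarchy
    while len(clusters) > num_clusters:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: group_average(pair[0], pair[1], vectors))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b))
    return clusters, merges


# Toy service vectors (illustrative only).
services = {
    "WS1": {"temperature": 1, "pressure": 1, "postalcode": 1},
    "WS2": {"temperature": 1, "wind": 1, "city": 1},
    "WS3": {"stock": 1, "quote": 1, "price": 1},
    "WS4": {"stock": 1, "price": 1, "currency": 1},
}
final_clusters, history = cluster(services, 2)
print(final_clusters)
```

The merge history is what makes the result hierarchical: cutting it at different points yields coarser or finer groupings of the same services.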

The step following the formation of clusters involves associating relevant SUMO ontology concepts. The association of concepts to each cluster facilitates web service discovery by mapping to functional categories. A cluster i is associated with a concept c_j, where c_j is the corresponding ontology concept. The ontology concepts render semantics for web


Fig. 3. modifyServiceVector Algorithm.



service categorization. Our approach utilizes the mapping of WordNet elements to SUMO concepts. We build a set that contains all concepts that exist in at least one service description and eliminate duplicate concepts. This is followed by locating the position of the remaining concepts in the concept hierarchy H_c. Each concept is checked for a subsumes or subsumed-by relationship with the elements of the set. The resultant superconcept is then mapped to the cluster.

The mapping of the ontology concept to the cluster extends the semantic information in UDDI. This is executed by the creation of tModels for the associated web services of the cluster in the registry. The associateOntologyCluster algorithm, shown in Fig. 4, provides the details of our approach.
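The superconcept selection described above can be sketched as a search for the most specific concept that subsumes every concept found in a cluster. The hierarchy fragment below is hypothetical (it is not an excerpt of SUMO), and the function names are assumptions made for illustration.

```python
# Hypothetical child -> parent fragment of a concept hierarchy.
PARENT = {
    "Temperature": "WeatherAttribute",
    "Pressure": "WeatherAttribute",
    "Precipitation": "WeatherProcess",
    "WeatherAttribute": "Weather",
    "WeatherProcess": "Weather",
    "Weather": "Entity",
}


def ancestors(concept):
    """Return the concept followed by its superconcepts up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def superconcept(concepts):
    """Most specific concept that subsumes every concept in the collection."""
    unique = set(concepts)                      # eliminate duplicates
    chains = [ancestors(c) for c in unique]
    common = set(chains[0]).intersection(*chains[1:])
    # Chains run from specific to general, so the first shared entry is the
    # lowest concept that subsumes all members.
    return next(c for c in chains[0] if c in common)


print(superconcept(["Temperature", "Pressure", "Temperature"]))
print(superconcept(["Temperature", "Precipitation"]))
```

In a registry setting, the returned concept would be the value recorded for the cluster; how it is written into tModels is outside the scope of this sketch.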

4.4 Web Service Registry: Reliance on UDDI

One question that may arise is the extent to which our approach is reliant on UDDI. UDDI is a platform-independent, open industry initiative: an XML-based registry that enables service providers to publish service listings, discover each other, and define how the services interact over the Internet. There are several UDDI business registries (UBRs) that provide the ability to locate the services matching the search criteria in an efficient manner. These UBRs include Microsoft, SAP, and the National Biological Information Infrastructure (NBII), among others. In addition, various web service search portals (e.g., RemoteMethods, Xmethods), search engines (e.g., Google, Yahoo, and Baidu), and web service crawler engines (e.g., Al-Masri [3]) have emerged and are being used for service discovery. These portals and crawlers may not necessarily comply with the original and established web service standards such as UDDI. However, they also store web services collected from various sources in a central data repository that can be queried by users. In this sense, our service discovery approach can be applied to all web services retrieved through search engines, portals, crawlers, and UBRs. In addition, there also exists the approach of publishing and discovering web services across multiple registries grouped into registry federations, e.g., [32], for enhancing the discovery process.

In our approach, UDDI is used only as a base, in order to be compliant with the universally adopted standard for service discovery. Our proposed approach can, however, be extended to the various approaches mentioned above for discovering web services. To support semantic-based service discovery, the proposed approach adds semantic-based service categorization and service request enhancement as separate layers on top of UDDI. The addition of these layers affects the performance of the discovery process in terms of increased timing delays. However, since service categorization is performed offline and only service request enhancement is performed at runtime, the increase in the timing delay will not be significant. While we do not directly measure the delay due to service request enhancement in our experiments, it can be measured indirectly based on the sizes of the original and the enhanced service requests. Typically, a service request vector includes a maximum of 25 elements, and searching the ontology for a few tens of additional elements will not incur significant overhead given efficient ontology search mechanisms (a search requires only O(log n) time, where n is the size of the ontology).

5 PARAMETERS-BASED SERVICE REFINEMENT

The next step is service selection from the relevant category of services using parameter-based service refinement. Web service parameters, i.e., input, output, and description, aid service refinement by narrowing the set of appropriate services matching the service request.

The relationship between web service input and output parameters may be represented as statistical associations. These associations relay information about the operation parameters that are frequently associated with each other. To group web service input and output parameters into meaningful associations, we apply a hyperclique pattern discovery approach [10]. These associations, combined with the semantic relevance, are then leveraged to discover and rank web services.

For the running example, the first step of our approach for parameters-based service refinement is to build the service parameters association pattern item set for all services within the weather cluster (including WS1 and WS2) [28]. The next step involves pruning the association patterns based on concepts extracted from the domain ontology and a confidence threshold. This provides a set of ranked web services matching the service functionality. Below is an outline of the key steps of our approach, as illustrated in Steps 5 to 8 of Fig. 2:

1. Retrieve associated parameters forming the association pattern item set.

2. Perform hyperclique pattern discovery on the association pattern item set.

3. Rank the semantic associations between the terms.

4. Prune the association pattern collection.

Sections 5.1 to 5.4 provide a detailed discussion of each of the steps for parameters-based service refinement.


Fig. 4. associateOntologyCluster Algorithm.



5.1 Service Parameters Retrieval

As discussed earlier, the web service description is provided in the WSDL document. For retrieving the relevant service parameters, the corresponding WSDL document is processed to extract the associated operation parameters by retrieving all terms under the tag. The WSDL processing also includes stoplist removal and stemming to strip word endings.

5.2 Hyperclique Pattern Discovery

The process of searching for hyperclique patterns can be viewed as the generation of a level-wise pattern tree. Every level of the tree contains patterns with the same number of nodes. If the level is increased by one, the pattern size (number of objects in the pattern) is also increased by one. Every pattern has a branch (subtree) that contains all the supersets of this pattern. Our algorithm for finding hyperclique patterns is breadth-first. We first check all the patterns at the first level. If a pattern does not satisfy the user-specified support and h-confidence thresholds, the whole branch corresponding to this pattern can be pruned without further checking. This is due to the antimonotone property of the support and h-confidence measures. For the h-confidence measure, the antimonotone property guarantees that the h-confidence value of a pattern is greater than or equal to that of any superset of this pattern. In this manner, the pattern tree grows level by level until all the patterns have been generated. For our example, the input and output parameters for WS1 are postalcode, date, time, temperature, humidity, pressure, and precipitation, and for WS2 are city, state, population, temperature, wind, and precipitation. The hyperclique patterns generated, given as {hyperclique pattern (support, h-confidence)}, are {temperature, pressure (9.52, 50 percent)}, {temperature, pressure, precipitation (14.29, 75 percent)}, and {temperature, pressure, city (6.4, 50 percent)}. These patterns indicate the support and the h-confidence levels of association. This algorithm is very efficient for handling large-scale data sets [34]. The patterns are selected on the basis of the h-confidence threshold. For our approach, we set the h-confidence threshold to 50 percent.
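The level-wise search with pruning can be sketched as follows. The sketch assumes the usual definition of h-confidence as the support of a pattern divided by the largest support among its individual items; the function names are hypothetical. Only the parameter sets of WS1 and WS2 are used as input, so the values it produces are not expected to match the figures quoted above, which were obtained over the whole weather cluster.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def h_confidence(itemset, transactions):
    """Pattern support divided by the largest single-item support in it."""
    largest = max(support(frozenset([i]), transactions) for i in itemset)
    return support(itemset, transactions) / largest


def hypercliques(transactions, min_supp, min_hconf):
    """Breadth-first search: a failed pattern is never extended (pruning)."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_supp]
    found = []
    while level:
        # Candidates one item larger, built only from surviving patterns.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates
                 if support(c, transactions) >= min_supp
                 and h_confidence(c, transactions) >= min_hconf]
        found.extend(level)
    return found


transactions = [
    frozenset({"postalcode", "date", "time", "temperature",
               "humidity", "pressure", "precipitation"}),       # WS1
    frozenset({"city", "state", "population", "temperature",
               "wind", "precipitation"}),                       # WS2
]
print(h_confidence(frozenset({"temperature", "pressure"}), transactions))  # 0.5
patterns = hypercliques(transactions, min_supp=0.5, min_hconf=0.75)
print(frozenset({"temperature", "precipitation"}) in patterns)
```

Because both measures are antimonotone, generating candidates only from surviving patterns loses no valid hyperclique while avoiding the full enumeration of item subsets.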

5.3 Ranking Semantic Associations

The complex relationships are based on property sequences that link the two entities in the semantic association. The rankSemanticAssociations algorithm, shown in Fig. 5, provides the details of our approach.

Two entities e_i and e_j are semantically associated with each other if there exist one or more relationships Rel_ij, where 1 ≤ i < n and 1 ≤ j < n.

Next, for each of these entities, we find the relevance, specificity, and the user-specified span. The user assigns weights to each of the parameters to refine the request. This also makes the ranking process more flexible. Our current approach assigns binary values to the ranking parameters. Assigning a range of specific values to these parameters is part of our future work. To illustrate our approach, consider the association pattern {temperature, pressure, postal code}. The user-specified weights are k1 = 0.3, k2 = 0.4, and k3 = 0.3, with coverage = 2 and depth = 2. The three concepts are linked to the concept weather specified in the upper ontology; thus, Rel = 1. The concepts are located in the lower part of the concept hierarchy, which indicates greater specificity; as a result, Sp = 1. Next, we determine whether the concepts fall within the specified span within the weather domain. As illustrated by the WeatherConcepts ontology, the concepts are included in the specified span; thus, S = 1. The semantic rank score of the association pattern is calculated as 0.3 × 1 + 0.4 × 1 + 0.3 × 1 = 1. The associated semantic rank is used to sort the association pattern collection.

5.4 Association Pattern Collection Pruning

A large number of association patterns are generated in the association pattern mining phase. Patterns containing irrelevant information that would negatively influence the service discovery process need to be discarded. The pruning of the association pattern collection is based on [4]: 1) eliminate the association patterns that have a low semantic relationship ranking between their terms; 2) retain the generic patterns with high confidence. This is illustrated in the pruneAssociationPatterns algorithm in Fig. 6. Functions 1 and 2 are listed in Fig. 6.

Function 1. Given two patterns X1 ⇒ Y1 and X2 ⇒ Y2, the first pattern is eliminated if ScoreSemRank{p(X1, Y1)} … conf(X2); 2) if the confidences are equal, the support for X1 must exceed that for X2, supp(X1) > supp(X2).

6 SEMANTIC SIMILARITY-BASED MATCHING

The parameter-based refined set of web services is then matched against an enhanced service request as part of semantic similarity-based matching. A key part of this process involves enhancing the service request. Our


Fig. 5. rankSemanticAssociations Algorithm.



approach for semantic similarity-based web service selection employs ontology-based request enhancement and LSI-based service matching.

The basic idea of the proposed approach is to enhance the service request with relevant ontology terms and then find the similarity measure of the semantically enhanced service request with the web service description vectors generated in the service refinement phase [27]. For evaluating this similarity, we employ an LSI-based technique that uses the cosine measure as the similarity metric.

A key issue in the discovery of web services concerns the query language used to form the web service request. The web service request can be formed in two ways, i.e., as a syntactic web service request or as a semantic web service request. The syntactic web service request, in its most basic form, uses simple text. Syntactic web query languages such as XQuery, XSLT, GQL, and Lucene, among others, which have been tailored specifically for declarative and efficient access to and processing of web data, may also be used to form the web service request. The web service request may also be formed using semantics-based XML languages, such as RDF and OWL, that rely on ontologies to explicitly specify the content of the tags used to annotate the service request. Most of the RDF query languages today are relational based, such as SPARQL, RQL, and TRIPLE, among others. Compared with formal queries, keyword-based queries have the following advantages: 1) a simple syntax in terms of a list of keyword phrases, 2) open vocabularies wherein users can use their own words to express their information requirement, and 3) the familiarity of users with these interfaces due to their widespread usage. However, the fundamental disadvantages of a keyword-based web service request are the lack of precision and the lack of verifiability.

A new, semantics-based approach is necessary not only to reduce this information overload problem, but also to enable more effective and productive services over the web. Our research confirms the limitations of keyword-based searching and provides an approach in which a semantically enhanced web service request overcomes these limitations. In this paper, we report on an experiment in which we evaluate the benefits and pitfalls of semantic enhancement of the web service request over a pure keyword matching technique. Thus, though keyword-based web service discovery has proven its usefulness, applying semantics-based web service request strategies should greatly increase the resulting precision of searches and enable new types of web service requests to be formed. For our running example, we use the weather web service request:

Service Request (SR). Find the temperature and rainfall based on zip code.

Below is an outline of the key steps of our approach (as illustrated in Steps 9 to 13 of Fig. 2), followed by a detailed discussion of each of the steps.

1. Preprocess the service request and determine the overall search category of web services for the search.

2. Index the web service description collection and retrieve relevant service descriptions.

3. Preprocess the service description set and retrieve associated concepts related to the initial service request from the ontology framework.

4. Acquire the associated concepts related to the initial service request to expand the request. Transform the service description set into a term-document matrix.

5. Perform SVD on this matrix.

6. Project the description vectors and the request vector and utilize the cosine measure to determine similarity.

We now go into the details of each step.

6.1 Service Request Preprocessing

The service request is parsed and preprocessed. Preprocessing includes the removal of markup, translation of uppercase characters into lowercase, punctuation removal with white space used as the term delimiter, stoplist removal, and stemming to strip word endings. The outcome of this preprocessing is a term vector yielding term frequencies. In our weather service example, the SR is transformed to {temperature, rain fall, zip code}. The SR terms are then searched in the upper ontology to extract the related upper concepts. These concepts are used to determine the category of web services to be searched for discovering the most appropriate web service satisfying the requested functionality. The upper concepts are retrieved by extracting the root concepts of the concept hierarchy that have the SR terms as leaf nodes. In our example, this results in {weather}.
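The preprocessing pipeline can be sketched as below. The stoplist is a tiny hypothetical one, and the suffix stripper is a deliberately naive stand-in for a real stemmer (it is not the Porter algorithm); both are assumptions made for illustration. Unlike the example in the text, the sketch treats each word as a separate term and does not group phrases such as "zip code".

```python
import re

# Hypothetical, deliberately small stoplist.
STOPLIST = {"find", "the", "and", "based", "on", "a", "given", "of", "for"}


def strip_suffix(word):
    """Naive stemmer stand-in: strip a few common endings from longer words."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(request):
    """Return a term -> frequency mapping for a free-text service request."""
    text = re.sub(r"<[^>]+>", " ", request)        # remove markup
    text = re.sub(r"[^\w\s]", " ", text.lower())   # lowercase, drop punctuation
    terms = [strip_suffix(t) for t in text.split() if t not in STOPLIST]
    frequencies = {}
    for term in terms:
        frequencies[term] = frequencies.get(term, 0) + 1
    return frequencies


print(preprocess("Find the temperature and rainfall based on zip code."))
```

The resulting term-frequency mapping is the form in which the request would be looked up in the upper ontology and, later, projected into the reduced-dimension space.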

6.2 Service Description Retrieval

The corresponding relevant service collection forms the categorized WSDL set. As shown in Fig. 2, Step 5 involves the selection of web service descriptions (WSDL files) that are categorized as weather services in the UDDI. The categoryBag, an optional element of tModels, is used for service categorization. A service can specify its position within the general classification scheme by, for example, an optional list of name-value pairs that give taxonomy information, such as industry, product, or geographic codes. These documents are then parsed and


Fig. 6. pruneAssociationPatterns Algorithm.



processed to form the term-document matrix. The WSDL document processing includes the extraction of the text under the tag. The extracted text forms the service description. Additionally, we consider the associated operation parameters by extracting all terms under the tag.

6.3 Ontology Concept Acquisition

The initial web service discovery process is not explicit, as most users are not entirely aware of the document collection or the domain information. It is, therefore, difficult to formulate a precise SR. This guides us toward iterative SR formulation. This part of our approach is based on the introduction of relevance feedback for information retrieval [30]. Our approach builds on the manual process to provide a semiautomated technique that expands the SR based on the existing terms that make up the SR. The primary objective is to extract associated concepts from the domain ontologies that are determined to be relevant and use them to enhance the existing SR. We developed and reused ontologies to form a domain ontology framework. The ontology concepts were extracted by ontology linking based on ontology-to-ontology mapping.

6.4 Service Request Expansion and Term-Document Matrix Formation

One of the assumptions in our experiments relates to a simplification in the modeling of the ontology framework. Currently, our approach for linking ontologies is based on the retrieval of concepts by traversing two links expressing an association [18]. This restricts us to gathering concepts across a single ontology at the same level. However, multiple iterations of the concept gathering function enable us to traverse one ontology at each step across the three broad levels. For example, corresponding to our request, our initial ontology modeled a weather forecast as having a set of features, with a specific feature having a set of characteristics. This, however, requires querying across two associations, e.g., weatherforecast WF hasParameter feature F, feature F is temperature T, temperature T has unit U. Since this is not feasible, we currently list the features (sky, station, temperature, visibility, wind) without modeling the details of each feature. Therefore, the associated concepts are represented as {weatherforecast | sky, station, temperature, visibility, wind}. The expanded request is thus a union of the original terms and the ontology concepts along with their concept hierarchies, as mentioned above. The enhanced service request is represented as:

Enhanced Service Request (ESR): windchill heat humidity dewpoint wind pressure conditions visibility sunrise sunset state moonrise moonset precipitation temperature rainfall zip code region address city state latitude longitude postal code

The generateEnhancedRequest algorithm (Fig. 7) shows the main steps involved in the generation of the expanded SR.

Currently, service request expansion is implemented using windowing and information display. Specifically, the retrieved relevant terms are displayed graphically to the user. The terms chosen by the user are then included to reformulate the expanded service request.
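The union-based expansion with user selection can be sketched as follows. The ontology fragment is a hypothetical mapping from a term to its related concepts, and the `choose` callback stands in for the user picking terms from the displayed candidates; neither is taken from the generateEnhancedRequest algorithm itself.

```python
# Hypothetical fragment: term -> related ontology concepts.
ONTOLOGY = {
    "weatherforecast": ["sky", "station", "temperature", "visibility", "wind"],
    "temperature": ["windchill", "heat", "dewpoint"],
    "zip code": ["postal code", "city", "state"],
}


def expand_request(terms, ontology, choose=lambda candidates: candidates):
    """Union of the original terms and the chosen related concepts.

    `choose` models the user's selection; by default every candidate is kept.
    The original term order is preserved and duplicates are skipped.
    """
    expanded = list(terms)
    for term in terms:
        for concept in choose(ontology.get(term, [])):
            if concept not in expanded:
                expanded.append(concept)
    return expanded


request = ["temperature", "rainfall", "zip code"]
print(expand_request(request, ONTOLOGY))
# A user who accepts only the first suggestion for each term:
print(expand_request(request, ONTOLOGY, choose=lambda c: c[:1]))
```

Running the function repeatedly on its own output would correspond to the multiple iterations of concept gathering mentioned above, with each pass following one more association.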

The WSDL file forms part of the categorized WSDL set, and its corresponding description and associated parameters are parsed as explained above. The next step in the WSDL processing involves removal of markup and index entries, removal of punctuation, and the use of white space as the term delimiter. WSDL processing also includes stoplist removal and stemming to strip word endings. The obtainReducedDimensionForm algorithm (Fig. 8) describes the procedure for establishing the term-document matrix. WSDL processing results in a term-document matrix wherein each cell entry indicates the frequency with which a term appears


Fig. 7. generateEnhancedRequest Algorithm.

Fig. 8. obtainReducedDimensionForm Algorithm.



in a document. Subsequently, the term-document entries are transformed using ltc weighting: the log value of each individual cell entry is calculated, each entry for a term is multiplied by the IDF weight of the term, and the document length is then normalized.
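A minimal sketch of the ltc transformation is shown below. It assumes the common SMART reading of the three letters: a logarithmic term frequency 1 + log(tf), an inverse document frequency log(N/df), and cosine normalization of each document column. The matrix is toy data and the function name is hypothetical.

```python
import math


def ltc_weights(term_doc):
    """Apply ltc weighting to {term: [tf in doc 0, tf in doc 1, ...]}."""
    n_docs = len(next(iter(term_doc.values())))
    weighted = {}
    for term, tfs in term_doc.items():
        df = sum(1 for tf in tfs if tf > 0)               # document frequency
        idf = math.log(n_docs / df) if df else 0.0        # t: IDF weight
        weighted[term] = [(1 + math.log(tf)) * idf if tf > 0 else 0.0
                          for tf in tfs]                  # l: log term frequency
    for d in range(n_docs):                               # c: cosine normalization
        norm = math.sqrt(sum(column[d] ** 2 for column in weighted.values()))
        if norm:
            for column in weighted.values():
                column[d] /= norm
    return weighted


# Toy term-document matrix with three service descriptions.
matrix = {
    "temperature": [2, 1, 0],
    "rainfall": [1, 0, 0],
    "stock": [0, 0, 3],
}
for term, row in ltc_weights(matrix).items():
    print(term, [round(w, 3) for w in row])
```

Note that under this weighting a term occurring in every document receives an IDF of zero and therefore contributes nothing to the similarity computation.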

6.5 SVD Transformation

The SVD program calculates the best reduced-dimension approximation of the transformed term-document matrix. The outcome of the SVD analysis is a reduced-dimension vector for each term and each document, together with a vector of the singular values. This reduced-dimension representation is used for determining the appropriate web services. The cosine similarity between term-term and request-description pairs is used as the measure of similarity for further analysis of this representation.

6.6 Service Request Projection

This step involves projecting the description vectors and the request vector and using the cosine measure to determine similarity. The corresponding web services are then ranked, with a higher similarity measure indicating a more appropriate service. See the MappingRequest algorithm (Fig. 9) for details.
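For reference, the standard LSI formulation of these two steps can be written as follows. The notation is an assumption introduced here: A is the weighted term-document matrix, k the number of retained dimensions, q the (enhanced) request expressed as a term vector, and \hat{d}_j the reduced representation of the j-th service description, taken as the j-th row of V_k. Some formulations additionally scale the document vectors by the singular values, so this should be read as one common variant rather than the exact procedure of the MappingRequest algorithm.

```latex
A \approx A_k = U_k \Sigma_k V_k^{T}, \qquad
\hat{q} = \Sigma_k^{-1} U_k^{T} q, \qquad
\mathrm{sim}(\hat{q}, \hat{d}_j) =
  \frac{\hat{q} \cdot \hat{d}_j}{\lVert \hat{q} \rVert \, \lVert \hat{d}_j \rVert}
```

Services are then ranked in decreasing order of sim(\hat{q}, \hat{d}_j).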

7 EXPERIMENTAL EVALUATION

The effectiveness of our approach is shown by conducting three sets of experiments: 1) semantic categorization of the web services in the UDDI, where we evaluate the effectiveness of our results using the f-measure. The f-measure [6] is based on the precision and recall of each cluster C from a set of services with service categories preassigned manually by users; 2) semantic similarity-based matching, where we compute scores to rate the matching as the average of a 10-pt precision-recall curve. The average precision is evaluated at 6, 10, and 18 service descriptions retained, and the average recall is evaluated at 50-100 service descriptions retrieved; and 3) the overall time taken for service discovery, measured in seconds. To carry out the evaluation, we developed a prototype of our approach. The implementation and deployment details of our approach are described in [27].
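As an illustration of a cluster-level f-measure, the sketch below uses a formulation that is common in the clustering literature: for each preassigned category, take the best f-measure achieved by any cluster, and average these values weighted by category size. This is an assumption about the exact variant; the definition in [6] may differ in detail, and the function names are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def clustering_f_measure(clusters, labels):
    """clusters: list of lists of service ids; labels: {service id: category}."""
    score = 0.0
    for category in set(labels.values()):
        members = {s for s, c in labels.items() if c == category}
        best = 0.0
        for cluster in clusters:
            hits = len(members.intersection(cluster))
            if hits:
                precision = hits / len(cluster)   # share of the cluster in the category
                recall = hits / len(members)      # share of the category in the cluster
                best = max(best, f_measure(precision, recall))
        score += len(members) / len(labels) * best
    return score


labels = {"WS1": "weather", "WS2": "weather", "WS3": "finance", "WS4": "finance"}
print(clustering_f_measure([["WS1", "WS2"], ["WS3", "WS4"]], labels))  # 1.0
print(round(clustering_f_measure([["WS1", "WS2", "WS3"], ["WS4"]], labels), 3))
```

A value of 1 is reached only when every category is reproduced exactly by some cluster, which is why values well below 1 indicate mixed clusters.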

7.1 Semantic Categorization of Web Services in UDDI

A total of 25 service requests and 800 service descriptions formed the collection of web services. The collection included web services compiled by the project described in [17]. We also added WSDL files from xmethods [14] and from individual file searches using search engines, e.g., Google. Data Set (Da) comprises unlabeled web services and additional web services downloaded across various domains. In the data set Da, service categories were preassigned manually by users. Data Set (Db) represents the categorized services from [17]. Data Set (Dab) represents the combined set of web services. This collection of web services is classified into 30 categories. The classified service descriptions support a large number of varied requests and provide a sufficient testbed for service discovery. For the experimental evaluation of semantic categorization of UDDI, the data set is represented in four versions, in which the maximum number of services in a category is restricted to 5, 10, and 15, respectively. Categories that contain excess documents are not excluded; however, only the maximum number of documents for the particular version is considered. The min-3 max-5 version, however, disregards all categories that contain fewer than three web service instances.

For evaluating the proposed approach for semantic categorization of web services, we structure four preclustering techniques. The process of data analysis and cluster formation is preceded by a preprocessing step that includes stopword removal, stemming, and pruning to reduce the noise in the data. Additionally, we consider the addition of related concepts to the data using the ontology, and the deletion of irrelevant terms with and without adding new concepts. In particular, we consider the following data setups for clustering:

1. Orig.: the initial setup serves as a baseline for further comparisons. This setup includes all initial preprocessing techniques, i.e., stoplist removal, stemming, and pruning.

2. Add: this setup includes related concepts from the core ontology. This expansion of the service vector builds on the mapping of the WordNet lexical database to the SUMO ontology.

3. Delete: this involves the removal of irrelevant terms from the service vectors. Irrelevant terms are determined based on the frequency of their occurrence. In particular, we delete all terms that appear fewer times than a preset threshold.

4. Add and Delete: this technique is a combination of Add and Delete.

Clustering is performed on each of the above cases. The clustering results are derived from a preassigned set of categorized web services. We present our results for each of the web service data sets in combination with the four techniques. Figs. 10a, 10b, 10c, and 10d plot the cluster size versus the average f-measure over all the data sets for each technique. For the experiment test runs, higher f-measure values indicate higher quality of the clusters formed.

7.1.1 Experiment 1: Orig. Setup

Fig. 10a depicts the results for the original setup. As observed in all the experiments, the f-measure values are far from 1. The experimental results in Fig. 10a serve as a baseline for comparing the results with other data setups.


Fig. 9. MappingRequest Algorithm.



7.1.2 Experiment 2: Add Setup

Fig. 10b shows the results for the Add setup. We observe that adding relevant terms from the ontology yields an improvement over the experiments conducted with the original data sets. The gain is modest, however, which leads us to conclude that adding relevant domain knowledge for all the terms is not all that helpful. The lack of high returns is on account of the generic nature of the SUMO ontology, which does not focus on a specific domain. It may also be due to the fact that a large number of web service descriptions have overlapping categories. The addition of terms related to these overlapping domains creates additional noise that is not resolved by the clustering algorithm. A possible approach to overcome this effect would be to add concepts from the ontology only to the relevant terms, accounting for context. The ontology serves as a guide for clustering that incorporates domain knowledge and more focused information. We consider two criteria, viz., span and depth, to determine the coverage of the ontology concepts. The exact parameters determining the coverage aim to achieve the smallest set of additional ontology concepts while maintaining the best overall coverage within the smallest set.

7.1.3 Experiment 3: Delete Setup

Better results for cluster quality were observed with term reduction from the service description vectors, as illustrated in Fig. 10c. The term reduction involved pruning individual term vectors of irrelevant and low-frequency terms, which increases the specificity of the services.

7.1.4 Experiment 4: Add and Delete Setup

This setup aims to maintain a balance between the generality and the specificity of terms in web service descriptions. This is achieved by expansion of the term vectors with relevant ontology concepts and subsequent reduction of terms from the web service descriptions. The results follow those observed in the Add set of experiments. The technique in which ontology concepts are added to all terms of web service descriptions, followed by pruning, results in increased generality. The best results (illustrated in Fig. 10d) compared to all techniques were observed for the technique in which ontology concepts are added to relevant terms of web service descriptions, followed by pruning. This results in an increase in specificity and a reduction of generality of the terms in the web service description. The improved results may be explained by the generality-specificity balance achieved by the added semantics, which provides a good representative set for better categorization, and by the overall reduction of noise added to the vector representations.

7.1.5 Summary of Results

We can see in all four data setups that the results improve (in terms of f-measure) with an increase in the number of clusters. These results validate the scalability of our approach. It can also be noted in Figs. 10a, 10b, 10c, and 10d that the clusters formed with all available service descriptions yield lower f-measures than those formed in experiments with controlled cluster size. This may be explained by the higher purity of clusters with fewer service descriptions in comparison to that of clusters with the maximum number of service descriptions for individual categories.

Another aspect of our evaluation deals with the frequency of service categorization for the entire UDDI. We perform service categorization on an incremental basis. We assume that the ontology is not perfect and that it is updated to represent additional domain objects and their interrelationships. The categorization must then be performed every time a new service is added to the UDDI. However, periodic categorizations may be required if service additions are frequent, as can be expected in real-life situations with large user and provider communities. We can, however, update the service category by isolating the upper ontology concept that remains unchanged and then recategorizing all the services that fall under its child concepts. When evaluating the efficiency of our approach, a number of factors affect the timings obtained, viz., the size of the underlying ontology and the number of services to be categorized. We found that the total processing time for the service categorization was 259 seconds for our test set of 800 web services with approximately 1,000 concepts in the ontology data.


Fig. 10. (a) Experiment 1: Orig. Setup. (b) Experiment 2: Add Setup (addition of ontology concepts to relevant terms of the service description vector). (c) Experiment 3: Delete Setup. (d) Experiment 4: Add and Delete Setup (addition of ontology concepts to all terms of the service description vector).



For evaluating the analytical complexity of the proposed service categorization approach, let n represent the total number of concepts that form the ontology and m represent the total number of web services. Searching for a specific concept in the ontology requires O(log n) search operations. The add operation for including the relevant concepts for each web service occurs in constant time. For this reason, the complexity of our approach for service categorization is O(m log n + n).

7.2 Semantic Similarity-Based Matching

For evaluating our approach for semantic similarity-based service discovery, we set out to discover relevant services for an average of ten service requests. For the purpose of this paper, we report our results for the request "Find the temperature and rainfall based on a given zip code."

The initial discovery is based on a smaller number of WSDL files with a focus on precision. The next discovery experiment examines a larger section of WSDL files with a focus on maximizing recall. To assess the impact on service discovery of expanding the service request with relevant terms from ontology concepts, we compare the cosine measure-based similarity scores of the two service selection methods: with the enhanced service request (ESR) and with the original service request (SR). Table 2 shows LSI-based service selection scores for the six weather web services, obtained by calculating the cosine measure-based similarity between the service descriptions and the service request. Web services {W3 and W6} are the most appropriate matches for our example service request. With an expanded service request over categorized services, we notice an improved result over the original service request. However, this is not observed when we exclude the related concepts derived from the ontology. Services {W3 and W6} score higher relative to a similar web service {W1} for ESR than for SR. The higher scores are indicative of the appropriateness of the service in terms of the requested functionality. The expanded service requests thus facilitate improved differentiation between the appropriate services and the rest of the services, since the larger score differences indicate a better match to the service request.
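The effect of request expansion on cosine scores can be sketched as follows. This is a simplified illustration in the raw term space: it omits the SVD-based dimensionality reduction that LSI performs, and the related-term table, service names, and term lists are hypothetical rather than the data behind Table 2.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def expand(request_terms, related):
    """Append ontology-derived terms related to each request term."""
    expanded = list(request_terms)
    for term in request_terms:
        expanded.extend(related.get(term, []))
    return expanded

# Hypothetical ontology relations and service descriptions.
related = {"temperature": ["weather", "forecast"], "rainfall": ["precipitation"]}
sr = ["temperature", "rainfall", "zip"]   # original service request
esr = expand(sr, related)                 # enhanced service request

services = {
    "W_a": Counter(["weather", "forecast", "precipitation", "zip"]),
    "W_b": Counter(["stock", "quote", "zip"]),
}
scores_sr = {name: cosine(Counter(sr), vec) for name, vec in services.items()}
scores_esr = {name: cosine(Counter(esr), vec) for name, vec in services.items()}
```

In this toy data the weather service W_a shares only the term "zip" with the original request, so it is not separated from the unrelated service W_b; after expansion it shares four terms and its score rises well above that of W_b.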

Further, the performance of LSI and the expanded service request is measured by observing the precision and recall levels at 6, 10, and 18 services. The expanded service request has greater overall precision, indicating that it returns a higher percentage of relevant services over the three levels of services retained. Comparing the two methods for service discovery, the service request expansion method is better, as indicated by the recall.
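Precision and recall at a fixed number of retained services can be computed as in the following sketch. The ranked list and the relevant set are hypothetical and only illustrate the two measures; they are not the experimental data.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall over the top-k entries of a ranked list."""
    top = ranked[:k]
    hits = sum(1 for service in top if service in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ranking of ten services, three of which are relevant.
ranked = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8", "s9", "s10"]
relevant = {"s1", "s3", "s8"}
p6, r6 = precision_recall_at_k(ranked, relevant, 6)
p10, r10 = precision_recall_at_k(ranked, relevant, 10)
```

Retaining more services can only keep or raise recall, while precision usually falls, which is why the two measures are reported together at several cutoffs.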

The experiments were conducted using categorized services that included 1) 50 web services that have news, financial, location, and graphics as their service category, and 2) 100 web services that have news, financial, location, graphics, games, business, flights, web, and music as their service category. The ranking of the services changes as more dimensions are added to the service collection under consideration. However, we notice that all the relevant services are retrieved in the top 20 percent of the number of services being considered. The experimental results also indicate that the categorization of services yields significantly better results in terms of the specific web services being returned for a particular service request.

7.3 Performance

We have compared the time taken to match a service request with a web service description (for Dab) within service sets that include 1) predefined categories, 2) semantic categorization, and 3) the entire service set, i.e., the set of uncategorized services. The purpose of this experiment is to validate our approach for ontology-guided web service categorization. The time taken for service matching within predefined categories, semantically categorized services (our approach), and uncategorized services was 2.58, 3.65, and 406.8 seconds, respectively. We observe that our approach provides a balance between the quality of the service selected and the time taken to match an appropriate service.
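The reason categorization shortens matching time can be sketched by counting how many service descriptions are compared in each case. The scoring function below is a simple term overlap used as a stand-in for the actual matching step, and all service names, term lists, and categories are hypothetical.

```python
def match(request_terms, services):
    """Return (best-matching service name, number of descriptions compared)."""
    best, best_score, compared = None, -1, 0
    for name, terms in services.items():
        compared += 1
        score = len(set(request_terms) & set(terms))  # stand-in similarity
        if score > best_score:
            best, best_score = name, score
    return best, compared

# Hypothetical registry contents and categorization.
all_services = {
    "WeatherByZip": ["temperature", "rainfall", "zip"],
    "CityForecast": ["forecast", "city"],
    "StockQuote": ["stock", "quote"],
    "CurrencyRate": ["currency", "rate"],
}
categories = {
    "weather": ["WeatherByZip", "CityForecast"],
    "finance": ["StockQuote", "CurrencyRate"],
}

request = ["temperature", "zip"]
subset = {name: all_services[name] for name in categories["weather"]}
best_all, n_all = match(request, all_services)   # uncategorized search
best_cat, n_cat = match(request, subset)         # search within one category
```

Both searches select the same service here, but the categorized search compares only the descriptions in the relevant category, which is the source of the time difference reported above.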

The observed time for service discovery seems acceptable, especially given that most of the time users will submit more incremental, and hence less time-consuming, requests. The time it takes to load the system, though, could be improved. In the future, we plan to further evaluate the scalability of our approach, along with detailed experimentation with actual users to fine-tune the way in which our integrated functionality is presented and to eventually evaluate the full benefits of our approach from a performance and solution quality standpoint.

7.4 Deployment

In the existing architecture, the service provider/requestor accesses the UDDI through an application server. To deploy our approach, we need to enhance this by incorporating a semantic application server as well as an ontology repository. The Application Server now executes our approach to select the most suitable services based on semantics processed by the Semantic Application Server in conjunction with the ontology repository. The Semantic Application Server should include an ontology reasoner (e.g., Racer) that utilizes description logics to load and query ontologies to extract the relevant concepts for semantic categorization of web service descriptions and enhancement of service requests.

TABLE 2. Service Scores for Individual Weather Services

Since our proposed work considers the semantic functionality of web services for service discovery and ranking, we do not explicitly address other QoS measures such as trust and reputation. However, these can be easily incorporated: for example, a trust and reputation registry could be integrated with the UDDI server. The selection of appropriate web services would then need to incorporate the calculation of a weighted average of the functionality and the trust and reputation of a web service.
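The weighted average mentioned above could take the following form. This is only a sketch: the weight of 0.7, the assumption that both scores lie in [0, 1], and the candidate names and values are illustrative choices, not values from the paper.

```python
def combined_score(functional, trust, weight=0.7):
    """Weighted average of a functional match score and a trust score.

    `weight` is the share given to functionality (an assumed default).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * functional + (1.0 - weight) * trust

# Hypothetical candidates: (functional similarity, trust and reputation).
candidates = {"svc_a": (0.9, 0.4), "svc_b": (0.8, 0.9)}
ranked = sorted(
    candidates,
    key=lambda name: combined_score(*candidates[name]),
    reverse=True,
)
```

In this example the service with the slightly lower functional score ranks first because its trust score is much higher; the weight controls how strongly trust can override functionality.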

Depending upon the number of web services and service requests, we may need to use XML gateway devices to offload the work of parsing and transforming XML and thereby reduce the computational burden. Another deployment issue is that of reflecting, cascading, and managing updates in ontologies within the associated web service concepts. Ontology updates must be carefully managed. There are three key tasks associated with this: first, the revised ontology needs to be assessed and evaluated to ensure logical consistency and check the level of axiomatization. The metadata of the linked web services may also need to be updated. This is particularly important if the ontology is a domain-specific ontology, as the updates to the concepts may result in a changed categorization of the associated web services. The frequency and scale of these changes will impact the execution and performance of our approach. Note, however, that both assessment and recategorization can be done offline, while the existing ontology is still being utilized. The revised ontology can then serve as a drop-in replacement. Alternatively, the UDDI server may decide to adopt and maintain different versions of the ontology. In this case, the web service requestors need to be notified, in an intuitive manner, of the version changes to ontologies and web services. While these challenges must be considered, good design can ensure robust deployment of our approach in terms of computational complexities and overheads.
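The offline recategorization and drop-in replacement described above might be organized as in this sketch. The registry class, the first-match categorization rule, and the two ontology versions are hypothetical simplifications intended only to show the staging and swap.

```python
class CategoryRegistry:
    """Holds the service categorization derived from one ontology version."""

    def __init__(self, version, categories):
        self.version = version
        self.categories = categories

    def lookup(self, service):
        return self.categories.get(service)

def recategorize(services, ontology):
    """Offline step: assign each service the first category sharing a term."""
    result = {}
    for name, terms in services.items():
        for category, concepts in ontology.items():
            if set(terms) & concepts:
                result[name] = category
                break
    return result

# Hypothetical service and two ontology versions.
services = {"WeatherByZip": ["temperature", "zip"]}
ontology_v1 = {"location": {"zip"}, "weather": {"forecast"}}
ontology_v2 = {"weather": {"temperature", "forecast"}, "location": {"zip"}}

active = CategoryRegistry("v1", recategorize(services, ontology_v1))
before = active.lookup("WeatherByZip")

# Built offline while `active` keeps serving requests, then swapped in.
staged = CategoryRegistry("v2", recategorize(services, ontology_v2))
active = staged
after = active.lookup("WeatherByZip")
```

Keeping the version label on the registry also gives a simple hook for notifying requestors when the active ontology version changes.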

8 RELATED WORK

The challenges pertaining to automatic classification of web services have been addressed in prior work [7], [11], [8], [26]. In [11], Heß and Kushmerick propose an approach that uses the information contained in the service description to dynamically create the categories for service classification, comparing five clustering algorithms. The classification process has similarities to our approach in terms of construction of term vectors with relevant words and utilizing a hierarchical clustering approach for achieving the best results. Our approach builds on this by 1) including relevant semantic concepts based on semantic relationship ranking to expand the domain coverage, and 2) deleting nonrelevant terms, resulting in a reduction of noise and an increase in the purity of the clusters.

Bruno et al. [7] propose a classification approach utilizing Support Vector Machines (SVM) to classify the term vectors. Bruno et al. [7] also make use of a concept lattice created using Formal Concept Analysis to identify concepts for a specific domain as well as the relationships between services belonging to a class. This approach is the closest to our approach. Our approach, however, is based on gleaning semantics using a domain ontology hierarchy. Additionally, from our point of view, this approach does not address the issue of SVM mapping training data to a higher dimensional space and then finding the maximal marginal hyperplane to separate the data.

One of the approaches for improving the training time of SVM, specifically when dealing with large data sets, recommends hierarchical clustering analysis. Also, ontologies can be used to improve Formal Concept Analysis (FCA) applications. In standard FCA, the set of attributes does not carry any structure. By considering this set as a set of ontology concepts, we can model relations and dependencies between the attributes. Although this does not increase the complexity of the resulting lattices (as concept lattices cover, up to isomorphism, the whole class of complete lattices), it enriches the conceptual structure and provides new means of interaction and analysis. FCA may also complement our approach by facilitating ontology merging and linking to provide better depth and span in terms of the domain concept coverage.

In [26], Oldham et al. propose a framework to semiautomate the semantic annotation of web services for classification, based on matching web service data types and domain ontology concepts by means of schema matching. The main drawback of this is that it is not simple to find similarities with domain ontology concepts, as no single domain concept contains the complete structure of a complex schema containing all service parameters.

Existing approaches to web service matching address syntactic and/or semantic matching. For example, Sajjanhar et al. [29] have studied LSI to acquire the semantic associations between short textual web service descriptions, and Corella et al. [8] describe a heuristic approach for semiautomated web services classification based on a previously classified services corpus. These approaches utilize the initial web service descriptions advertised by the service provider and the functionality request specified by the service requestor. These initial descriptions do not include any semantic augmentation. Our approach extends this work by adding semantics to the service request. As validated by the experimental results, this helps us achieve improved results for appropriate service discovery. The most widely used IR technique is the Vector Space Model (VSM) [31]. VSM, however, considers the syntactic aspect of term association and does not account for the underlying semantic structure. Kokash et al. [12] address the inadequacy of the VSM approach by expanding both the service query and the WSDL descriptions. A hybrid matching approach is proposed that may combine various matching methods (e.g., syntactic and semantic) into a composite algorithm. This enables ad hoc composition of several (pre-existing) matching approaches based on predefined criteria. In principle, this is similar to our work. However, although it may provide flexibility, it also increases the human intervention needed to select a composite algorithm applicable to a set of services for a specific application.

The usage of synonyms does not capture the overall semantics of the domain and application functionality. However, our approach utilizes concepts extracted from a domain ontology. These extracted concepts account for relationships between the domain objects and provide comprehensive coverage of the underlying semantics for both the domain and the application functionality. Our approach appends the syntactic service description with relevant semantic terms. This enables a uniform combination of syntactic and semantic matching, rendering our approach more generalizable for overall service matching and requiring minimal human interaction.




Our approach has similarities to existing approaches [33], [1], [2], [19] of natural language processing techniques that address the text part of the challenge in content-based image retrieval (CBIR). These approaches, however, were used in isolation from one another. Our approach, on the other hand, combines these techniques using concept lists, distance within an ontological structure, and latent semantic indexing.

In [13], Sassen et al. describe the SeCSE approach for architecture time service discovery that is based on ontologies that are validated, easy to use, complete, and widely accepted in domains. In contrast to this, our approach begins with the description of an ontology framework that includes upper ontologies, e.g., SUMO, and more descriptive domain and application related ontologies. We propose a linked ontology structure for a wide-ranging description of domain semantics. Our approach for service discovery initiates service request enhancement with concepts extracted from related domain ontologies and reduces the space of service request and WSDL specification term vectors, utilizing LSI to reduce the dimensions to be considered.

9 CONCLUSION AND FUTURE WORK

In this paper, we present an integrated approach for automated service discovery. Specifically, the approach addresses two major aspects related to semantic-based service discovery: semantic-based service categorization and semantic-based service selection. For semantic-based service categorization, we propose an ontology-guided categorization of web services into functional categories for service discovery. This leads to better service discovery by matching the service request with an appropriate service description. For semantic-based service selection, we employ ontology linking (semantic web) and LSI, thus extending the indexing procedure from solely syntactical information to a semantic level. Our experiments show that this leads to increased precision levels, recall levels, and relevance scores of the retrieved services.

In the future, we will extend our approach to allow service requests that are formed using specialized query languages. We can then match these requests to semiannotated services that are described using formats such as SAWSDL and OWL-S, among others. We can also extend our work to web service composition. Typically, multiple services have to be discovered so that they together match a service request. It should be possible to utilize ontologies and explicitly return the sequence of individual service invocations to be performed in order to achieve the desired composite service. When no full match is possible, a flexible matching approach could be created to return partial matches and/or suggest additional inputs that would produce a full match by capturing the dependencies among the matched services. This raises several interesting research issues. Another avenue for future work is to create an interactive, intelligent service composer that is semantically guided to locate the target service components step by step.
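One simple way to return a sequence of service invocations is greedy forward chaining over service inputs and outputs, sketched below. This is an illustrative technique chosen for the example, not a method proposed in the paper; the service signatures are hypothetical, and the greedy plan may include services that a more careful planner would omit.

```python
def compose(services, available, goal):
    """Greedy forward chaining over service signatures.

    `services` maps a name to (set of inputs, set of outputs).
    Returns an invocation sequence producing `goal`, or None.
    """
    known = set(available)
    plan = []
    progress = True
    while goal not in known and progress:
        progress = False
        for name, (inputs, outputs) in services.items():
            # Invoke a service once its inputs are known and it adds new data.
            if name not in plan and inputs <= known and not outputs <= known:
                plan.append(name)
                known |= outputs
                progress = True
    return plan if goal in known else None

# Hypothetical service signatures.
services = {
    "ZipToCity": ({"zip"}, {"city"}),
    "CityWeather": ({"city"}, {"temperature", "rainfall"}),
}
plan = compose(services, {"zip"}, "temperature")
no_plan = compose(services, {"country"}, "temperature")
```

When no plan exists, as in the second call, the inputs of services that were never invoked are natural candidates for the additional inputs a flexible matcher could suggest to the requestor.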

We also intend to extend our ontology framework and investigate additional mapping tools to better express a service request to search for relevant concepts. Finally, as part of the service discovery process, we will explore associating semantic weights with the retrieved set of web services for effective semantic ranking of the results.

ACKNOWLEDGMENTS

This work was supported in part by the US National Science Foundation under grant IIS-0306838 and SAP Labs, LLC.

REFERENCES

[1] J. Adcock, A. Girgensohn, M. Cooper, T. Liu, L. Wilcox, and E. Rieffel, "FXPAL Experiments for TRECVID," Proc. TRECVID, 2004.

[2] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules Between Sets of Items in Large Databases," Proc. ACM SIGMOD Int'l Conf. Management of Data, 1993.

[3] E. Al-Masri and Q.H. Mahmoud, "Investigating Web Services on the World Wide Web," Proc. 17th Int'l Conf. World Wide Web (WWW '08), Apr. 2008.

[4] M.-L. Antonie and O.R. Zaïane, "Text Document Categorization by Term Association," Proc. IEEE Int'l Conf. Data Mining (ICDM '02), 2002.

[5] K. Anyanwu, A. Maduko, and A. Sheth, "SemRank: Ranking Complex Relationship Search Results on the Semantic Web," Proc. 14th Int'l Conf. World Wide Web (WWW '05), 2005.

[6] P. Baldi, P. Frasconi, and P. Smyth, Modeling the Internet and the Web: Probabilistic Methods and Algorithms. Wiley, 2003.

[7] M. Bruno, G. Canfora, M.D. Penta, and R. Scognamiglio, "An Approach to Support Web Service Classification and Annotation," Proc. IEEE Int'l Conf. E-Technology, E-Commerce and E-Service (EEE '05), 2005.

[8] M.A. Corella and P. Castells, "Semi-Automatic Semantic-Based Web Service Classification," Proc. Int'l Conf. Business Process Management Workshops (BPM '06), 2006.

[9] P.W. Foltz and S.T. Dumais, "Personalized Information Delivery: An Analysis of Information Filtering Methods," Comm. ACM, vol. 35, no. 12, pp. 51-60, 1992.

[10] E. Han, G. Karypis, and V. Kumar, "Scalable Parallel Data Mining for Association Rules," Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '97), 1997.

[11] A. Heß and N. Kushmerick, "Automatically Attaching Semantic Metadata to Web Services," Proc. IJCAI Workshop Information Integration on the Web, 2003.

[12] http://dit.unitn.it/~kokash/documents/WS_matching-hybrid.pdf, 2012.

[13] http://idcrue.dit.upm.es/biblioteca/mostrar.php?id=2154, 2012.

[14] XMethods, http://www.xmethods.net, 2012.

[15] http://reliant.teknowledge.com/DAML/SUMO.owl, 2008.

[16] http://www.uddi.org/specification.html, 2012.

[17] http://www.few.vu.nl/~andreas/projects/annotator/ws2003.html, 2012.

[18] H.L. Johnson, K.B. Cohen, W.A. Baumgartner Jr., Z. Lu, M. Bada, T. Kester, H. Kim, and L. Hunter, "Evaluation of Lexical Methods for Detecting Relationships Between Concepts from Multiple Ontologies," Proc. Pacific Symp. Biocomputing, 2006.

[19] M. Kherfi, D. Brahmi, and D. Ziou, "Combining Visual Features with Semantics for a More Efficient Image Retrieval," Proc. 17th Int'l Conf. Pattern Recognition (ICPR '04), 2004.

[20] M. Klusch and X. Zhing, "Deployed Semantic Services for the Common User of the Web: A Reality Check," Proc. IEEE Int'l Conf. Semantic Computing (ICSC), 2008.

[21] J. Lu, Y. Yu, D. Roy, and D. Saha, "Web Service Composition: A Reality Check," Proc. Eighth Int'l Conf. Web Information Systems Eng. (WISE '07), Dec. 2007.

[22] D. Martin, M. Paolucci, S. McIlraith, M. Burstein, D. McDermott, D. McGuinness, B. Parsia, T. Payne, M. Sabou, M. Solanki, N. Srinivasan, and K. Sycara, "Bringing Semantics to Web Services: The OWL-S Approach," Proc. First Int'l Workshop Semantic Web Services and Web Process Composition, July 2004.

[23] S. McIlraith, T. Son, and H. Zeng, "Semantic Web Services," IEEE Intelligent Systems, vol. 16, no. 2, pp. 46-53, Mar. 2001.

[24] S. McIlraith and D. Martin, "Bringing Semantics to Web Services," IEEE Intelligent Systems, vol. 18, no. 1, pp. 90-93, Jan. 2003.

[25] I. Niles and A. Pease, "Linking Lexicons and Ontologies: Mapping WordNet to the Suggested Upper Merged Ontology," Proc. IEEE Int'l Conf. Information and Knowledge Eng. (IKE '03), 2003.

[26] N. Oldham, C. Thomas, A. Sheth, and K. Verma, "METEOR-S Web Service Annotation Framework with Machine Learning Classification," Semantic Web Services and Web Process Composition, vol. 3387, pp. 137-146, Jan. 2005.

[27] A.V. Paliwal, N. Adam, and C. Bornhoevd, "Adding Semantics through Service Request Expansion and Latent Semantic Indexing," Proc. IEEE Int'l Conf. Services Computing (SCC), July 2007.

[28] A.V. Paliwal, N. Adam, H. Xiong, and C. Bornhoevd, "Web Service Discovery via Semantic Association Ranking and Hyperclique Pattern Discovery," Proc. IEEE/WIC/ACM Int'l Conf. Web Intelligence, 2006.

[29] A. Sajjanhar, J. Hou, and Y. Zhang, "Algorithm for Web Services Matching," Proc. Asia-Pacific Web Conference (APWeb), pp. 665-670, 2004.

[30] G. Salton and C. Buckley, "Improving Retrieval Performance by Relevance Feedback," J. Am. Soc. for Information Science, vol. 41, no. 4, pp. 288-297, 1990.

[31] G. Salton, A. Wong, and C.S. Yang, "A Vector Space Model for Automatic Indexing," Comm. ACM, vol. 18, pp. 613-620, Nov. 1975.

[32] K. Verma, K. Sivashanmugam, A. Sheth, A. Patil, S. Oundhakar, and J. Miller, "METEOR-S WSDI: A Scalable P2P Infrastructure of Registries for Semantic Publication and Discovery of Web Services," Information Technology and Management J., vol. 6, pp. 17-39, 2005.

[33] http://www.musclenoe.org/research/sci_deliv_pub/D5.1_WP5_SoA_RevisedVersion_sept05.pdf, 2012.

[34] H. Xiong, P. Tan, and V. Kumar, "Mining Strong Affinity Association Patterns in Data Sets with Skewed Support Distribution," Proc. IEEE Third Int'l Conf. Data Mining (ICDM), 2003.

Aabhas V. Paliwal received the bachelor of engineering degree in electronics and telecommunications from Mumbai University, India, the MS degree in computer engineering from Rutgers University, and the PhD degree in management, information technology from Rutgers University. He is currently a senior technical consultant at Mindlance LifeSciences. He is a coholder of a European issued patent and has two pending patent applications submitted to the US Patent and Trademark Office, all related to web services. His research interests include service-oriented architecture, semantic web, semantic web services, and business process management. He is a student member of the IEEE.

Basit Shafiq received the BS degree in electronic engineering from Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Pakistan, the MS degree in electrical and computer engineering from Purdue University, and the PhD degree in computer engineering from the School of Electrical and Computer Engineering at Purdue University. He is currently a research assistant professor at the Center for Information Management, Integration and Connectivity (CIMIC), Rutgers University. His research interests include semantic web, web services, information systems security, and multimedia systems. He is a member of the IEEE.

Jaideep Vaidya received the BE degree in computer engineering from the University of Mumbai, India, and the MS and PhD degrees in computer science from Purdue University. He is currently an associate professor in the Management Science and Information Systems Department at Rutgers University. His research interests include data mining, data management, security, and privacy. He has published more than 60 technical papers in peer-reviewed journals and conference proceedings, and has receive
