D1.2: Business Scenarios v2.0

Mike Brown, David Manzano-Macho, José Ángel Ramos Gargantilla (editor)
Universidad Politécnica de Madrid

[email protected], {dmanzano,jarg}@fi.upm.es

Identifier Deliverable 1.2

Class Deliverable

Version 2.0

Version date 07/07/04

Status Final

Distribution Public

Responsible Partner UPM

IST Project IST-2000-29243 OntoWeb

OntoWeb: Ontology-based Information Exchange for Knowledge Management and Electronic Commerce

OntoWeb Consortium

This document is part of a research project funded by the IST Programme of the Commission of the European Communities as project number IST-2000-29243.

Next Web Generation
Leopold Franzens University of Innsbruck
Institute of Computer Science
Next Web Generation - Research Group
Technikerstraße 13, 6020 Innsbruck, Austria
Contact person: Dieter Fensel
E-mail: [email protected]

Additional contributors:

- From UPM: Ángel López-Cima, Asunción Gómez-Pérez

- From FTR&D: Alain Leger.

- From OU: Enrico Motta, Arthur Stutt.

- From Ontoprise: York Sure.

Revision Information

Version Date Comments and Actions Status

V0.1 25/11/2001 First draft version edited by co-ordinator Draft proposal

V0.2 12/12/2001 Second draft version edited by co-ordinator Draft proposal

V0.3 19/12/2001 Third draft version edited by co-ordinator Draft proposal

V0.4 09/01/2002 Fourth draft version edited by co-ordinator Draft proposal

V0.5 10/01/2002 Fifth draft version edited by co-ordinator Draft proposal

V0.6 14/01/2002 Sixth draft version edited by WP leader Draft proposal

V1.0 15/01/2002 Final

V1.1 11/06/2004 Reedition by J.A.Ramos (UPM) Draft proposal to v2.0

V1.2 18/06/2004 Update of KM/Information Systems section by Ontoprise. Update of Natural Language section with new applications and projects (UPM). Re-bibliography by J.A. Ramos. Draft proposal to v2.0

V1.3 25/06/2004 Update of eCRM section by UPM and OU Draft proposal to v2.0

V1.4 29/06/2004 Update of the Information Extraction and Information Retrieval sections by UPM. Draft proposal to v2.0

V1.5 29/06/2004 Update of the E-Commerce section by FTR&D Draft proposal to v2.0

V2.0 30/06/2004 Final

OntoWeb. D1.2.2. Bussiness Scenarios IST-2000-29243

Contents

Executive Summary
1 Introduction
  1.1 Aims and Motivation
  1.2 Relationship to other Deliverables
  1.3 Summary of Business Scenarios
    1.3.1 Guide to Business Scenario Description
2 Business Scenario
  2.1 List of All Business Scenarios
  2.2 Knowledge Management / Information Systems / Corporate Internets
  2.3 E-Commerce
  2.4 Natural Language Applications
  2.5 Intelligent Information Integration
  2.6 Information Extraction
  2.7 Information Retrieval
  2.8 Semantic Portals (eCRM)
3 A Framework for on-going Business Scenario Information Gathering
  3.1 Business Scenario Template Forms
    3.1.1 General Information about Business Scenarios
    3.1.2 Business Case for Business Scenario
    3.1.3 Main Requirements for Business Scenario
    3.1.4 Examples of Concrete Applications
4 Information Sources and References
5 Conclusion
6 References

Executive Summary

A decade on from the onset of the World Wide Web, its strengths and weaknesses have been clearly demonstrated. There is a generally recognised need to provide a more formal infrastructure for the global exchange of information. As stated in [Frau01]:

“Today's World Wide Web is fundamentally a publishing medium – a place to store and share images and text. Adding semantics will radically change the nature of the Web – from a place where information is merely displayed to one where it is interpreted, exchanged and processed. … In other words, the ultimate aim of the Semantic Web is to give users near omniscience over the vast resources of the Internet, turning the millions of existing database islands into a single gigantic database Pangea.”

The development of ontology technology, while not solely limited to supporting the development of the Semantic Web, nevertheless strongly relates to the above vision. In short, developing more mature ontology technology may help fuel the next generation of knowledge management solutions, moving away from solutions based primarily on document retrieval to content-driven services.

However, any technology paradigm shift must ultimately be supported by commercial success. This deliverable serves to summarise some of the main application types, i.e. business scenarios, to which ontology technology is currently being applied as well as initiating an on-going effort within OntoWeb to continually collect and communicate new business directions based on success stories from industry (the success stories themselves being primarily collated in the related deliverable [OntoWeb2.1]).

The list of business scenarios given here is by no means exhaustive. To support the ongoing collection of information for this deliverable, a portal has been set up that allows members of the ontology community to give detailed accounts of their applications and experiences:

http://babage.dia.fi.upm.es/ontoweb/wp1/OntoRoadMap/index.html

Nevertheless, several business scenarios are detailed here. Each business scenario is defined in terms of: a general description, the business case (commercial advantages and risks), the main requirements, an overview of the status (state-of-the-art) and a review of some concrete applications. The business scenarios covered here include:

• Knowledge Management / Information Systems / Corporate Internets
• E-commerce
• Natural Language Applications
• Intelligent Information Integration
• Information Retrieval
• Semantic Portals (eCRM)

1 Introduction

1.1 Aims and Motivation

This deliverable is written in conjunction with Deliverable D1.1 [OntoWeb1.1] – the Technical Roadmap. D1.1 provides an overview of the current state-of-the-art of ontology technology and methodologies. This deliverable complements that review by providing an overview of current commercial opportunities for ontology technology.

This report aims to characterise a number of key business scenarios / application areas for ontology technology. Business scenarios are generalised descriptions of a particular type of ontology application. Hence, the description of a business scenario serves a number of different purposes:

1. To provide commercial arguments to developers and managers interested in implementing an ontology-based solution of a particular type: what are the main advantages, which issues must be considered during development and deployment, and which potential pitfalls should be avoided?

2. To provide some practical focus to research and development relating to ontologies by highlighting the main current commercial requirements and potential.

3. To encourage industry and commerce to take up ontology technology by emphasizing the commercial opportunities it can enable and clearly defining the business cases.

4. In conjunction with Deliverable D2.1 [OntoWeb2.1], to enable guidelines for best-practice deployment of ontology technology to be determined and communicated to the community.

This report should also serve as a means of assimilating information about the diversity of commercial ontology projects currently in development or already deployed. Each Business Scenario should include a section detailing briefly a number of actual projects that fall under the classification of the Business Sector. As such, some interesting trends and commonalities can be identified and will be summarised in the conclusion.

This deliverable has two versions. The first release aimed to reflect emerging trends in industry and to allow lessons learnt from new applications to be rapidly disseminated. As such, one of the main goals for that version of the deliverable was to establish an associated web portal through which members of the community can enter information concerning applications and business scenarios. The basis for this portal is described in section "A Framework for on-going Business Scenario Information Gathering". The second version updates the first one, aiming to include new applications.

1.2 Relationship to other Deliverables

The relationship between deliverables D1.2 and D2.1 is as follows:

• Common Web Portal – Is used to gather (anecdotal) information from the community that is useful for both deliverables.

• Deliverable D1.2 (Business Scenarios) distils from this information and other sources the reasons Why and under What Conditions a commercially viable ontology application can be deployed. Deliverable D1.2 will also offer some predictions as to future commercial trends.

• Deliverable D2.1 (Best Practices and Guidelines) goes further in attempting to provide a concrete set of guidelines as to what needs to be achieved in order to deploy ontology solutions, i.e. this deliverable describes How to construct a successful ontology-based application.

1.3 Summary of Business Scenarios

This section briefly describes the way in which business scenarios will be presented in this deliverable.

1.3.1 Guide to Business Scenario Description

Each Business Scenario is characterised with respect to a number of aspects, represented as separate sub-sections:

• General Information – A short description of the Business Scenario and some indication as to the most relevant business sectors that it covers

• Business Case – What commercial advantages does the Business Scenario embody and what are the major risk factors that this advantage is off-set against

• Main Requirements – what technological, information and personnel demands, amongst others, are involved in successfully implementing the Business Scenario.

• Status – what is the state-of-the-art and predicted trends with respect to this scenario

• Concrete Applications – A number of descriptions of industrial ontology applications that conform to the Business Scenario

• Guidelines – actually dealt with in deliverable D2.1 [OntoWeb2.1]

• Related Scenarios – What are the overlaps with other business scenarios defined here.

Note that the above structure does not correspond 1-to-1 with that outlined in section “A Framework for on-going Business Scenario Information Gathering” for the ongoing, internet-based collection of information concerning business scenarios and applications. The forms defined in that section give a more detailed breakdown of the information required concerning each application / scenario. The following section follows the structure described above and attempts to distil the general information that holds for each business scenario.

2 Business Scenario

This section provides a list of all concrete Business Scenarios.

2.1 List of All Business Scenarios

The following is a list of the Business Scenarios so far identified as being salient (this list is by no means exhaustive and is expected to evolve over time as more information about actual applications and business scenarios emerges – see section "A Framework for on-going Business Scenario Information Gathering"). A Business Scenario reflects a commercial need, i.e. why one builds a particular type of application and what the major practical issues are when deploying the ontology application. Each Business Scenario is briefly summarised in the following sections.

• Knowledge Management / Information Systems / Corporate Internets
• E-Commerce
• Natural Language Applications
• Intelligent Information Integration
• Information Retrieval
• Semantic Portals (eCRM)

2.2 Knowledge Management / Information Systems / Corporate Internets

General Description: Knowledge management has been a ubiquitous term within the IT industry over the past decade. Pinning the term “Knowledge Management” down to a single meaning is difficult. It is as much a term born out of the general realisation that corporations must fully exploit their knowledge resources to remain competitive as it is a technological term. Knowledge Management covers issues such as:

• Restructuring informal (textual) information into more formal representations and databases and discovering knowledge from previously untapped information sources (data mining and knowledge discovery)

• Integrating information from different sources within a corporation in order to create a centralised knowledge repository

• Providing sophisticated classification and indexing capabilities so that information contained within document collections can be located more easily

• Improving dissemination of knowledge from key personnel within a corporation and generally making communication channels more efficient and effective.

In short, knowledge management is concerned with changing the infrastructure of corporations so that individual employees have more relevant information available to them at the right place, at the right time. This is not just an issue of providing the right IT systems but must also cover many organisational and political issues [Malhorta, 2001] in order to ensure a company takes full commercial advantage of its knowledge resources.

The link between ontologies and knowledge management is clear given that a major aspect of knowledge management involves the integration of information from disparate sources. Ontologies can play a minor role, such as attempting to boost the information retrieval functionality of a document management system (see also the business scenario “Information Retrieval”) or a central corporate ontology can be envisaged as the core basis for the most sophisticated knowledge management solutions.

Business Case: Business arguments for the construction of such corporate information systems, supported by ontologies, include:

• General Competitive Advantage - most analysts currently recognise that timely access to business content by employees is a key to competitive advantage in many sectors. This can be broken down to reflect:

o Better Information Cross-Referencing - Collating information from multiple sources and media, leading to more effective cross-referencing, e.g. [Profium2]. Information sources can be structured (e.g. databases), unstructured (e.g. MS Word documents) or semi-structured (e.g. competitor web sites) [Voquette1].

o Centralised Information Broker for Differing End Users - Making information available in multiple formats and media, allowing centralised knowledge to be deployed for multiple purposes, e.g. [Profium2].

o Managing Information Overload - Providing automated ways of dealing with the explosion of available information, e.g. [CognIT, Profium, Vata01, Voquette1]. Information overload must be avoided both at a corporate level (a corporation must be able to centrally manage all of its valuable information) and for individual users (the search for relevant information should not be prohibitive).

o Improved Intra-Corporation Communication - Making relevant information available throughout a corporation, improving information flow across organisational boundaries, e.g. [CognIT].

o Improved Intra-Corporation Efficiency - Improved corporate efficiency through easy access to information and knowledge, particularly relating to employee skills, e.g. [CognIT]. This may also be measurable in terms of, for example, reduced project life-cycles.

o Improved Business Processes - Centralising best practices and standardising procedures across a corporation, e.g. [CognIT].

o Standards - Enabling compliance with standards, such as ISO 9000, e.g. [CognIT].

o Maintain Business Intelligence – Another possible aspect of knowledge management is to construct and maintain efficient knowledge resources concerning a given company's competitors and their status. This point is elaborated in the business scenario “Semantic Portals (eCRM)”.

Some of the general risks associated with knowledge management applications, both technical and non-technical, are listed below:

• Lack of Organisational Commitment – Deploying an effective knowledge management solution is usually not just an issue of installing some new piece of IT but requires a commitment to organisational and political change within a corporation. For this reason, there is often resistance to deployment of large-scale knowledge management solutions.

• Lack of Transparent RoI – The end effect of a well-deployed knowledge management solution is essentially better communication of information between the employees of a corporation, leading to advantages such as the improved overall efficiency listed above. Unfortunately, such advantages are difficult to express in monetary terms for a corporation, hence some management resistance to knowledge management projects may exist.

• Difficulty in Knowledge Acquisition – Formalising the central knowledge of a company, e.g. its best practices, in order to establish a corporate ontology is a non-trivial modelling task!

Main Requirements: In order to successfully deploy an ontology-based Knowledge Management solution a wide range of technical and organisational requirements must be met. These include:

• Choice of a suitable (ontology) language for defining meta-data. Desirable aspects of such a language include:

o The language should be a standard, ensuring wide-spread support and compatibility.

o The language should support construction of taxonomies above information resources such as document archives.

o The language should support different views of information to support different end user needs.

• There should be sufficient technology support to ensure ease of navigation/retrieval of information by the eventual end users. Requirements include:

o Ability to define different profiles/views for different end users [Vata01].

o Navigation support should be “intuitive”, i.e. suitable for non-experts.

• In order to counter the risk in terms of management resistance to the deployment of large-scale knowledge management technology it is important that metrics are developed in order to more quantitatively define the benefit and cost of knowledge management solution deployment.

• Methodologies must be developed and followed in order to allow for the systematic deployment of knowledge management solutions.
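To make the taxonomy and end-user view requirements above more concrete, the following is a minimal Python sketch, not taken from any of the cited products. The concept names, documents and profile names are hypothetical, and the taxonomy is a plain dictionary rather than a standard ontology language. It shows documents indexed under taxonomy concepts and a view that returns only the documents below a chosen concept that a given user profile may see.

```python
from dataclasses import dataclass, field

# Hypothetical taxonomy: each concept maps to its direct sub-concepts.
TAXONOMY = {
    "Product": ["Aluminium Product"],
    "Aluminium Product": ["Extrusion", "Sheet"],
    "Extrusion": [],
    "Sheet": [],
}


@dataclass
class Document:
    title: str
    concept: str  # taxonomy concept the document is indexed under
    audience: set = field(default_factory=set)  # profiles allowed to see it


def descendants(concept):
    """Return the concept plus all concepts below it in the taxonomy."""
    found = [concept]
    for child in TAXONOMY.get(concept, []):
        found.extend(descendants(child))
    return found


def view(documents, concept, profile):
    """Titles of documents under `concept` that are visible to `profile`."""
    wanted = set(descendants(concept))
    return [d.title for d in documents
            if d.concept in wanted and profile in d.audience]


# Hypothetical document archive used only for illustration.
ARCHIVE = [
    Document("Extrusion tolerances", "Extrusion", {"engineer"}),
    Document("Sheet price list", "Sheet", {"sales"}),
    Document("Product overview", "Product", {"engineer", "sales"}),
]
```

Calling `view(ARCHIVE, "Aluminium Product", "engineer")` selects only the engineering document filed below that concept, whereas a sales profile querying "Product" sees the price list and the overview. A real deployment would replace the dictionary with a standard ontology language, as the first requirement asks.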

Status: Given the wide scope of knowledge management, a detailed review of the state-of-the-art would make little sense here; instead, a number of useful references are given. Paradoxically, for those seeking a more detailed view of the current state-of-the-art in knowledge management, a dauntingly large number of sources of information is available. Those considered of most interest are listed here.

The Brint Institute [Brint] provides a portal that has many pointers to useful business-oriented material. Similarly, the Knowledge Management Resource Centre [IKM1] provides a wide range of information, from books to industry case studies. Finally, the Knowledge Management Forum is a web site that not only provides many downloadable papers but also supports a wide range of discussion threads relating to specific aspects of knowledge management [KM-Forum].

Concrete Applications: CognIT's CORPORUM technology has been deployed successfully in a number of industrial settings (see “Success Stories” under [CognIT]). Amongst these was a project to provide knowledge management for aluminium metal products for HAMP. Interestingly, one of the key factors in winning the tender bid for that project was that CognIT's technology was perceived as an off-the-shelf standard solution that could be readily deployed without high integration costs. A similar system has also been deployed at Statoil.

Additionally, Deliverable D1.1 [OntoWeb1.1] provides some descriptions of related applications including:

• CoMMA: Corporate Memory Management through Agents (http://www.si.fr.atosorigin.com/sophia/comma/Htm/HomePage.htm)

• Marchmont Observatory Semantic Search Service (http://kmi.open.ac.uk/observatory/)

• MGT (http://kronsteen.open.ac.uk/mgt/)

• MyPlanet (http://eldora.open.ac.uk/my-planet/)

• PatMan (http://kmi.open.ac.uk/projects/patman/)

• PlanetOnto (http://kmi.open.ac.uk/projects/planetonto/)

Guidelines: Published in [OntoWeb2.1]

Related Business Scenarios:

Knowledge Management has a strong similarity / overlap with

• Intelligent Information Integration

Knowledge Management may incorporate aspects of:

• Information Extraction

• Information Retrieval

• Semantic Portals (eCRM)

2.3 E-Commerce

General Description: Electronic Commerce is based on the exchange of information between involved stakeholders using a telecommunication infrastructure. There are two main scenarios: Business-to-Customer (B2C) and Business-to-Business (B2B).

B2C applications enable service providers to publicise their offers, and customers to find offers which match their demands. By providing a single point of access to a large collection of frequently updated offers and customers, an electronic marketplace can match demand and supply within a commercial mediation environment.

B2B applications have a long history of using electronic messaging to exchange information relating to services previously agreed between two or more businesses. Early plain-text telex communication systems were followed by electronic data interchange (EDI) systems based on terse, highly codified, well structured, messages. Recent developments have been based on the use of less highly codified messages that have been structured using the eXtensible Markup Language (XML).
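As an illustration of the XML-structured messages mentioned above, the following minimal Python sketch builds a purchase order and reads it back using the standard library. The element and attribute names (`PurchaseOrder`, `Buyer`, `Line`, `sku`, `quantity`) and the sample values are hypothetical and do not follow any particular EDI or ebXML schema.

```python
import xml.etree.ElementTree as ET


def build_order(order_id, buyer, lines):
    """Build a purchase-order message; `lines` is a list of (sku, quantity)."""
    order = ET.Element("PurchaseOrder", id=order_id)
    ET.SubElement(order, "Buyer").text = buyer
    for sku, quantity in lines:
        ET.SubElement(order, "Line", sku=sku, quantity=str(quantity))
    # encoding="unicode" makes tostring return a str rather than bytes.
    return ET.tostring(order, encoding="unicode")


def read_order(message):
    """Parse the message back into plain Python values."""
    order = ET.fromstring(message)
    lines = [(line.get("sku"), int(line.get("quantity")))
             for line in order.findall("Line")]
    return order.get("id"), order.findtext("Buyer"), lines


# Hypothetical order used only for illustration.
message = build_order("PO-1", "ACME", [("AL-SHEET-2MM", 40), ("AL-ROD-10", 5)])
```

Because the structure is carried by named elements rather than by position, the receiving side can recover the order with `read_order(message)` without a terse positional code table, which is the contrast with earlier EDI messages drawn in the text.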

A new generation of B2B systems is being developed under the ebXML (electronic business in XML) label. These will use classification schemes to identify the context in which messages have been, or should be, exchanged. They will also introduce new techniques for the formal recording of business processes, and for the linking of business processes through the exchange of well structured business messages. ebXML will also develop techniques that will allow businesses to identify new suppliers through the use of registries that allow users to identify which services a supplier can offer.
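The registry idea described above can be sketched in a few lines of Python. This is a toy in-memory illustration under stated assumptions, not the ebXML registry interface: the class name, method names, classification codes and supplier names are all hypothetical.

```python
from collections import defaultdict


class ServiceRegistry:
    """Toy registry: suppliers advertise services under classification codes."""

    def __init__(self):
        # Maps a classification code to the set of suppliers offering it.
        self._by_code = defaultdict(set)

    def register(self, supplier, codes):
        """Record that `supplier` offers every service in `codes`."""
        for code in codes:
            self._by_code[code].add(supplier)

    def find_suppliers(self, code):
        """Return suppliers offering the classified service, sorted by name."""
        return sorted(self._by_code.get(code, set()))


# Hypothetical suppliers and classification codes, for illustration only.
registry = ServiceRegistry()
registry.register("Nordic Metals", ["metal.aluminium.sheet", "metal.aluminium.rod"])
registry.register("Alpine Alloys", ["metal.aluminium.sheet"])
```

A buyer looking for a new supplier would call `registry.find_suppliers("metal.aluminium.sheet")`. The shared classification scheme is what lets buyer and supplier agree on what a code means, which is where ontologies enter the picture.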

Business Case: Business arguments for the construction of such E-Commerce systems, supported by ontologies, include the following considerations.

General Competitive Advantage - Electronic commerce technologies have the potential to lead to significant productivity gains within individual organisations. When applied to value chain relationships, the use of electronic commerce technologies not only leads to rationalisation of business processes and cost savings, but it can also lead to competitive advantages in information / knowledge and process. This can be broken down to reflect:

B2C - Personalising customer interactions using Knowledge mined from behavioural data - Most businesses today interact with their customers through a number of channels. The true Return on Investment (ROI) associated with the use of these digital channels will only be realised when businesses contextualise these channels to make self-service a viable option for their customers. Businesses that provide their customers with the right information at the right time and in the right context will win customer loyalty and hence gain profitability. It is believed that a key component in achieving success is Knowledge mining and the real time deployment of the resulting knowledge [Anand et al., 2001].

B2C - Electronic Payment Systems: Issues of User Acceptance - Electronic payment systems are an essential part of electronic commerce and electronic business and are very important for their future development. The characteristics of primary importance are applicability, traceability, trust, security, convertibility, ease of use and reliability. Less importance is attributed to anonymity and efficiency.

B2C - The right products or right services to the right place - In other words, delivering the right content and services with the right portal and the right devices (whether wired or wireless), combined with a high level of security, the right interface and network, and integrated with the user’s vital business processes.

B2C - Intelligent Dialogue Interface - Virtual sales assistant systems with natural-language text dialogues are one example and a broad field of research. Most systems focus on the efficiency of the assistant system generating the dialogue and have limited capabilities for integration into existing business systems.

B2B and B2C - Customer Care Systems (E-CRM) - Advanced call centers have become an important tool for businesses for managing both customer and business-to-business relations. The old call center architecture, made up of PABX, ACD and IVR systems and phone operators, can be maintained but should be enriched by new technologies that allow better interaction between customer and supplier. During or after the purchase (after-sales support), the customer may require help desk support or a consulting service. Traditional call centers, however, can quickly become overwhelmed by heavy customer demand, can be costly because human operators must be highly available, and tend to offer very generic, impersonal services. These facts, together with the rapid growth of web-based electronic commerce and of the Internet in general, have made it very attractive to extend traditional call center services with integrated web-based customer support.

B2C and B2B - Intelligent mediation process - One example is the marketplace viewed as a place of knowledge-intensive transactions, in which knowledge extraction can be put to the benefit of both customers and providers.

B2B - Products Catalogue management tools - A fundamental premise, and a major economic driver, behind B2B e-commerce is that labor-intensive and time-consuming human interactions can be replaced with semi-automated processes. It is also worth mentioning that this relies heavily on ontology-driven processes.

B2B - Products co-development tools - Competitive advantages of this industry depend on the creativity of the designers and the networked organisation of co-operation of the companies spread all over Europe. Such a situation must lead to the acceleration of the design process while informational structure as well as the flow of business processes should contribute to the emergence of eCommerce.

B2B - Intelligent supply chain

B2B - Standards - Enabling compliance with standards, e.g. [ebXML]

Main Requirements:

At present, ontologies, and more generally ontology-based systems, appear as a central issue for the development of efficient and profitable Internet commerce solutions. They represent a way to access efficiently the large-scale Internet information spaces (professional, business, leisure, etc.) that will be an increasingly prominent and determining feature of most business, governmental and personal informational activity in the near future. However, because of the current lack of standardization for business models, processes, and knowledge architectures, it is difficult today for companies to achieve the promised return on investment (ROI) from e-commerce.

Moreover, there is also a technical barrier that delays the emergence of e-commerce: the need for applications to share information meaningfully, taking into account the lack of reliability and security of the Internet. This barrier may be explained by the variety of enterprise and e-commerce systems deployed by businesses and the various ways in which these systems are configured and used. Such interoperability problems become particularly acute when a large number of trading partners attempt to agree on and define the standards for interoperation, which is precisely a main condition for maximizing ROI.

Although it is useful to strive for the adoption of a single common domain-specific standard for content and transactions, such a task is still often difficult to achieve, particularly in cross-industry initiatives, where companies co-operate and compete with one another.

In addition to this:

• Commercial practices may vary widely and consequently cannot always be aligned, for a variety of technical, practical, organizational and political reasons.

• The global description of the organizations themselves, of their products and/or services (independently or in combination), and of the interactions between them remains a formidable task.

• It is usually very difficult to establish, a priori, rules (technical or procedural) governing participation in an electronic marketplace.

• Adoption of a single common standard may limit the business models that trading partners could adopt, and thus potentially reduce their ability to participate fully in Internet commerce.

For all these reasons, ontologies appear really promising for e-commerce. Indeed, an alternative strategy may consist of sharing foundational ontologies, which could be used as the basis for interoperation among trading partners in electronic markets. An ontology-based approach has the potential to significantly accelerate the penetration of electronic commerce within vertical industry sectors by enabling interoperability at the business level and reducing the need for standardisation at the technical level. This will enable services to adapt to the rapidly changing online environment.

The following uses for ontologies, and classification schemes that could be defined using ontologies, have been noted within electronic commerce applications:

• Categorization of products within catalogues

• Categorization of services (including web services)

• Production of yellow page classifications of companies providing services

• Identification of countries, regions and currencies

• Identification of organizations, persons and legal entities

• Identification of unique products and saleable packages of products

• Identification of transport containers, their type, location, routes and contents

• Classification of industrial output statistics

Many existing B2B applications rely on the use of coded references to classification schemes to reduce the amount of data that needs to be transmitted between business partners. Such references overcome the problems introduced by the natural ambiguity of words that have more than one meaning (polysemes) or can apply to more than one object (e.g. personal names such as John Smith). By providing a separate code for each different use of a term, it is possible to disambiguate messages to a level where they can be handled without human intervention.
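The role of coded references can be illustrated with a minimal Python sketch. The scheme, the codes and the labels below are all invented for illustration and do not come from any real classification standard; the point is only that a free-text term may match several entries, whereas a code resolves to exactly one.

```python
from typing import Dict, List

# Hypothetical classification scheme: each distinct use of an ambiguous
# term gets its own code. Codes and labels are invented for illustration.
SCHEME: Dict[str, str] = {
    "FIN-001": "bank (financial institution)",
    "GEO-014": "bank (side of a river)",
    "PER-7731": "John Smith (buyer at Acme Ltd)",
    "PER-9042": "John Smith (driver at Haulage Co)",
}


def senses(term: str) -> List[str]:
    """Return every code whose label begins with the given term."""
    prefix = term.lower() + " ("
    return sorted(code for code, label in SCHEME.items()
                  if label.lower().startswith(prefix))


def resolve(code: str) -> str:
    """Look up the single meaning attached to a code."""
    return SCHEME[code]


# A free-text term is ambiguous: it matches more than one entry.
print(senses("bank"))

# A coded message field is not: each code resolves to one meaning,
# so the message can be processed without human intervention.
message = {"payee": "FIN-001", "contact": "PER-7731"}
print({field: resolve(code) for field, code in message.items()})
```

A formal ontology would go further than this flat table by also describing the relationships between the coded terms.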

Very few of the existing classification schemes used within electronic commerce applications have been defined as formal ontologies, or have been formally modeled to ensure that the relationships between terms are fully described. To date most of the techniques introduced by ontologies have been applied to general linguistic situations, such as those involved in specific academic disciplines, rather than to the language adopted by specific industries.

Status:

According to a recent United Nations publication (see Ref), growth in e-commerce continued uninterrupted through the small recession of 2000 and 2001. Internet traffic is doubling each year in the current period, rising from 180 petabytes (2002) to a projected 5 175 petabytes (2007); 60% is expected to originate from consumers and 40% from business activities, according to IDC 2003.

Forecasts of the value of global e-commerce in 2003 range between $ 1 408 billion and $ 3 878 billion, with growth projections that in the most optimistic scenario put the global volume of e-commerce at $ 12 837 billion by 2006 (Forrester Research and eMarketer). The share of B2B transactions relative to B2C is expected to be as high as 95%.

The most popular services are online sales, especially of software, cultural events, books and travel, particularly in Western countries.

In Japan, according to National Statistics Bureau data for 2001, 10.5 per cent of all enterprises were engaged in e-commerce (B2C and/or B2B) through either the Internet or other networks. The sectors most advanced in the adoption of e-commerce were banks and trust banks, information services and research, retail trade of general merchandise, retail trade of motor vehicles and bicycles and wholesale trade of general merchandise. The number of people buying online was estimated at 20 million in 2001 (Visa International Service Association 2002).

The dominance of B2B transactions is an established fact in the more advanced countries. Total trade is expected to grow vigorously, particularly as the integration of Internet-based purchasing systems with companies' back-end systems progresses.

The technology trends that very likely affect this expected growth are:

• Broadband access (DSL and cable): in addition to spending more time online, broadband subscribers are more likely to engage in e-commerce and generally have more positive experiences of, and attitudes to, online consumption. Furthermore, information society services (e-health, e-education, e-government) should contribute strongly to better productivity in an economy, and broadband availability will encourage innovation and stimulate economic growth.

• Security, and with it the level of trust in the mediated e-commerce system, is still a strong impediment to full adoption of e-commerce. This remains a key issue to be resolved.

• Web services are also a main factor in growth, and not only for e-commerce. However, as with any emerging and promising technology, many scientific hurdles remain on the research agenda before full deployment of truly Semantic Web Services.

• Supply Chain Management (SCM): Supply chain management is primarily concerned with the efficient integration of suppliers, factories, warehouses and stores so that merchandise is produced and distributed in the right quantities, to the right locations and at the right time, so as to minimize total system cost subject to satisfying service requirements.

Figure 1: The Business eventually accepts that the Internet can bring long-term value

Concrete Applications:

The list of applications in E-Business in general is vast. The D2.x deliverable series proposes a selection of the most representative advanced KB-oriented e-commerce systems. Here are some key illustrative examples.

The following related applications are detailed in [OntowebWP2.4]:

• CHEMDEX (http://www.chemdex.com)

• MKBEEM (http://www.mkbeem.com/)

• SMART-EC (http://www.telecom.ntua.gr/smartec/)

• ALICE (http://kmi.open.ac.uk/projects/alice/)

Guidelines: Published in [OntoWeb2.41]

Related Business Scenarios:

The E-Commerce business scenario has a strong similarity / overlap with

• Intelligent Information Integration

• Information Retrieval

• Semantic Web Services

E-Commerce business scenarios may incorporate aspects of:

• Information Extraction

• Natural Language Applications

• Semantic Portals (eCRM)

2.4 Natural Language Applications

General Description:

Natural Language (NL) applications allow human beings to communicate with a computer-based information system via (typically constrained) natural language, through either text-driven or voice channels. One of the central problems with NL applications is that it is difficult to control the terminology and concepts expressed by an untrained end user; hence an NL system that has no general domain knowledge is typically very fragile in practical use, i.e. prone to misunderstanding the user. One of the major roles of ontologies in NL applications is therefore the classical one of providing a means of reaching consensus between two communicating agents. More concretely, the ontology provides a means of mapping the terms uttered by a human agent onto the concrete data items supported by the computer-based information system.

Ontologies can also be used for communication from machine to human, i.e. for natural language generation purposes. Here the problem, from the perspective of the machine, is not so much one of avoiding misunderstandings, as it can be expected that the computer system has control over the information it wishes to present to the user and, at least for a crude solution, a valid NL utterance can be formed by looking up hard-coded phrases (such an approach is used in most video games, for example). However, ontologies may have a role to play in creating more elegant utterances. For example, an ontology may allow the system to abstract away from listing a set of concrete alternatives (e.g. product instances) to a more general concept that subsumes all such alternatives. Hence an utterance such as “Are you interested in buying a Porsche, Ferrari, Jaguar or TVR?” might be converted to “Are you interested in buying a sports car?”
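The abstraction step in the sports car example can be sketched in a few lines of Python. The taxonomy below is a hypothetical child-to-parent table invented for illustration; a real system would query its ontology for the most specific concept that subsumes all the alternatives.

```python
from typing import Dict, List, Optional

# Hypothetical is-a taxonomy (child -> parent), invented for illustration.
PARENT: Dict[str, str] = {
    "Porsche": "sports car",
    "Ferrari": "sports car",
    "Jaguar": "sports car",
    "TVR": "sports car",
    "Volvo": "family car",
    "sports car": "car",
    "family car": "car",
    "car": "vehicle",
}


def ancestors(concept: str) -> List[str]:
    """Return the ancestors of a concept, most specific first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def common_subsumer(concepts: List[str]) -> Optional[str]:
    """Most specific concept that subsumes every input concept."""
    if not concepts:
        return None
    shared = set(ancestors(concepts[0]))
    for concept in concepts[1:]:
        shared &= set(ancestors(concept))
    # Walk the first chain in order so the most specific shared concept wins.
    for candidate in ancestors(concepts[0]):
        if candidate in shared:
            return candidate
    return None


def question(concepts: List[str]) -> str:
    """Generate the question, abstracting to a general concept if possible."""
    general = common_subsumer(concepts)
    if general is None:
        return "Are you interested in buying a " + ", ".join(concepts) + "?"
    return "Are you interested in buying a " + general + "?"


print(question(["Porsche", "Ferrari", "Jaguar", "TVR"]))
```

Adding a product from another branch (here the hypothetical "Volvo") pushes the result up to the less specific concept "car", which shows why the quality of the generated utterance depends on how finely the ontology is structured.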

In general, successful natural language communication is not just a matter of exchanging words and knowing what the words mean. Other more subtle aspects of communication, such as recognising a user's intentions and desires or even responding to the emotions of a user, are often important. For example, if a user expresses anger, it may be important for the system to distinguish between a user who is frustrated with the performance of the system itself and a user who, for example, is not satisfied with the product offers the system has found. Moving towards communication systems that support this level of user-sensitivity may also require an underlying ontology basis, for example to give a formal grounding to emotional statements – see [Dabiri et al., 2002] for an example of such a system.

The reciprocal relationship between NL and ontology technologies also exists: not only are ontologies useful to support NL applications, but Natural Language Processing (NLP) techniques can be deployed to discover ontological concepts in textual documents (e.g. [Invent1], [OntoWeb D1.5]), i.e. NLP is used to automate the construction of ontologies in support of an end application that is not necessarily concerned with natural language. Building an ontology semi-automatically is the aim of the ontology learning field [Maedche and Staab, 2003], and a detailed description of NL systems for learning ontologies is given in Deliverable 1.5 [OntoWeb D1.5]. However, such applications of NLP technologies to ontology building are also covered in other business scenarios, such as “Information Extraction”.

Business Case:

From the above, there is a strong argument that whenever an NL system is to be developed some ontological knowledge must also be developed in order to provide some robustness in the deployed system. The business case for NL applications can be summarised with respect to the following points:

• Greater NL Reliability – Using an ontology to model and structure the application domain for an NL application allows, for example, better structured dialogues and is essential for eliminating misunderstandings or interpreting them correctly. This in turn affects the efficiency and acceptability of the deployed system (see below).

• Direct Cost Cutting through Manpower Reduction – Particularly for help-desk and call centre types of application, NL plus ontologies may be able to take over some of the more straightforward tasks currently carried out by human operators. This can include collecting basic information from a caller before forwarding the call to a human telephonist within a call centre, or providing systems that support, for example, automated email reading, filtering, routing and response.

• More Efficient Communication – This breaks down into a number of sub-points:

o Unconstrained user input - Allowing users to enter requests for information via natural language arguably reduces the communication overhead on the part of the user, as the user is not constrained to selecting the right keyword for a search engine or to navigating some nested menu structure.

o More friendly presentation and selection of information – Verbal or natural language text receipt as well as entry of information may often be the simplest and most acceptable form of communication to a human being. For example, in [Chai et al., 2001a], an NL-based system is evaluated favourably with respect to the equivalent menu-driven system.

o Dialogue-based negotiation - Through NL it is possible for an information system to have an interactive dialogue with a user. In this way, the computer-based system itself takes some of the responsibility for guiding a user towards the information of interest to them (e.g. [Chai et al., 2001a and 2001b]). For example, an NL interface to a product portal may interact with a user by selectively asking pertinent questions concerning the user's requirements prior to making a specific search for relevant products. Depending on the user's response and on which sub-population of products currently matches the user's requirements, the system may decide either to ask a question concerning a differentiating factor of the remaining products or to suggest possible candidate products to the user.

• Better Customer Acceptance – As an extension to the above, NL provides not only a more efficient and helpful means of allowing an end user to communicate with a computer-based information system, it also has a prestige factor – giving the user the impression of communicating with a sophisticated and intelligent system. These factors combine to provide generally high usability which translates into better customer acceptance. For example, an NL front-end to a product portal may increase the average time a user spends browsing the portal (“stickiness”) and hence increase the percentage of sales made.

o Generating Feedback - Better customer acceptance can also be achieved where NL generation is used to provide explanatory feedback to a user. Such an approach has been applied in the past to make the reasoning performed by expert systems more transparent. However, this argument also covers such simple applications as generating an email response to confirm to a user that a particular transaction, such as an on-line purchase, has been carried out.

• More Effective Information Gathering, Market Analysis – NL interfaces encourage a user to specify their requirements freely rather than being constrained by the options presented by the system. This can have the side-effect that requirements currently not supported but desired by the population of end users may be identified and used, for example, as input to guide future marketing strategy of a company providing an online server.

• Enablement of Voice – The ability to speak and listen rather than read and write may lead to a gradual revolution in the way in which humans interact with a machine. For example, even with today’s primitive dictation technology it may be much more efficient to dictate and then manually correct a long report rather than typing the whole report by hand.

o Support for Mobile Information Access – Perhaps the strongest business case for NL applications is in relation to voice-driven systems and the new “mobile” applications that successful voice technology may enable. Such systems open up a new communication channel between end users and a computer-based information system, namely via the telephone. This in turn opens up new potential business cases, particularly relating to mobile end users who are excluded from other channels of communication, such as the internet. Indeed, in some scenarios voice may be the only possible medium for communication with a computer – for example, it is no coincidence that many of the leading car manufacturers are investing heavily in voice technology in the hope of enabling new cockpit services in the future. In short, voice enablement of information resources (so-called voice portals) is key to the vision of “Any Place, Any Time” [Voquette] information access.

• Support for Multilinguality - In particular, machine translation technology may be the most natural way of allowing humans of different nationalities to communicate with a single central knowledge repository and/or to communicate with each other via a central machine-based mediator. This has obvious massive potential for commercial applications given today’s global market places.
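The dialogue-based negotiation point in the list above describes a decision the system makes at every turn: suggest products, or ask about a differentiating factor. The following Python fragment is a minimal sketch of that decision, assuming a hypothetical four-product catalogue and a simple "most distinct values" heuristic for choosing the next question; neither is taken from [Chai et al., 2001a and 2001b].

```python
from typing import Dict, List, Tuple, Union

Product = Dict[str, str]

# Hypothetical catalogue; product names and features are invented.
CATALOGUE: List[Product] = [
    {"name": "A100", "weight": "light", "screen": "small", "price": "low"},
    {"name": "B200", "weight": "light", "screen": "large", "price": "high"},
    {"name": "C300", "weight": "heavy", "screen": "large", "price": "low"},
    {"name": "D400", "weight": "heavy", "screen": "large", "price": "high"},
]


def matching(requirements: Dict[str, str]) -> List[Product]:
    """Products that satisfy every requirement stated so far."""
    return [p for p in CATALOGUE
            if all(p.get(k) == v for k, v in requirements.items())]


def next_move(requirements: Dict[str, str],
              max_suggestions: int = 2) -> Tuple[str, Union[str, List[str]]]:
    """Decide whether to suggest products or to ask another question."""
    remaining = matching(requirements)
    names = [p["name"] for p in remaining]
    if len(remaining) <= max_suggestions:
        return "suggest", names

    # Features the user has not yet constrained.
    open_features = [f for f in remaining[0]
                     if f != "name" and f not in requirements]

    def spread(feature: str) -> int:
        """Number of distinct values among the remaining products."""
        return len({p[feature] for p in remaining})

    # If no open feature distinguishes the products, suggest them all.
    if not open_features or max(spread(f) for f in open_features) < 2:
        return "suggest", names
    # Otherwise ask about the feature that differentiates them most.
    return "ask", max(open_features, key=spread)


print(next_move({}))                   # too many matches: ask a question
print(next_move({"weight": "light"}))  # few matches: suggest products
```

In a deployed system the feature names would come from the domain ontology, and the lexicon would map the user's wording onto them before `next_move` is called.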

The difficulties in applying NL applications successfully should not be underestimated. Some of the major pitfalls to consider include:

• Over-Attribution of Intelligence – The more a computer system acts like a human being, the more a naïve end user will assume that the computer system has intelligence equivalent to a human being's. In this sense, a user who is initially impressed by a machine's ability to simulate natural language capabilities may feel cheated when they discover that the underlying capabilities are little different from those of a normal information system. The user's irritation can be exacerbated when the NL system is brittle – i.e. highly sensitive to the terminology used by an end user.

• Technology Weaknesses – Much NL technology is still very innovative and has yet to reach a “tried and tested” level of maturity. As such, NL applications carry with them a high technological risk factor. This point is exacerbated for voice-driven systems, where reliable and highly accurate speech recognition remains the most difficult and critical part of the overall system.

• Potential for “Open Scope” Interaction – Care must be taken to ensure that the range of utterances or textual prompts that an end user can deliver is constrained. This is particularly true of systems that allow user initiative – i.e. the user is not just answering questions but can volunteer information. For example, it may be relatively easy to design an NL interface to a flight booking system so that a user is compelled to talk only about destinations, times etc. However, if the system is extended to allow users to search for a holiday of their preference, then a significant jump in complexity occurs, as it is much more difficult to constrain the types of preference different types of people have for a holiday destination. The consequence of having a wide open scope is much more modelling effort to cover all possible end-user interactions and a possible increase in the number of potential misunderstandings.

In general, NLP faces the following challenges (among others) [Thomas 2000]:

• Physical limitations: The greatest challenge to NLP is representing a sentence or group of concepts with absolute precision. Ontologies play a key role here; however, most NLP approaches lack a unifying ontology that addresses semantic as well as syntactic representation. Another important problem is the large amount of data needed to perform NLP at the human level, which requires high processing capacity and memory space.

• No unifying semantic repository: NLP lacks an accessible and complete knowledge base that describes the world in the detail necessary for practical use.

Main Requirements:

The following is a list of general requirements that may need to be considered when deploying a successful NLP system:

• Sufficient available data. Building a robust NL system requires sufficient training data to guide the development. The following types of information are typically required, either pre-existing or to be constructed:

o For dialogue-based applications, transcripts and recordings of actual dialogues that can be used to I) train speech recognition, II) guide grammar development and III) provide a set of test benchmarks. For NL-enabled data access or FAQ systems, a collection of question-answer pairs may be sufficient training data.

o Domain corpora, i.e. large textual resources, sometimes manually annotated, that can be used as the basis for automatic learning of some of the NL resources, such as grammars.

o Domain lexicon, i.e. sets of synonyms and other information for all key concepts within the scope of the dialogue.

o Sufficient material to construct an ontology tailored to the domain of the NL application and the purpose of the application.

• The required NLP technology is available. There are some application-independent (reusable) components, especially in understanding, such as Part-Of-Speech taggers and Morphological Analysis modules, but they are available only for some languages (e.g. English, Spanish and German are well supported). There are also multilingual components in the generation field, such as the linguistic realiser KPML [Bat, 1995].

• The complexity of the speech application must be strongly controlled:

o The scope of the interactions can be well defined. NL applications are most applicable where the domain of discourse can be tightly constrained (e.g. to booking a travel ticket, inquiring about stock information, searching for a product within a specialised catalogue, etc.)

o Good coverage must be ensured in terms of both terminology and domain concepts underlying this terminology

o For voice applications, in order to ensure good recognition rates, particularly for “any speaker applications”, the range of possible utterances from the human user at any point in the dialogue may need to be limited (the more words the system attempts to recognise simultaneously, the greater the chance of a misrecognition).

Status:

First-generation voice-driven NL systems typically function in a machine-driven way: the user responds to specific, often multiple-choice questions and/or uses the telephone key-pad to select appropriate responses. Such systems, while robust, are typically cumbersome. Moreover, for such machine-initiated applications, the need for ontologies to support the dialogue system is minimal, as it is possible to strongly constrain the range of responses a human can reasonably give.

The next generation of voice dialogue systems has moved towards user-initiated dialogues where users can offer information freely, often specifying multiple requirements in a single utterance, and the NL systems must be flexible enough to respond appropriately and opportunistically. In such applications, the scope for the user utterances is wider and less well defined and hence a need for ontology support is generated.

The application of machine translation technology is still relatively limited, and machine translation remains an unsolved technical problem in the most general case [Vermobil1]. However, useful applications are possible within limited scopes.

NLP has until now typically been deployed as one-off custom-built applications. Some initiatives are currently underway to move towards more re-usable platforms and methodologies for building NL applications. GATE [Gate1,Cunn01] from the University of Sheffield is perhaps leading the way in this respect. GATE provides an architecture, recently ported to Java, that allows a component-based approach to constructing NL and information extraction applications. A framework and user-friendly support tools are provided and can be downloaded from [Gate1].

There is also a close relationship between NL systems and ontology technology, aimed at tools that build domain ontologies automatically and so reduce the time and effort required in that process. Better NL annotation and understanding systems will help ontology generation, reducing user intervention and improving the realization of Semantic Web applications.

Concrete Applications:

There are a number of companies currently specialising in bringing NL technology to the market.

SemanticEdge [SemEdge1] is one such company, providing voice-driven dialogue applications in support of various transaction-based applications. For example, the company has worked together with a number of travel companies to provide applications such as automated ticket booking services. In order to support such applications, SemanticEdge developed from scratch a “geographic ontology” defining travel locations and related concepts. This is a medium-size ontology with most of the information being modelled at the instance level (roughly 20,000 concept instances are included to date in the ontology). A large proportion of the ontology was created semi-automatically by applying information extraction techniques to a number of freely available internet resources in order to mine out the required concepts and relationships. The chosen language and technology was F-Logic supported by Ontoprise's Ontobroker toolkit. This decision was based primarily on the strength of Ontoprise's inferencing engine, which provides a powerful basis for reasoning about instance data, and also on the fact that Ontobroker provides some support for maintaining large-scale ontologies as modular projects (an ontology can be broken down systematically into different flat files which are loaded at run-time as required). The main purpose of the ontology was to provide a mechanism for mapping ambiguous statements of the user, e.g. “A family holiday in the Mediterranean”, onto more concrete concepts such as those stored in a database (e.g. Location = Greece, Spain or Turkey,… HotelFacility include Kindergarten, Paddling-Pool, …). One of the problems encountered was the need to deploy the ontology as a highly stable and performant run-time resource (solved by compiling out the ontology into a database).

IBM have recently deployed a state-of-the-art natural language dialogue system to support the on-line sales of their portable lap-tops [Chai et al., 2001a and 2001b]. This system is archetypal in terms of the way it uses knowledge to support the dialogue system. A central knowledge repository captures a number of different types of knowledge including:

• Default values to interpret incomplete user specifications

• “Interpretative rules” that allow a mapping of ambiguous concepts and requirements onto concrete products and product features. These rules incorporate some simple taxonomic relationships as well as associating requirements from a user perspective onto technical features of specific models of lap-top computers.

• Business rules, such as a default priority for a particular product reflecting a current sales strategy of the company hosting the e-commerce portal.

• Mappings from the concepts of this knowledge base onto concrete terms (words) via the domain lexicon
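How the four types of knowledge listed above might work together can be sketched in Python as follows. Every entry (words, concepts, features, model names, priorities) is invented for illustration, and the control flow is an assumption; [Chai et al., 2001a and 2001b] should be consulted for the actual design of the IBM system.

```python
from typing import Dict, List

# All entries below are hypothetical.
LEXICON: Dict[str, str] = {            # words -> concepts
    "travel": "portable",
    "travelling": "portable",
    "gaming": "high-performance",
}
INTERPRETATIVE_RULES: Dict[str, Dict[str, str]] = {  # concepts -> features
    "portable": {"weight": "light"},
    "high-performance": {"cpu": "fast"},
}
DEFAULTS: Dict[str, str] = {"budget": "medium"}      # fill in what is unsaid
PRIORITY: Dict[str, int] = {"Model X": 2, "Model Y": 1}  # business rule

MODELS: Dict[str, Dict[str, str]] = {
    "Model X": {"weight": "light", "cpu": "fast", "budget": "medium"},
    "Model Y": {"weight": "light", "cpu": "slow", "budget": "medium"},
    "Model Z": {"weight": "heavy", "cpu": "fast", "budget": "medium"},
}


def interpret(utterance: str) -> Dict[str, str]:
    """Turn user words into feature constraints, then apply defaults."""
    constraints: Dict[str, str] = {}
    for word in utterance.lower().split():
        concept = LEXICON.get(word.strip(".,!?"))
        if concept:
            constraints.update(INTERPRETATIVE_RULES[concept])
    for feature, value in DEFAULTS.items():
        constraints.setdefault(feature, value)
    return constraints


def recommend(utterance: str) -> List[str]:
    """Matching models, ordered by business priority (highest first)."""
    wanted = interpret(utterance)
    hits = [name for name, features in MODELS.items()
            if all(features.get(f) == v for f, v in wanted.items())]
    return sorted(hits, key=lambda name: -PRIORITY.get(name, 0))


print(recommend("I need something for travelling"))
```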

Until now this knowledge base can be thought of as only a relatively crude ontology. The chosen representation language (XML) has no formal semantics and no indication is given in [Chai et al., 2001a and 2001b] that any form of complex reasoning is used to operate at run-time on this knowledge. This may be a reflection of the relative narrowness of the product domain. However, one interesting comment from [Chai et al., 2001a] is that the manual creation and maintenance of such knowledge resources to support NL systems is undesirable and there is a need for tools to support automated knowledge creation.

No review of recent NLP applications would be complete without a reference to Verbmobil [Verbmobile1], a massive (circa 160 million DM) collaborative project, initiated in 1993, that has recently ended. The goal application of the project was to allow real-time conference calls between a German and a Japanese business person, for example to arrange the time and location of a joint meeting, with all translations between the two people being performed automatically using English as an intermediary language. In terms of providing an impetus to the German research community for the furthering of language technologies, and as a source of funding to generate NLP resources and publications, Verbmobil was an undoubted success.

Language & Computing [LandC1] provide a range of tools for performing natural language understanding, information extraction and information retrieval within the medical domain based on a very large-scale terminological ontology. More details of an example application can be found under the business scenario “Information Retrieval”.

Among multilinguality projects, the most important is the EuroWordNet project [EWN001]. EuroWordNet is a multilingual database with wordnets (sets of synonymous words) for several European languages (Dutch, Italian, Spanish, German, French, Czech and Estonian). Each wordnet represents a unique language-internal system of lexicalizations. In addition, the wordnets are linked to an Inter-Lingual-Index. Via this index, the languages are interconnected so that it is possible to go from the words in one language to similar words in any other language. The index also gives access to a shared top-ontology, which provides a common semantic framework for all the languages, while language-specific properties are maintained in the individual wordnets. The database can be used, among other things, for monolingual and cross-lingual information retrieval, which was demonstrated by the users in the project. The cooperative framework of EuroWordNet is continued through the Global WordNet Association (http://www.globalwordnet.org/). This is a free and public association that builds on EuroWordNet and Princeton WordNet. Its aim is to stimulate further building of wordnets, further standardization and interlinking, the development of tools, and the dissemination of information.
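The role of the Inter-Lingual-Index can be illustrated with a minimal Python sketch. The miniature wordnets, index identifiers and top concept below are invented for illustration and are not EuroWordNet data or its actual data format; the sketch only shows how words in different languages are connected through shared index records rather than through direct language-pair links.

```python
from typing import Dict, List, Optional

# Hypothetical miniature wordnets: word -> inter-lingual index record.
WORDNETS: Dict[str, Dict[str, str]] = {
    "es": {"perro": "ili-dog", "can": "ili-dog", "gato": "ili-cat"},
    "nl": {"hond": "ili-dog", "kat": "ili-cat"},
    "de": {"Hund": "ili-dog", "Katze": "ili-cat"},
}

# Hypothetical shared top-ontology: index record -> top concept.
TOP_ONTOLOGY: Dict[str, str] = {"ili-dog": "Animal", "ili-cat": "Animal"}


def translate(word: str, source: str, target: str) -> List[str]:
    """Go from a word in one language to similar words in another."""
    record = WORDNETS[source].get(word)
    if record is None:
        return []
    return sorted(w for w, r in WORDNETS[target].items() if r == record)


def top_concept(word: str, language: str) -> Optional[str]:
    """Look up the language-neutral top concept for a word."""
    record = WORDNETS[language].get(word)
    return TOP_ONTOLOGY.get(record) if record else None


print(translate("hond", "nl", "es"))
print(top_concept("Katze", "de"))
```

Because every language links only to the index, adding a new wordnet requires no changes to the existing ones, which is what makes cross-lingual retrieval over many languages practical.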

The following related applications are detailed in Deliverable D1.1 [OntoWeb1.1]:

Natural Language Understanding

• AlFresco (http://ecate.itc.it:1024/projects/alfresco.html)

• ITEM search engine (http://terral.ieec.uned.es/cli, http://sensei.ieec.uned.es/item/principal.htm)

• OntoTerm (http://www.ontoterm.com/)


Natural Language Translation

• ALT-J/E (http://www.kecl.ntt.co.jp/icl/mtg/topics/mtg-index.html)

• EuroWordNet (http://www.illc.uva.nl/EuroWordNet/)

• GAZELLE (http://www.isi.edu/natural-language/projects/GAZELLE.html)

• Mikrokosmos (http://crl.nmsu.edu/mikro)

• PANGLOSS (http://www.lti.cs.cmu.edu/Research/Pangloss/)

• Ontogeneration (http://delicias.dia.fi.upm.es/../proyectos/terminados/ontogeneration/ontogeneration_proyecto_Esp.html)

• Penman (http://www.isi.edu/natural-language/penman/penman.html)

• TechDoc (No URL available)

Applications that use NL for ontology learning are described in Deliverable 1.5 [OntoWeb1.5]:

• ASIUM (http://www.lri.fr/%7Efaure/Demonstration.UK/Presentation_Demo.html)

• LTG Text Processing Workbench (http://www.ltg.ed.ac.uk/%7Emikheev/workbench.html)

• Promethee (http://www.sciences.univ-nantes.fr/info/perso/permanents/morin/promethee)

• OntoLearn (http://www.dsi.uniroma1.it/~velardi/IEEE_C.pdf)

• SOAT (http://www.iis.sinica.edu.tw/IASL/en/index.htm)

• SVETLAN’ (http://www.limsi.fr/Individu/gael/ManuscritThese/)

• TERMINAE (http://www-lipn.univ-paris13.fr/~szulman/TERMINAE.html)

Applications for NL Processing (among others)

• FreeLing (http://www.lsi.upc.es/~nlp/freeling/)

• Gate (http://gate.ac.uk/)

• Landcglobal (http://www.landcglobal.com/)

• Lingsoft (http://www.lingsoft.fi/demos.html)

• PennTools (http://www.cis.upenn.edu/~adwait/penntools.html)

• Traduki (http://traduki.sourceforge.net/)

• Xerox LE (http://www2.parc.com/istl/groups/nltt/xle/)

Guidelines: Published in [OntoWeb2.1]

Related Business Scenarios:

The Natural Language business scenario has a strong similarity / overlap with:

• Information Extraction

Natural Language applications can be used as part of the solution for:

• E-Commerce

• Semantic Portals (eCRM)

• Knowledge Management / Information Systems / Corporate Internets


2.5 Intelligent Information Integration

General Description: Intelligent Information Integration covers applications that use ontologies as a means of unifying information of the same type which stems from multiple different sources and, inevitably, conforms to different formats or varying conceptualisations of the domain. In accordance with this model, two distinct classes of application are in fact covered:

1. Use of ontologies as a means for information exchange. As typified by B2B and electronic marketplace applications (see [Fensel, 2001; Fensel et al., 2001 and 2002]), in this variation of the scenario the main requirement is to promote interoperability between different partners. That is to say, information should be able to flow between any two participants using a central ontology-based exchange mediator.

2. Construction of a single unified knowledge repository from multiple sources. This variation extends the above by not only allowing an ontology-based mapping between different sources of information, but also gathering and transforming the information from the various sources to create a single, unified knowledge repository.

In order to achieve either of these variations, a number of practical problems need to be addressed. These include:

• The need to deal with possibly unstructured (text-based) information sources, such as web sites or product descriptions from a catalogue

• The need to classify different information from unclassified / unorganised information sources

• The need to map from various information sources onto a single central ontology. This problem is not just a matter of syntactic mapping between different formats but may include semantic mapping, such as normalising the price currency within an electronic marketplace.

• The need to provide different views of the integrated information, for example supporting customer-specific personalisation

In this sense, meaningful information can be obtained through a mixture of information gathering, information categorization, information extraction, and information organization.
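The mapping problem described above (syntactic mapping between formats plus semantic mapping such as price currency normalisation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two vendor formats, the field names, the `Product` central schema and the fixed exchange rates are hypothetical and do not come from any system discussed in this deliverable.

```python
from dataclasses import dataclass

# Hypothetical fixed exchange rates to EUR; a real system would look these up.
RATES_TO_EUR = {"EUR": 1.0, "USD": 0.8, "GBP": 1.5}


@dataclass
class Product:
    """A record in the (assumed) central schema."""
    name: str
    price_eur: float


# Syntactic mapping: for each source, which source field plays which central role.
FIELD_MAPS = {
    "vendor_a": {"name": "title", "price": "cost", "currency": "cur"},
    "vendor_b": {"name": "product", "price": "price", "currency": "currency"},
}


def to_central(source: str, record: dict) -> Product:
    """Map one source record onto the central schema."""
    fields = FIELD_MAPS[source]
    amount = float(record[fields["price"]])
    currency = record[fields["currency"]]
    # Semantic mapping: normalise every price to a single currency.
    return Product(
        name=record[fields["name"]].strip(),
        price_eur=round(amount * RATES_TO_EUR[currency], 2),
    )


if __name__ == "__main__":
    print(to_central("vendor_a", {"title": " Drill ", "cost": "10", "cur": "USD"}))
    print(to_central("vendor_b", {"product": "Saw", "price": 20, "currency": "GBP"}))
```

Adding a further source in this sketch only requires one more entry in `FIELD_MAPS`, which is the kind of extensibility the central-ontology approach is meant to provide.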

Business Case: The business case for information integration closely relates to the business case for the business scenario “Knowledge Management / Information Systems / Corporate Internets”. Some additional points or points worth reiterating include:

• General Competitiveness through Information Dissemination - For many corporations, efficient and timely access to relevant information is crucial for competitiveness [Empolis].

o Business Intelligence - In particular, intelligent information integration covers applications that monitor dynamic, external sources of information, such as news feeds. This can allow a corporation to have up-to-the-minute information concerning competitors and the marketplace and hence the ability to react to change as soon as possible.

• Better Information Cross-Referencing - Much valuable information is distributed over existing, till-now unconnected, legacy knowledge bases and databases (both within and external to a corporation). Collating information from multiple sources and media leads to more effective cross-referencing, e.g. [Profium2]. Information sources can be structured (e.g. databases), unstructured (e.g. MS Word documents) or semi-structured (e.g. competitor web sites) [Voquette1].

• Efficient Means for Necessary Interoperability - Providing an ontology as an (abstracted) common perspective across multiple information sources is the most efficient way to achieve information integration (e.g. see [Fensel, 2001; Fensel et al., 2002]). Put simply, providing a central standard via which all communicating parties (e.g. vendors in an electronic marketplace) can communicate radically reduces the number of pairwise mappings between different formats and knowledge models that must be coded.

o Setting Representation Standards – Applying an operational approach, in which a (de facto) standard ontology for the communicating parties is established, makes the exchange independent of the structure of the data in the individual sources and hence eases the management of semantic heterogeneity.


• More Efficient Business Processes - The central aspect of many business processes is one of document cross-referencing (e.g. matching CVs to job adverts, requirement specifications to product descriptions, etc.)
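The reduction in mappings claimed under “Efficient Means for Necessary Interoperability” can be made concrete with a small calculation. The sketch below assumes that mappings are directed (a translator from format A to format B is distinct from one from B to A) and that a central ontology needs one mapping into it and one out of it per format. These counting conventions are assumptions made for illustration and are not stated in the cited sources.

```python
def pairwise_mappings(n: int) -> int:
    """Directed translators needed when every format maps directly to every other."""
    return n * (n - 1)


def mediated_mappings(n: int) -> int:
    """Translators needed when each format maps only to and from a central ontology."""
    return 2 * n


if __name__ == "__main__":
    for n in (3, 10, 50):
        print(n, pairwise_mappings(n), mediated_mappings(n))
```

Under these assumptions, direct pairwise integration grows quadratically with the number of formats, whereas integration through a central ontology grows linearly.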

Some of the risks associated with establishing an intelligent information integration application include:

• Difficulty of Information Mapping - The mapping problem between different formats is often non-trivial and is in the worst case impossible. Hence, information integration applications are likely to be most successful in domains where a de facto standard ontology is already adhered to by most participants. Unfortunately, the existence of such ontologies is still relatively rare.

o Unstructured Data – Problems of interoperability are compounded when the information sources are primarily text-based

• Proliferation of Multiple Standards and Formats – Inevitably, a single accepted standard will never emerge in a given domain; instead, many standards and formats will persist. For example, even in the banking domain, which has supported advanced B2B technology for over 20 years, there is still a high number of different formats for payment messages, fuelled inevitably by market forces. Hence companies dedicated to providing format mapping services are required (e.g. [DataJunction, Mercator]).

• Complexities of handling dynamic information – Most commercial information sources have a limited useful lifetime: information about stock or product prices may be relevant only for a given trading day or even less, while even information concerning, for example, analyst reports on a given competitor company may remain pertinent only for a few weeks. Issues concerning the relevance of distributed information resources over time have received little attention until now within the ontology community.

Main Requirements: The general requirements for Intelligent Information Integration centre specifically on how information from multiple sources can be efficiently normalised, unified and then centrally accessed. Requirements include:

• Extensibility – It should be relatively straightforward to introduce new information sources into the system, particularly for online systems

o As noted in [Empolis] in the use case of a job agency, this requirement may translate into the need for a component-based architecture.

o Many diverse data repositories may need to be integrated (both in terms of content and format)

• The solution must be scalable – information integration often implies very large ontologies, particularly in terms of instance data. This covers the case where information integration is used to establish a single unifying knowledge resource based on multiple information sources and on continuous information feeds.

• Intermediate Expressiveness of ontology language. As noted in [Empolis, Ontopia], the chosen language for a central ontology must allow abstraction away from the concrete data sources and be expressive enough to define cross-reference relationships. Conversely, for information integration, deep semantic relationships usually need not be modelled.

Status: Of the two types of Information Integration mentioned in the General Description, the first, relating to information exchange, has received the most interest as it relates heavily to Business-to-Business applications – see [Fensel, 2001; Fensel et al., 2002] for a comprehensive review of the current issues. For many business sectors, defining standards and ontologies are still emerging. For example, well known on-going attempts to define product catalogue standards include UN/SPSC [UNSPSC], Ecl@ss [Ecl@ss], cXML [cXML], RosettaNet [Rosettanet], etc. (see [Fensel, 2001; Fensel et al., 2002] for more detailed lists). It can be expected that this will remain an active area for the foreseeable future.

The second type of information integration, i.e. construction of centralised knowledge resources from disparate sources, has until now been most successful in the guise of electronic marketplaces, primarily in support of E-Commerce. Moves towards sophisticated information integration may be initiated from the machine learning community under the current hot topic of “web mining”. Some highly interesting research in this direction is given in [Craven et al., 2002].

Concrete Applications: While not strictly speaking a concrete application, a number of interesting use cases can be found under [Empolis], specifically relating to the application of Topic Maps. Amongst these is an example of a multi-national bank, where it is pointed out that the competitive advantage of modern multi-national banks hinges critically on their ability to leverage knowledge in order to make profitable investments. This includes integration of knowledge concerning consumers, industry sectors, economic trends and company profiles as well as the experience and best practice of the investors themselves. Such information typically resides within different computer-based systems (possibly internal or external to the bank) or within the minds of expert personnel within the banking corporation. In addition, there is a need to provide different views on this data, relevant for example to different industry and financial sectors. Hence, it is argued in [emplois1], there is a need for a central ontology (e.g. based on Topic Maps) that provides a common abstraction for this diversity of information.

Another example use case from [Empolis] is that of a Job Agency using topic maps to resolve job-seeker CVs against available employment positions. The central ontology essentially maps skills to job requirements. Many different formats for these two types of information may need to be incorporated – this represents the most complex technological problem. Defining the cross-reference relationships is in comparison not a prohibitive cost. The main business advantage achieved in this way is to automate the linking of potential employees to new job vacancies (see also the description of Whizbang's support for FlipDog.com, described in business scenario “Information Extraction”). Given that both the population of job-seekers and the set of available jobs are dynamic, this can lead to a significant reduction in the effort required by a job agency to maintain its information and allow improved services in terms of a wider scope of information cross-referencing.

The Ariadne project [Ariadne] aims at building a collection of intelligent agents to extract, query, and integrate data from web sources. Essentially, information extraction technology is deployed to harvest information from Web sites. For example, the system has been applied to integrate online electronic catalogues to provide a more comprehensive virtual catalogue [Knoblock et al., 01]. This application has very interesting uses: (1) two merged companies can integrate their electronic catalogues; (2) an agent can search for products according to users' requests, and can carry out such searches using catalogues built by different companies.

Whizbang Labs [Whizbang2,3] have also deployed information extraction technology for commercial applications concerning the integration of information from large numbers of web sites, as described in the business scenario “Information Extraction”.

The following related applications are detailed in Deliverable D1.1 [OntoWeb1.1]:

• OBSERVER (http://siul02.si.ehu.es/~jirgbdat/OBSERVER/)

• OntoBroker (http://ontobroker.semanticweb.org/)

• PICSEL (http://www.lri.fr/~picsel/)

Guidelines: Published in Deliverable 2.1 [OntoWeb2.1]

Related Business Scenarios:

The Intelligent Information Integration business scenario has a strong similarity / overlap with:

• Knowledge Management / Information Systems / Corporate Internets

The Intelligent Information Integration business scenario builds on some aspects of the following business scenarios:

• Information Extraction


2.6 Information Extraction

General Description: Information Extraction (IE) refers to the identification and localisation of information occurrences within informal or semi-formal text and the extraction of such information to a more rigorous and formal format. IE is closely related to other business scenarios such as NLP, since an NLP system is needed to detect and extract relevant information from textual sources.

Information extraction applications may be used to create a summarisation of an original document for rapid manual document selection and/or as the basis for a more efficient automated document retrieval mechanism. So for example, Information Extraction may allow for the automatic attachment of meta data, in the form of RDF tags, to a collection of originally purely ASCII documents. This can also be the mechanism by which such documents are linked to an ontology. As such, IE can be seen as a way of moving towards more semantic-driven document management.

Alternatively, the results of information extraction from multiple documents may be aggregated within a central knowledge resource in order to provide centralised information access independent of the original documents (e.g. construction of a corporate knowledge archive). In this way, IE can be used to enable the “Intelligent Information Integration” business scenario.
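As an illustration of attaching extracted meta data to a plain-text document, the following minimal Python sketch serialises extracted fields as RDF statements. Several choices here are assumptions made for illustration only: the N-Triples serialisation, the use of the Dublin Core element namespace for property names, the example document URI, and the premise that an IE step has already produced the `extracted` dictionary.

```python
# Dublin Core element set namespace, assumed here as the metadata vocabulary.
DC = "http://purl.org/dc/elements/1.1/"


def escape_literal(text: str) -> str:
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(doc_uri: str, extracted: dict) -> list:
    """Turn extracted field/value pairs into one N-Triples line per field."""
    lines = []
    for field, value in sorted(extracted.items()):
        lines.append(f'<{doc_uri}> <{DC}{field}> "{escape_literal(value)}" .')
    return lines


if __name__ == "__main__":
    # Hypothetical output of an IE step run over a plain ASCII document.
    extracted = {"title": "Quarterly report", "creator": "J. Smith"}
    for line in to_ntriples("http://example.org/docs/42", extracted):
        print(line)
```

The statements are kept separate from the original text, so the document itself does not need to be modified in order to be annotated.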

Finally, IE may be applied to a continuous stream of textual messages, e.g. an email system, in order to provide the basis for automated classification and handling of textual messages. In this way IE may be leveraged in knowledge management applications where high volumes of daily information in textual form would otherwise lead to “information overload”.

The relationship between information extraction and ontologies is two-way. On the one hand, using information extraction on top of large, domain-specific document collections may be one way of semi-automatically generating and maintaining large-scale ontologies, especially in terms of identifying domain specialisations of top-level ontological concepts and in terms of generating instance data. Approaches for ontology learning based on using IE are detailed in Deliverable 1.5 [OntoWeb1.5]. Indeed, IE may have an important role to play in realising the vision of the Semantic Web, as it enables a migration from legacy textual information sources.

On the other hand, pre-existing ontologies, particularly terminological ontologies, can be used as the basis for an IE application: synonym sets grounded in a terminological ontology can be the basis for spotting useful concepts within textual documents, and the additional semantic constraints within an ontology can be exploited to clean up the results of information extraction, i.e. to boost the precision of an otherwise noisy “keyword spotting” process. In this sense, one of the most commonly used terminological resources is WordNet [Fellbaum 1998]. WordNet includes English nouns, verbs, adjectives and adverbs organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets. Additionally, ontologies can be used to post-process the results of an IE system that is based on syntactic pattern matching [Fensel et al., 2001 and 2002] by I) providing validation checks on extracted facts and II) allowing missing facts to be inferred from extracted information.
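The use of a terminological ontology for concept spotting, validation and inference can be sketched as follows. This is a minimal Python illustration under stated assumptions: the synonym sets, the concept classes and the single `treats` relation are hypothetical toy data, not taken from WordNet or from any system cited here, and the inference step is limited to deriving class membership.

```python
import re

# Hypothetical synonym sets: surface term -> concept name.
SYNONYMS = {
    "aspirin": "Aspirin",
    "acetylsalicylic acid": "Aspirin",
    "headache": "Headache",
    "cephalalgia": "Headache",
}

# Hypothetical ontology: concept -> class, plus allowed relation signatures.
CONCEPT_CLASS = {"Aspirin": "Drug", "Headache": "Symptom"}
ALLOWED = {"treats": ("Drug", "Symptom")}


def spot_concepts(text: str) -> list:
    """Return concepts in order of first appearance, matched via synonym sets."""
    lowered = text.lower()
    hits = []
    for term, concept in SYNONYMS.items():
        match = re.search(r"\b" + re.escape(term) + r"\b", lowered)
        if match:
            hits.append((match.start(), concept))
    ordered = []
    for _, concept in sorted(hits):
        if concept not in ordered:
            ordered.append(concept)
    return ordered


def is_valid(fact: tuple) -> bool:
    """Validation check: subject and object classes must fit the relation."""
    subject, relation, obj = fact
    signature = ALLOWED.get(relation)
    if signature is None:
        return False
    return (CONCEPT_CLASS.get(subject), CONCEPT_CLASS.get(obj)) == signature


def infer_class_facts(concepts: list) -> list:
    """Inference: add class-membership facts that the text never states."""
    return [(c, "is_a", CONCEPT_CLASS[c]) for c in concepts if c in CONCEPT_CLASS]


if __name__ == "__main__":
    found = spot_concepts("Acetylsalicylic acid relieves cephalalgia.")
    print(found)
    print(is_valid(("Aspirin", "treats", "Headache")))
    print(is_valid(("Headache", "treats", "Aspirin")))
    print(infer_class_facts(found))
```

In this sketch a fact extracted with its arguments swapped is rejected by the ontology's relation signature, which is the sense in which semantic constraints can raise the precision of a noisy spotting step.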


Business Case: Given that Information Extraction is a fairly broad term applicable to a whole range of business scenarios, the business case arguments may vary from application to application. Note also that IE is rarely an application in its own right – the ability to extract information from textual documents usually serves the purposes of some more general application. However, the following business arguments apply to IE applications in general:

• Replacement of manual “content factories” (unification of multiple formats) – One of the main business cases for information extraction relates to B2B applications or any other application where indexing of documents is necessary and is currently carried out manually. The unification of similar information in multiple formats, such as the product catalogues of multiple manufacturers, to provide a unified central content archive is often achieved by intensive manual labour. The advantages of semi-automating this process via IE include:

o Long term labour cost savings - Semi-automating indexing through information extraction requires a major one off investment (initially) and then diminishing maintenance costs over time. This compares with a high and constant labour cost without automation.

o Scalability – If the indexing or extraction of information from documents is semi-automated, then the manual effort required is less dependent on the volume of documents to be processed.

• Enablement of new knowledge management applications – e.g. text->XML conversion. New generation knowledge management systems require more formal representation and access to information than is supported by free-form textual documents and keyword-based full text retrieval. IE can be used as a technology to enable such knowledge management, e.g. by automatically tagging documents with meta-data. The pragmatic advantages are:

o Low cost incorporation of legacy information - For many corporations, much of the corporate knowledge lies in the form of archived textual documents. Therefore, IE, boosted by ontologies, provides one means of semi-automatically upgrading a legacy document archive to a more formal knowledge representation.

o Reduced disruption to existing business processes – supporting new knowledge management approaches may affect business processes in so far as it forces the manual attachment of meta-data and indexing information at the point of authoring of documents. IE can be leveraged to reduce the amount of additional manual effort required.

• More efficient message classification – e.g. email routing. Traditional approaches to document/message classification have relied heavily on purely statistical techniques which, while being fully automated, also have an upper bound in terms of the accuracy levels that can typically be achieved. By improving the classification accuracy and routing messages (e.g. emails) more accurately within a corporation, IE may have the benefits of:

o Reduced information overload – by reducing the burden on employees who receive many irrelevant emails.

o More rapid response to incoming information - Improving the corporate-wide response time to emails, thus ensuring better global efficiency and more prompt response to customer requests.

o Reduction in manual processing costs - Information extraction can replace first-level manual help desk / email routing centres by automating a tedious and labour intensive activity.

o Enable additional automation of message processing - Information extraction can be a pre-processing step that may enable a range of automated message response facilities.

• More efficient document access (see also sections on “Information Retrieval”) – Following on from the preceding point, by providing more rigorous and formal indexing of documents, users are provided with more accurate and efficient document access. This has measurable benefits such as:

o Efficiency - Reduced average time for document retrieval

o Relevant information is not overlooked – In other words, reduced risk in terms of failure to retrieve relevant documentation

The common pitfalls that can be encountered when deploying Information Extraction technology include:

• Tolerance of imperfect performance – For any realistic deployment of IE technology it is impossible to extract all required information from text with 100% accuracy. Indeed, state-of-the-art IE systems can probably attain about 95% coverage of all information (recall) with about 95% accuracy (precision). Therefore, a noise level of at least 5% must be expected. For some applications (e.g. using IE as a means of improving document retrieval accuracy) this may be acceptable. Where 100% accuracy is required then IE results must be manually post-processed leading to significant additional run-time costs.

• Prohibitive initial investment vs. long term RoI – Setting up an information extraction system takes significant man-power in terms of either manually encoding pattern matching rules or painstakingly training and testing machine learning agents. This investment is then gradually reimbursed over a long period of time as the system is deployed. Hence IE projects are often viewed as high risk because the financial justification for the project is not immediate.


• RoI only indirectly measurable – relating to the above, the transparency of the return of investment of an IE project is often obscured because IE is not a solution in its own right but enables other end applications (such as better document management).

• Technical difficulty in normalising IE results – IE systems typically concentrate on locating and extracting a particular string pattern within a document. As such, surface-level variations in the extracted information may still exist. There may be a need to significantly post-process IE results, e.g. replacing members of a synonym set with a symbolic name, or unifying the units of some numeric value such as price, if the information extracted is to be used to feed into a central database. Issues concerning normalisation have not been the focus of IE literature, so this may be a potential technology risk.
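The recall and precision figures mentioned under “Tolerance of imperfect performance” follow the usual definitions: precision is the share of extracted items that are correct, and recall is the share of required items that were found. The Python sketch below applies these definitions to a made-up example; the fact identifiers and the counts are invented purely to show the arithmetic.

```python
def precision_recall(extracted: set, gold: set) -> tuple:
    """Return (precision, recall) of extracted items against a gold standard."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented example: 20 facts should be found; the system returns 20 items,
    # of which 19 are correct and 1 is spurious.
    gold = {f"fact{i}" for i in range(20)}
    extracted = {f"fact{i}" for i in range(19)} | {"spurious"}
    print(precision_recall(extracted, gold))
```

In this example one required fact in twenty is missed and one returned item in twenty is wrong, which is the kind of residual noise that either the application or a manual post-processing step has to absorb.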

Main Requirements: In order for ontologies to be used in conjunction with information extraction technology, the following general requirements should hold:

• Relevant information exists in textual format that can be addressed by IE technology in a cost-effective manner:

o Sufficient repetition of particular textual patterns occurs that can be detected by IE methods

o The number of text format variations is relatively low

o The text volumes are sufficiently large to prohibit a manual solution

• Imperfections in the IE process can be tolerated in that:

o Ontologies can be used to provide post-IE validation

o Manual inspection of IE results can be tolerated

o The application tolerates sub-100% accuracy (e.g. IE to support document retrieval)

• Suitable expertise in IE, NLP and Ontology technologies must be available

• Large volumes of training data may be required, particularly for applications where machine learning is to be deployed as the basis for constructing the IE system.

o Note that when machine learning is to be deployed, then some process, typically manual, is required to annotate the training data in order to denote specific occurrences of required information.

Status: The focus for information extraction research and development during the 90s was the so-called Message Understanding Conferences (MUCs). These were a series of standard benchmark tasks for which leading practitioners were invited to develop systems, which were then evaluated with respect to their performance on these tasks. As such, [MUC1] provides the ideal overview of the recent trends and state of the art in IE technology.

In general, IE technologies depend heavily on machine learning and Natural Language Processing (NLP) techniques to capture recurrent textual patterns. In addition, many of the information extraction pattern match rules are often created by hand, or at least the results of machine learning methods are heavily tweaked by hand. As such, most deployment of IE technology to date has carried a heavy cost in terms of consultant support from specialists in building IE systems.

In order for information extraction to be more widely applicable and for the technology to achieve the market potential that the current overload of available text-based information offers, there is a need to produce tools and methodologies that enable non-specialist end-users to build IE systems. More scaleable and reusable solutions are required. Some notable efforts in this direction have commenced, such as the GATE framework [Gate1] (see also section “Natural Language Applications”). Similarly, a number of specialist companies such as CognIT [CognIT], Dun & Bradstreet [D&B], SemanticEdge [Semedge1], Language & Computing [Landcglobal], LanguageComputer [languagecomputer], Linguamatics [linguamatics], and Whizbang Labs [Whizbang1] have been founded in recent years and have produced commercial products to support IE system development. After Autonomy [Autonomy] demonstrated the commercial viability of applying statistical methods to document management problems, a boom of interest in knowledge management occurred and many similar knowledge management companies were formed. It remains to be seen if a second wave of interest in knowledge management based on information extraction technologies will occur.


That IE and NLP technologies can be deployed to aid in the construction of ontologies is becoming widely recognised, but few concrete technologies to support this task exist. They include:

• Invention Machine [Invent1]. In particular, Invention Machine use a chain of different NLP and statistical methods to “read” large, domain specific, bodies of text in order to pick out key words/concepts, and to find the semantic relationships between these words/topics. As such, the basis for a domain ontology can be constructed semi-automatically.

An additional review of the status of IE technology can be found in [Ontoknow5].

Concrete Applications:

Whizbang Labs' information extraction technology lies behind FlipDog.com [Whizbang2,3], an online job portal covering some 500,000 job announcements from 50,000 corporate web sites. To date this is probably the biggest commercial success of IE technology. Information such as job category, location and title is automatically extracted from web sites that are periodically gathered using a crawler. The solution uses little in the way of ontological support, though it is possible to use the same technology to make user-defined classifications of whole documents and as such to build application/user-specific taxonomies. The FlipDog.com solution offers some primitive ontological basis (e.g. allowing a user to select geographic region and job category when searching for vacancies) but this could clearly be extended (cf. the business scenario “Intelligent Information Integration”).

Language & Computing [LandC1] provides a range of tools for performing natural language understanding, information extraction and information retrieval within the medical domain based on a very large scale terminological ontology. More details of an example application can be found under the business scenario “Information Retrieval”. The company offers three complementary tools. LinKFactory is a feature-rich tool used for building complete corporate terminology systems capable of extracting significant value out of the large stores of unstructured data residing in corporate content databases. It aims to perform several functions, such as attaining the conceptual level of language expression, managing a large-scale and complex ontology, and defining relationships between concepts. The tool provides a user-friendly way to create, maintain and extend multilingual terminology systems and ontologies (English, Spanish, French, etc.), mainly for the medical domain. LinKBase is a medical knowledge base used by the company's other applications. Owing to its magnitude, its formal structure and the fact that it is machine-readable, it is capable of producing the results needed in automated processes from unstructured medical texts. Finally, LinkNLP integrates several information processing applications that use the LinKBase content, such as FastCode, the automated medical coding engine; TeSSI, the semantic indexing, retrieval and extraction engine; and FreePharma, a tool to analyze free-text pharmacological information and convert it into a structured format. These tools offer several functions for extracting the meaning out of medical free text and placing the resulting medical ‘concepts’ in the document index instead of terms, allowing users to query content stores in natural language and retrieve highly relevant information with great accuracy.

LanguageComputer [languagecomputer] presents an IE system that operates on natural language texts, populating databases and providing input for a wide range of retrospective analyses of reports, presentations, emails or manuals generated primarily in natural language form. The LanguageComputer IE system can be integrated with various knowledge management and portal resources to work within different business domains.

The Linguamatics I2E system [Linguamatics] is a tool for mining information from unstructured text such as technical articles, intranet pages, emails and project reports. An enormous amount of useful information is stored this way, and the key to exploiting it lies in mining the relationships it contains. I2E combines interactive search and language processing with ontologies, enabling users to cut through large volumes of text and quickly home in on key documents, words, entities or relationships. I2E takes documents in text formats (e.g. web pages, emails, journal articles, meeting minutes) and enables users to construct powerful queries over them through a simple graphical interface. The queries can include linguistic constraints rather than just words.

IBM IE Research Center [IBM] aims to develop accurate and efficient machine learning algorithms for commercially viable IE systems, including bootstrapping learning algorithms and symbolic pattern generalization applicable to relation learning from text. The Computational Linguistics and Text Mining group is pursuing research on trainable IE systems. Although the most accurate systems often involve language processing modules that are hand-built, substantial progress has been made in applying supervised machine learning techniques to a number of processes necessary for extracting information from text. This task naturally decomposes into a sequence of processing steps, typically including: tokenization, sentence segmentation, part-of-speech assignment, named-entity identification, phrasal parsing, sentential parsing, semantic interpretation, etc. The application of machine learning techniques to information extraction is motivated by the fact that hand-built systems are time consuming to develop and require special expertise in linguistics and artificial intelligence/computational linguistics.
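The decomposition into processing steps mentioned above can be illustrated with a deliberately naive sketch. It covers only three of the steps (sentence segmentation, tokenization and named-entity identification) and assumes regular-expression heuristics in place of the trained modules discussed in the text. The function names are illustrative and do not belong to any IBM system.

```python
import re

def split_sentences(text):
    # Naive sentence segmentation: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Naive tokenization: runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)

def find_named_entities(tokens):
    # Crude named-entity identification: maximal runs of capitalised tokens.
    # A single capitalised word at the start of a sentence is ignored.
    entities, run, start = [], [], 0
    for i, tok in enumerate(tokens + [""]):  # empty sentinel flushes the last run
        if tok[:1].isupper():
            if not run:
                start = i
            run.append(tok)
        else:
            if run and (len(run) > 1 or start > 0):
                entities.append(" ".join(run))
            run = []
    return entities

def extract(text):
    # Chain the steps: one list of entity strings per sentence.
    return [find_named_entities(tokenize(s)) for s in split_sentences(text)]
```

Each step consumes the output of the previous one, which is why a hand-built or learned module can be swapped in for any single stage without changing the others.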

Guidelines: Published in [OntoWeb2.1]

Related Business Scenarios:

The Information Extraction business scenario has a strong similarity / overlap with

• Information Retrieval

Information Extraction may be an integral part of several other business scenarios:

• Knowledge Management / Information Systems / Corporate Internets

• E-Commerce

• Intelligent Information Integration

• Semantic Portals (eCRM)

2.7 Information Retrieval

General Description: Information Retrieval (IR) primarily concerns the search of a document collection and the return of a salient subset of all documents in response to a user query. Most techniques centre on the basic principle of retrieving documents based on some function of a collection of keywords [Jones et al., 1997] – i.e. the user types a list of keywords and, as with classical internet search engines, documents in which one or more of the given keywords occur with sufficient frequency are returned.
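As a minimal sketch of the keyword principle just described, the following function returns the documents in which at least one query keyword reaches a frequency threshold. The function name, the `min_count` parameter and the ranking by total keyword frequency are assumptions made for illustration; real engines use more refined weighting functions.

```python
import re
from collections import Counter

def keyword_search(documents, keywords, min_count=1):
    """Return ids of documents in which some keyword occurs at least min_count times."""
    keywords = [k.lower() for k in keywords]
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(re.findall(r"\w+", text.lower()))
        hits = {k: counts[k] for k in keywords}
        if any(c >= min_count for c in hits.values()):
            scored.append((sum(hits.values()), doc_id))
    # Highest total keyword frequency first; ties are broken by document id.
    return [doc_id for score, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

Because matching is purely on surface strings, a document that uses a synonym of the keyword is never returned, which is the weakness the rest of this scenario addresses.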

There are a number of fundamental problems with IR. Firstly, there is the problem of Precision, namely that many of the documents retrieved typically fail to fulfil the user's intention due to the imprecision of an indexing method based purely on keywords. Secondly, there is the problem of Recall, namely that documents that only contain variations or synonyms of the original keywords entered by a user may be overlooked. Related to this is the problem that the user receives little guidance when forming a search query – i.e. they are unaware of what categories of document are available and can only find useful keywords by trial and error.
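Precision and recall are conventionally measured as fractions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A minimal sketch of the calculation follows; the function name and the choice to return 0.0 for empty sets are assumptions of this example.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query from two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For instance, if four documents are returned and two of them are among three relevant ones, precision is 2/4 and recall is 2/3.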

The use of ontologies to overcome all of these problems is a powerful way of enhancing traditional IR approaches. Ontologies can be used to provide a conceptual framework that explicitly captures logical relationships between collections of keywords relevant to a document collection. By using an ontology to find coherent systems of co-occurring keywords it is possible to boost precision. Alternatively, the ontology can be used as the basis for semantically tagging the documents as a preprocessing step to aid accurate retrieval (see business scenario “Information Extraction”).

More often, a terminological ontology may be used for the purposes of query expansion – i.e. from a keyword given by the user, the ontology is navigated to find a collection of related words that will be collectively used to find matches in the document collection.
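Query expansion as described here can be sketched as a bounded walk over the ontology. The example assumes the ontology is available as a plain dictionary from a term to its directly related terms; the `depth` parameter and the function name are likewise illustrative.

```python
def expand_query(keywords, ontology, depth=1):
    """Collect the keywords plus all terms reachable within `depth` links."""
    expanded = {k.lower() for k in keywords}
    frontier = set(expanded)
    for _ in range(depth):
        # Follow one more link from every term found in the previous round.
        frontier = {rel for term in frontier for rel in ontology.get(term, ())} - expanded
        expanded |= frontier
    return expanded
```

The expanded set is then passed to an ordinary keyword search. A larger `depth` tends to raise recall at the expense of precision, since more loosely related terms are admitted.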

Finally, ontologies can also be used to model taxonomies of documents and document content. Such taxonomies can be presented to an end user via a GUI as guidance for selecting those documents of most interest. A simplified version of this approach is typical for most internet search engines, providing a category of topics that can be navigated and used to scope a purely keyword based search query.
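Scoping a keyword search by a navigable taxonomy can be sketched as follows. The taxonomy contents, the document layout (a category plus text for each id) and the function names are all assumptions made for illustration.

```python
# Hypothetical taxonomy: category -> direct subcategories.
TAXONOMY = {"Science": ["Medicine", "Computing"], "Computing": ["Semantic Web"]}

def descendants(category, taxonomy):
    """Return the category together with all categories below it."""
    found = {category}
    stack = [category]
    while stack:
        for child in taxonomy.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def scoped_search(documents, category, keyword, taxonomy):
    """Keyword search restricted to documents filed under `category` or below."""
    scope = descendants(category, taxonomy)
    return sorted(
        doc_id
        for doc_id, (doc_category, text) in documents.items()
        if doc_category in scope and keyword.lower() in text.lower()
    )
```

Selecting a category removes documents that match the keyword only by coincidence, which is how the taxonomy guides the user towards documents of interest.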

Business Case: The business cases for using ontologies to boost the effectiveness of IR applications are as follows:

• Better IR performance through ontologies. As a general advantage, ontologies enable more precise document retrieval as potentially ambiguous words can be mapped on to unambiguous concepts. Similarly, using ontologies as an intermediate basis for information retrieval allows better coverage as the retrieval is no longer sensitive to particular synonyms or surface-level variants, e.g. [LandC1]

• Search Engine Efficiency – Where document collections are very large, search engine efficiency may be a major factor. Semantically indexing documents rather than profiling them by keywords reduces the number of false hits and can therefore yield better-performing retrieval engines.

• Reduced labour costs for information access from frequently used document archives – The most measurable cost benefit from improved IR is the reduction in the time it takes a user of the document retrieval system to find useful information. This time saving can be decomposed into time saved in formulating a search query and, more significantly, time saved by users not having to read irrelevant retrieved documents. As such, this business case is dominated by improvements in precision.

• More efficient business processes through more widely accessible knowledge – Less easy to measure is the general benefit Information Retrieval can offer through improved information dissemination. Increasing the likelihood that the right information is found by the right person at the right time can have many benefits within a corporation and is the basis for many current approaches to knowledge management (see business scenario “Knowledge Management / Information Systems / Corporate Internets”).

• Improved Customer Acceptance for Online Portals – Providing tailored search engines on top of online portals and using ontologies to guide users' searches can both improve the usability of an online portal and hence lead to wider user acceptance.

• Multilinguality – Matching documents based on (language independent) concepts rather than surface level terms can enable multi-lingual information retrieval applications [LandC1]

• Ease of Implementation – Basic IR systems can be achieved with off-the-shelf technology (e.g. [Autonomy]) and hence, in comparison to other business scenarios, IR systems have relatively low installation and development costs.

The risks commonly associated with deploying information retrieval systems include:

• Non-Transparency of RoI – As for a number of other business scenarios, the cost benefit of improved communication through IR applications is difficult to quantify. This is however offset somewhat by the relatively low cost of IR approaches.

• Open-ended Scope for Retrieval Systems – IR systems will typically be deployed to support multiple end users with different information needs. If the IR system is purely keyword based then this presents no problem as the retrieval mechanism is highly flexible. However, if ontologies/taxonomies are deployed to help boost retrieval accuracy, this necessarily imposes a limited organisational viewpoint on the underlying document collection.

• Potentially Large Data Sets – For IR applications applied to very large, dynamic data collections, including the WWW itself, any technology introduced to boost the retrieval of documents cannot be computationally intensive, since merely maintaining the indexing structure for the entire document collection is already a major bottleneck.

o Difficulty in Categorisation – If large document collections are to be segmented and structured with respect to some ontology / taxonomy, then an accurate, automated classification method is required.

• Unpredictable and Uncontrollable Behaviour – Automated methods based on unsupervised learning (e.g. Naïve Bayesian, neural networks, etc.) typically result in a black-box runtime IR system where the link between given search keywords and the actual terms matched in retrieved documents is neither easy to inspect nor easy to maintain manually. This may lead to perceived unpredictability in the deployed system and hence some end user resistance.

o Low Retrieval Accuracy – The above acceptance problem is exacerbated by the fact that many IR applications, epitomised by web search engines, are still designed to give a document retrieval accuracy (precision) of 20-50%, i.e. it is considered acceptable when the majority of information retrieved is irrelevant!

Main Requirements: Among the main requirements of Information Retrieval applications are:

• Requirements for large data collections – given that most standard IR systems are based on some form of unsupervised learning (i.e. are based on statistical methods), it is often a prerequisite that a large collection of documents already exists in order to be able to initially train the IR system.

• Automated methods for maintaining the IR system are desirable, including:

o Ability to maintain a keyword based indexing of documents based on unsupervised learning

o Ability to maintain an ontology / taxonomy, e.g. automatically discovering new emergent concepts from the documents of a domain

o Ability to classify new documents with respect to an existing ontology / taxonomy with high degrees of accuracy.

• Definition of purpose(s) of document retrieval - For systems that are tailored by using an ontology / taxonomy to guide document retrieval, it is a prerequisite that the use cases for the IR system can be suitably well defined so as to establish well defined criteria for classifying the documents of the document collection.

Status: As mentioned at the start of this business scenario, the IR field is dominated by keyword based approaches to document indexing and retrieval and by unsupervised learning approaches to creating indexing data-structures. A good overview of standard approaches is provided in [Jones et al., 1997].

In many ways, the current generation of knowledge management systems (see business scenario “Knowledge Management / Information Systems / Corporate Internets”) is based on the paradigm of information retrieval – i.e. the purpose of a knowledge management system is to retrieve relevant documents. The fact that such applications can be built using fully automated (unsupervised learning) technologies made IR an attractive business scenario for commercial applications and this, coupled with the early successes of companies such as Autonomy [Autonomy], fuelled a thriving industry.

There is however a growing realisation that just retrieving documents, with relatively poor precision rates, is not a sufficient solution for all Information Retrieval problems. Users are interested in answers to specific questions. Users typically conceptualise required documents in terms of a collection of semantic requirements which are poorly reflected in simple keyword lists. Better methods are needed for formally summarising the content of documents. New paradigms are required for visualising across large document collections. There is a growing need to incorporate metadata based indexing into IR systems to boost accuracy. There is a growing awareness that new methods for machine learning that exploit the structure of networked information sources (i.e. the internet) are required. All these issues have given the IR field a new impetus (e.g. see [Sigir01,02]) and suggest a potential for new applications that marry ontology-driven approaches with traditional IR technology.

Concrete Applications:

Language & Computing [LandC1] provides an excellent example of how a large-scale domain-specific terminological ontology can enable qualitatively different information retrieval applications. L&C has specialised in the medical domain and has invested decades of man-effort in building a massive ontology with a very high coverage of medical terminology. Based on this ontology a number of technology products have been developed, such as TeSSI [LandC1]. This tool supports the semantic indexing and subsequent semantic retrieval of medical documents such as patient records. Indexing and retrieval are achieved by mapping words and phrases of both documents and queries onto ontological concepts. A significantly more precise and complete document retrieval can thus be performed, enabling a doctor to, for example, rapidly check a prognosis against the medical history of a patient. Without the L&C technology, such efficient medical information retrieval could only be achieved by manually indexing documents, which is prohibitively expensive. However, it should be noted that this ability to deal automatically with the natural language of a given domain has been attained only after a massive investment in the underlying ontology.
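The idea of mapping both documents and queries onto concepts can be sketched as below. This is not TeSSI code: the lexicon entries, the concept identifiers and the simple substring matching are assumptions chosen to keep the example short.

```python
# Hypothetical lexicon: surface phrase -> concept identifier (not LinKBase content).
TERM_TO_CONCEPT = {
    "heart attack": "C_MYOCARDIAL_INFARCTION",
    "myocardial infarction": "C_MYOCARDIAL_INFARCTION",
    "high blood pressure": "C_HYPERTENSION",
    "hypertension": "C_HYPERTENSION",
}

def concepts_in(text):
    """Map a piece of free text onto the set of concepts it mentions."""
    text = text.lower()
    return {concept for term, concept in TERM_TO_CONCEPT.items() if term in text}

def build_index(documents):
    """Index documents by concept instead of by surface term."""
    index = {}
    for doc_id, text in documents.items():
        for concept in concepts_in(text):
            index.setdefault(concept, set()).add(doc_id)
    return index

def semantic_search(query, index):
    """Map the query onto concepts and return every document indexed under them."""
    result = set()
    for concept in concepts_in(query):
        result |= index.get(concept, set())
    return result
```

A query for "myocardial infarction" then finds a record that only says "heart attack", because both phrases resolve to the same concept. The quality of such a system depends entirely on the coverage of the lexicon, which is where the investment mentioned above goes.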

Xerox Corporation [Xerox] has developed askOnce [askOnce], a software product for accessing internal and external documents and for tapping into both structured and unstructured information sources. With a single query askOnce allows simultaneous searches of the Web, e-mail systems, corporate databases containing CRM and ERP information, intranets, government files (regulatory and patent, for example) and other document repositories, internal and external. askOnce can span entire platforms, legacy or not. The query language used by askOnce does not have to be fully supported by the source. If, for example, the source does not support the OR operator, askOnce compensates by issuing two queries and automatically comparing and combining the results. It uses a core server application and a set of repository-specific "wrappers" that allow it to talk with any data repository, and it covers potentially any source, in any format and in almost any language.
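The compensation strategy described for the OR operator can be sketched as follows. This is not askOnce code: `SingleTermSource` is a hypothetical stand-in for a repository wrapper that accepts only single-term queries, and `search_or` is an illustrative name.

```python
class SingleTermSource:
    """Hypothetical source that answers single-term queries only (no OR support)."""

    def __init__(self, documents):
        self.documents = documents  # doc_id -> text

    def search(self, term):
        term = term.lower()
        return [d for d, text in self.documents.items() if term in text.lower().split()]

def search_or(source, terms):
    """Emulate `t1 OR t2 OR ...` by issuing one query per term and merging the results."""
    merged, seen = [], set()
    for term in terms:
        for doc_id in source.search(term):
            if doc_id not in seen:  # drop documents already returned by an earlier query
                seen.add(doc_id)
                merged.append(doc_id)
    return merged
```

The caller sees a single result list, so the missing operator in the underlying source stays hidden behind the wrapper.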

EasyAsk [EasyAsk] is a comprehensive search platform, able to retrieve and mine existing information about products, services and assets through a rich, engaging user experience. As a result, EasyAsk puts relevant answers in the hands of the user. The platform includes a search server, search workbenches for structured data (e.g. commerce catalogs) and unstructured content (e.g. spreadsheets and graphics; other word processing and desktop publishing applications; Adobe PDF files; HTML and XML pages), optional content dashboards, a choice of languages for localization support, an integrated report writer, search and navigation analytics, and a set of APIs for custom integration.

The following related applications are detailed in Deliverable D1.1 [OntoWeb1.1]:

OntoSeek (No URL available)

WebKB-2 (http://www.webkb.org/)

Guidelines: Published in [OntoWeb2.1]

Related Business Scenarios:

The Information Retrieval business scenario has a strong similarity / overlap with

• Information Extraction

Information Retrieval may be an integral part of several other business scenarios:

• Knowledge Management / Information Systems / Corporate Internets

• E-Commerce

• Semantic Portals (eCRM)

2.8 Semantic Portals (eCRM)

General Description: Semantic Portals cover internet/intranet based information and service provision systems. Typically, search facilities (keyword based and/or via navigable hierarchies) are offered on top of a document collection. Additionally, facilities may be provided for allowing a user to carry out a transaction. As such, online sites for the purchase of commodities and for supporting electronic marketplaces are typical applications that come under the title of Semantic Portals. Additional facilities, such as access to product reviews, are typically offered by the more successful portals.

Also covered by this business scenario are systems that provide Customer Relations Management over the internet – i.e. on-line helpdesks. For example, a typical service of such an eCRM solution will be to offer a consumer access to an FAQ system to get advice on commonly occurring product problems. The eCRM solution will also typically support email communication, allowing users to make requests for information via email and, in some cases, to receive automatically generated responses to those emails.

Other applications that come under this business scenario are those portals that actually market valuable information (e.g. up-to-date financial data) via the internet.

Business Case: The business case for semantic portals largely revolves around providing better information services to a population of end users / customers. More concrete arguments include:

• One Point of Information Access - Collating information from multiple sources and media, e.g. [Profium2]

• Personalisation - Providing individualised information access to end users; this can both increase the attractiveness of the portal to end users as well as allowing content providers to target information at particular consumer groups, e.g. [Emplois1, Profium2]

• Direct Sale of Content - Valuable information, such as pertinent and up-to-date financial news, is a saleable commodity, e.g. [Profium2]

• Business Intelligence - Supporting business intelligence gathering (i.e. monitoring competitor companies or one's own public image), e.g. [CognIT]. Such information is vital for survival in highly competitive sectors.

Some general risks with Portal applications are:

• Reluctance to pay for Internet Services – The communal mindset is that information and services made available on the internet should be provided at no cost to the end user. Hence, expectations need to change before such services can be charged for directly over the internet.

• Insufficient Retrieval Mechanisms – Many of today's semantic portals embody standard information retrieval technologies (see business scenario “Information Retrieval”). Failing to provide easy-to-use and accurate mechanisms for information retrieval often leads to loss of interest by end users.

Main Requirements:

In accordance with the above discussions, the main requirements for Semantic Portals include:

• Powerful information retrieval mechanisms

• Flexible and personalised user interfaces

Status: A vast number of Semantic Portals have been established on the internet over the preceding few years, supporting all sectors of commerce and industry. These portals typically provide access to documents via a combination of keyword based retrieval mechanisms and through an organisational structure which reflects some predefined taxonomy.

Concrete Applications: The biggest Finnish financial portal was developed by Kauppalehti based on the technology of Profium [Profium2]. Concise and up-to-date financial information, covering recent news and stock market performance, is published in 4 different media and made available to a target group of 400,000 consumers. Heterogeneous information sources are centralised by an (assumed manual) process of indexing incoming documents with respect to a metadata schema defined in RDF. Around 200 new articles per day are added to the portal. A majority of the revenue generated from this portal comes from directly charging its users for making this information readily available.
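Indexing documents against a metadata schema can be pictured as attaching subject-predicate-object statements to each article. The sketch below uses plain Python tuples rather than an RDF library. The `dc:` property names borrow Dublin Core element names as an assumption, since the source does not state which schema the portal uses, and the `urn:example:` identifiers are invented.

```python
def describe(article_uri, title, subject, date):
    """Return the metadata for one article as (subject, predicate, object) triples."""
    return [
        (article_uri, "dc:title", title),
        (article_uri, "dc:subject", subject),
        (article_uri, "dc:date", date),
    ]

def find(triples, predicate, value):
    """Return the URIs of all resources whose `predicate` has the given value."""
    return sorted({s for s, p, o in triples if p == predicate and o == value})
```

Because every source is described with the same properties, articles from different media can be queried uniformly, for example by subject or by publication date.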

Business Intelligence is another strong application area for Semantic Portals. As stated on [CognIT], companies are interested in answering questions such as “Who deals with whom?”, “How are my companies doing?”, “What orders do my competitors receive?”, “Have any new opportunities arisen?”, “Are there any new competitors?”, “What is our company's public image?” etc. Such information can be collected and amalgamated from the internet but requires semi-automated methods to keep it up to date. Companies such as CognIT have developed such technology.

Of the numerous internet portals that could be reviewed here, Amazon.com [Amazon] is perhaps the best known and a good example of best of breed. The points of note include:

• Browsing facilities based on a taxonomy of products

• Standard search engine facilities

• Personalisation through the ability to make a customer specific view of the online store

• Some intelligent cross selling facilities of the type “customers who, like you, bought X typically also bought a Y”
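Cross selling of the "customers who bought X also bought Y" type can be approximated by counting co-purchases. The sketch below is a simple co-occurrence count over order baskets and makes no claim about how Amazon.com implements the feature; the function name and the `top_n` parameter are illustrative.

```python
from collections import Counter

def also_bought(orders, item, top_n=3):
    """Rank the items that most often share an order with `item`."""
    counts = Counter()
    for basket in orders:
        basket = set(basket)  # ignore repeated items within one order
        if item in basket:
            counts.update(basket - {item})
    # Most frequent companion first; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda p: (-p[1], p[0]))
    return [other for other, _ in ranked[:top_n]]
```

Given a product taxonomy, the same counts could be aggregated by category to recommend related kinds of product rather than individual items.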

A number of semantic portals can also be found in the area of research projects. These portals are used as a tool for sharing information and documentation within the scientific research community or as a dissemination tool for external users of the project. The AKT [AKT], KnowledgeWeb [KnowledgeWeb] and OntoWeb [OntoWeb] portals are examples of this type of semantic portal.

Frameworks for building semantic portals are growing in number. The goals of these frameworks are to create portals in a very short time using ontologies as a backbone for indexing the information, to reduce the effort required of the portal administrator to maintain the information, and to provide a set of functionalities related to the semantic information, such as information integration and queries. Frameworks such as KAON Portal [KAON], ITM [ITM], ODESeW [ODESeW] and RDF Gateway [RDFGateway] provide different functionalities for designing semantic web portals.

The following related applications are detailed in Deliverable D1.1 [OntoWeb1.1]:

C-WEB (http://cweb.inria.fr/)

OntoRoadMap (http://babage.dia.fi.upm.es/ontoweb/wp1/OntoRoadMap/index.html)

SEAL (http://ontobroker.semanticweb.org/ontos/aifb.html)

SEW (http://babage.dia.fi.upm.es/sew)

Time2Research (http://141.41.1.131/t2r)

Guidelines: Published in [OntoWeb2.1]

Related Business Scenarios:

The Semantic Portals business scenario has a strong similarity / overlap with

• E-Commerce

• Intelligent Information Integration

Semantic Portals business scenarios may incorporate aspects of:

• Information Extraction

• Information Retrieval

• Natural Language Applications

3 A Framework for on-going Business Scenario Information Gathering

In order to promote the continual collection of information concerning business cases and scenarios, a web page has been set up¹ as part of this deliverable to allow rapid entry of salient information by members of the ontology community. The web site is to be found at the following URL:

http://babage.dia.fi.upm.es/ontoweb/wp1/OntoRoadMap/index.html

Via this web site, a continued stream of anecdotal information as well as general information concerning business scenarios will be produced. This information will form the raw material on which future releases of this deliverable will be based, as well as providing input to other related deliverables such as [OntoWeb2.1]. As such, it is expected that this web page will provide a means for refining existing business scenarios over time as well as introducing new business scenarios. Accordingly, the design of the forms presented here for information capture is more fine-grained than the general description of business scenarios given above.

The remainder of this section presents a set of general forms representing the various types of information that are required to fully describe business scenarios in detail. These forms are reflected in the organization of the above URL.

3.1 Business Scenario Template Forms

This section presents the template for describing actual Business Scenarios. This template should be filled out by various Industrial partners in accordance with the guidelines written in blue italics. Each Business Scenario written in this way will capture the salient properties of a particular class of possible ontology applications.

Note that Business Scenarios can be formed as generalisations of actual ontology applications. As such, it should be valid to enter anecdotal information as part of the general Business Scenario description.

3.1.1 General Information about Business Scenarios

This section gives an overview of the Business Scenario.

Short Description

• Construction and Maintenance – Describe briefly and in general terms the process for constructing and maintaining the ontology application – i.e. what steps are involved in building the application and who carries out each step using what types of tools

• Runtime-Deployment – Describe briefly and in general terms how the end system is used – i.e. the types of task it supports and the types of user that will use the ontology application

Relevant Business Sectors

• Medical – Yes/No

• IT – Yes/No

• Military / Government – Yes/No

• Financial – Yes/No

• Pharmaceutical – Yes/No

• Service Industry – Yes/No

• Manufacturing – Yes/No

• Other – Name and give short description

3.1.2 Business Case for Business Scenario

The following table summarises the main commercial arguments for the Business Scenario.

1 Special thanks go to Asun Gómez-Pérez and Angel López-Cima (Laboratorio de Inteligencia Artificial (LIA), Facultad de Informatica (FI), Universidad Politecnica de Madrid (UPM)) for this effort.

Advantages and Return on Investment

• New Services Enabled – Here it is described what kind of new services can be offered to an end user by deployment of the ontology application. As an example, the construction of an ontology of products and known problems could enable an on-line help-desk service to be provided as part of an eCRM solution.

• Potential Cost Cutting

o Improved Efficiency of Automated Processes – Here it is described how an existing automated process can be made more efficient. Qualitative and, where possible, quantitative descriptions of the improvements should be defined. For example, providing an ontology of products can boost the accuracy of product retrieval on a shopping portal by 50% with respect to keyword driven search.

o Automation of Manual Tasks – Here it is described how a currently manual task is replaced, or more typically augmented, by an application enabled by a large-scale ontology. Details of how the automated part of the solution fits together with the manual part should be given, along with qualitative and/or quantitative statements as to the potential extent of automation. Again a good example is a manual help desk whereby an intelligent system based on ontologies might be able to handle 50% of the more trivial user requests automatically with 95% accuracy.

• Improved Knowledge Management / Exploitation of Information – Improved knowledge management largely covers infrastructure improvements within an organisation. Advantages here are largely qualitative but should be grounded as concretely as possible. For example, construction of a corporate ontology of departments, projects and employee skills might allow for improved efficiency in terms of personnel deployment.

• Improved Communication

o B2B – Specific details of how ontologies improve business-to-business applications should be listed here.

o E-Commerce / eCRM – Specific details of how ontologies improve electronic commerce or electronic Customer Relations Management applications should be listed here. Note some overlap with New Services Enabled.

o Intranet – Specific details of how ontologies improve communication between members of an organisation (e.g. in project management) should be listed here. Note strong overlap with Improved Knowledge Management.

o Automated System Interoperability – Specific details of how ontologies may help with electronic data exchange in other application areas beyond B2B should be listed here.

• Miscellaneous Advantages – Any additional advantages not covered by the above should be listed here and explained briefly but precisely.

The following table summarises the main potential obstacles to the deployment of the business scenario.

Main Risks

• Technology Risks

o Scalability and Performance – Here, the potential performance and scalability issues must be listed, covering both development and deployment. For example, the problems of coordinating distributed ontology development for large-scale ontology construction might be listed, or the fact that run-time knowledge querying and inferencing must operate per user with less than a 1 second delay.

o Interoperability – Here, interoperability issues should be listed with respect to how the ontology application is embedded in a larger-scale system. For example, issues addressed here might include how the ontology part of a solution interacts with standard databases.

o Other technical risks – Here, any other technical problems might be listed, such as conformance to coding standards, etc.

• Commercial Risks

o Source Information Availability – Here issues relating to the availability of all source information needed to construct the ontology should be addressed. For example, the classical knowledge acquisition problem of needing access to domain experts may be salient.

o Initial Construction Costs – Market entry for many ontology-based applications is hampered by the cost of constructing those ontologies. Here, the potential size and therefore prohibitive effects of those costs should be discussed.

o Lack of Transparent RoI – Particularly when applied to knowledge management applications, ontology-based systems risk lack of success because the return on investment is indirect and therefore not easily calculated. Whether or not the cost savings are transparent should be discussed here.

o Other Commercial Risks – Any other scenario-specific commercial risks should be covered here, e.g. legal aspects.

• Competition – Here a list is given of potential competitor technologies and alternative approaches for this business scenario.

• Miscellaneous Risks – Any additional risks not covered by the above should be listed here and explained briefly but precisely.

3.1.3 Main Requirements for Business Scenario

The following tables summarise the general requirements that the Business Scenario places on the ontology application.

Information Sources

Types of Information Required: Here a list and short description should be given of the general types of knowledge that are required. For example, support for on-line sales might require an ontology including: product descriptions, user types and preferences, and inter-product dependencies (e.g. cross-selling).

Typical Information Sources

Documents (Paper or Electronic): Yes/No

Internet: Yes/No

Existing Ontologies: Yes/No

Human Experts: Yes/No

Other: Name and give short description

Potential for Automated Knowledge Acquisition: Here a short description should be given as to how much of the process of knowledge acquisition can be automated, e.g. whether data-mining techniques can be applied to existing corporate databases to extract ontological concepts and relations.
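As a rough illustration of automated knowledge acquisition from an existing corporate database, the sketch below reads a relational schema and proposes tables as candidate concepts, columns as candidate attributes and foreign keys as candidate relations. This is plain schema inspection, a much simpler stand-in for data mining proper, and the on-line sales schema and the function name `candidate_ontology` are invented for the example; the output would still need review by a knowledge engineer.

```python
import sqlite3


def candidate_ontology(conn):
    """Propose candidate concepts, attributes and relations from a schema."""
    concepts, relations = {}, []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # Each table is a candidate concept; its columns are candidate attributes.
        columns = conn.execute(f"PRAGMA table_info('{table}')").fetchall()
        concepts[table] = [col[1] for col in columns]
        # Each foreign key is a candidate relation between two concepts:
        # fk[3] is the referencing column, fk[2] the referenced table.
        for fk in conn.execute(f"PRAGMA foreign_key_list('{table}')"):
            relations.append((table, fk[3], fk[2]))
    return concepts, relations


# Hypothetical on-line sales schema, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE customer (id INTEGER PRIMARY KEY, segment TEXT);
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        product_id INTEGER REFERENCES product(id)
    );
""")

concepts, relations = candidate_ontology(conn)
for source, column, target in sorted(relations):
    print(f"{source} --{column}--> {target}")
```

How much of such a proposal survives review is one practical indicator of the automation potential that this table entry asks for.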

Technology Requirements

Performance

Size of Ontology: How large can the required ontology potentially be? This measure should be broken down into a range of factors such as number of concepts, number of instances, number of relation types, number of relation instances, etc.

Speed: Any general speed-related aspects, particularly concerning runtime performance (e.g. response time of queries to the ontology), should be stated here qualitatively and, where possible, quantitatively.

Inferencing

Type of Queries: Here some indication must be given of the type of query that the ontology system must typically handle.

Type of Inferencing: Here an indication should be given of the typical types of inferencing that must be performed. For example, is classification (in the style of Description Logic) required? Is reasoning about instance slot values required? Is fuzzy and numeric reasoning required?

Complexity and Performance: Here some indication must be given as to how “heavy weight” the inferencing is (e.g. many transitive inferences) and how rapidly the inferencing must be carried out (essentially back-office vs. runtime inferencing).

Interoperability: Here, the main interoperability requirements should be stated in terms of typical systems with which the ontology application must integrate and/or how this integration is typically achieved.

Miscellaneous Technological Requirements: Here, any further general requirements concerning the technology can be listed and briefly described.
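To make the size factors and the notion of transitive inferencing more concrete, the following minimal sketch counts concepts, instances, relation types and relation instances for a toy ontology held as triples, and computes the transitive closure of a subclass relation. The triple layout, the relation names `subclass_of` and `instance_of`, and the example data are assumptions made for illustration only; a real application would rely on its chosen ontology language and inference engine.

```python
from collections import defaultdict

# Hypothetical toy ontology as (subject, relation, object) triples.
TRIPLES = [
    ("Laptop", "subclass_of", "Computer"),
    ("Computer", "subclass_of", "Product"),
    ("Printer", "subclass_of", "Product"),
    ("thinkpad_x1", "instance_of", "Laptop"),
    ("laserjet_4", "instance_of", "Printer"),
]


def size_metrics(triples):
    """Break ontology size down into the factors named in the table."""
    concepts, instances = set(), set()
    for subject, relation, obj in triples:
        if relation == "subclass_of":
            concepts.update((subject, obj))
        elif relation == "instance_of":
            instances.add(subject)
            concepts.add(obj)
    return {
        "concepts": len(concepts),
        "instances": len(instances),
        "relation_types": len({relation for _, relation, _ in triples}),
        "relation_instances": len(triples),
    }


def superclasses(concept, triples):
    """Return all direct and inferred superclasses (transitive closure)."""
    parents = defaultdict(set)
    for subject, relation, obj in triples:
        if relation == "subclass_of":
            parents[subject].add(obj)
    found, stack = set(), [concept]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


metrics = size_metrics(TRIPLES)
laptop_superclasses = superclasses("Laptop", TRIPLES)
print(metrics)
print(sorted(laptop_superclasses))
```

Here "Laptop is a Product" is never stated directly; it follows from two subclass links. The number of such chained steps a query needs is one rough indicator of how heavy weight the inferencing is.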

Personnel Requirements

Knowledge Engineers: Here general requirements with respect to knowledge engineering skills should be indicated.

Software Engineers: Here general requirements with respect to software engineering skills should be indicated, e.g. whether an Internet specialist or database specialist is required.

Domain Experts: Here, the types of domain expert that are required in the construction of the main ontology should be listed, e.g. Level 2 or Level 3 support personnel when automating a help desk.

End Users: Here the types of end user who will typically interact, directly or indirectly, with the ontology application should be listed.

Other Stakeholders: Any other types of person that are directly involved in the Business Scenario and their role with respect to the ontology application should be briefly described here.

Miscellaneous Requirements

Here any additional requirements and comments specific to the Business Scenario should be described briefly. For example, an ontology application used for troubleshooting within the manufacturing industry might have significantly higher performance and reliability requirements than an application used for online eCRM. The latter, however, may have more significant non-functional “user friendliness” requirements.

3.1.4 Examples of Concrete Applications

In this section, one or more concrete applications for each Business Scenario should be summarised. The intention is to provide realistic illustrations of the general descriptions given in the preceding tables. Thus, members of the ontology community are offered the choice of entering general information concerning business scenarios, details of specific applications, or both. The preceding forms should be used to enter “What (generally)” a business scenario involves and “Why” a business scenario might be pursued commercially.

The table below is used to capture “What (specifically)” a particular application involved and “How” it was realised.

Project / Application Name: X

General Information

Contact Details: Here, the names of key companies and personnel involved in the project should be listed along with standard information such as postal address, email address, relevant web sites, etc.

Short Description: Here a short overview of the project should be given, e.g. what was the size and duration of the project, what was its main application task, what development process was followed, etc.

Main Benefits: Here, a list of the main benefits should be given and as far as possible quantified. For example, a statement such as “the deployment of this web site on the internal knowledge management web page allowed the company to save X million in the first 6 months” would be nice ☺.

Problems Encountered: Success stories are not the only interest here. Indeed, making potential reasons for failure known to the ontology community is one of the most valuable services this report can offer. Hence, all projects should be encouraged to list recognised problems here, be they technical, e.g. “the inference engine exploded once the ontology got bigger than 1000 concepts”, or political, e.g. “the management stopped the project because the cost savings of improved intranet communication were not measurable”.

Information Sources

List of Sources: Here a list of all types of information source should be given and, where possible, concrete references, e.g. to the source of an ontology that was reused in the application.

Short Description: For each information source, a short description should be included. This should include, for example, the rationale for using the information source, any difficulties encountered, and comments as to the usefulness / reliability of the information.

Ontology Technology

Language: What ontology language(s) were chosen for this project and what reasons lay behind the choice?

Development Tools: What development tool(s) were chosen for this project and what reasons lay behind the choice? What was the general impression of the adequacy of the tools?

Inferencing Engine: What inference engine(s) and techniques were chosen for this project and what reasons lay behind the choice? For what purpose was inferencing used?

Other: What other ontology-related technologies were deployed?

Methodology

Knowledge Acquisition: What were the processes deployed for gathering and formalising the knowledge that went into the main ontology?

Modelling Guidelines: What guidelines were used to select (i) the ontology language and (ii) the way in which the required knowledge was modelled in that language?

Testing and Quality Assurance: How was the adequacy of the ontology application ensured, e.g. how was it tested that the ontology covered all required concepts accurately?

Other: What additional methodology issues arose during this project? For example, how did the project fit in with more general software engineering principles deployed in the organisation?

4 Information Sources and References

This section provides a summary of related studies and sources of information.

Onto Knowledge Project:

Information concerning the whole Onto Knowledge project can be found at the following URL:

http://www.ontoknowledge.org/

Onto Knowledge is a 30-month IST-funded project that started in 1999 and runs until 2002. The main project partners are: BT, CognIT, Aidministrator, Vrije Universiteit, AIFB, Swiss Life and Enersearch. The main goal of the project is to establish the role of ontologies within knowledge management. Within this scope, a number of case studies have been performed and documented. Material of particular interest includes:

• [OntoKnow5] – A state-of-the-art report on information extraction. A number of leading IE technology products are described and compared (Inxight, Ask-Jeeves, Semio Corp, Cartia, Verity, Autonomy, CORPORUM and Mimir) and some typical applications are specified.

• [OntoKnow19,20,25,27,37,…] – A number of knowledge management application case studies, evaluation reports and market surveys. For example, [OntoKnow27] outlines a case study for the use of ontologies for improved online information retrieval. Similarly, [OntoKnow19,20] detail the use of ontologies to organise a corporate intranet, for example allowing efficient search for employees with particular skills. Not all of these deliverables are readily accessible via the Onto Knowledge home pages, either because access is restricted or because, at the time of writing this report, some of these deliverables have yet to be produced. The interested reader is encouraged to contact the Onto Knowledge participants directly.

Semantic Web Resources:

The official Semantic Web home pages offer many links to relevant resources, to be found under the URL:

http://www.semanticweb.org/resources.html

For the purposes of this report, the most pertinent resources come from the list of Semantic Web companies. Some specific technologies and business cases worth mentioning include:

• [Profium1,2] – Among other things, Profium provides a number of technology solutions for Content Management, for example using meta-data to provide a central organisation of information collated from multiple sources and distributing this information selectively to end users. A typical example application is to provide the infrastructure for an online financial information service.

• [CognIT] – CognIT a.s. provide a range of products including CORPORUM, which is a powerful set of tools for knowledge management in general. This technology has been successfully deployed to support a corporate “knowledge factory”, i.e. corporate-wide knowledge management. The same technology also forms a basis for creating portal-based services, such as a business intelligence portal.

• [Invention] – Invention Machine provide a number of advanced linguistic-based tools that allow large bodies of text to be analysed, rapidly and automatically, in order to extract key concepts and relations (i.e. a shallow form of ontology), which can be useful for information integration and retrieval.

• [Voquette] – Voquette provides tools for content management based on meta data. As well as facilities for defining meta data on top of existing content management systems, enhanced search facilities are also provided. In addition, technology for voice-enabling information portals enables a step towards the vision of information access “Anywhere, Anytime”.

• There are a number of companies specialising in Topic Map support technology, such as MONDECA [Mondeca], empolis (Bertelsmann Media Group) [Empolis], Ontopia [Ontopia], etc. Topic Maps are effectively lightweight ontologies that are well suited to the organisation (classification) of large amounts of information (e.g. document collections) as well as supporting ease of navigation of information. As such, Topic Maps may be well suited to business scenarios such as “Knowledge Management / Information Systems / Corporate Internets”, “Intelligent Information Integration” and “Semantic Portals (eCRM)”. Topic Maps are also likely to gain in commercial interest as they have been an ISO standard since the beginning of 2000.

• There are a number of companies that provide support for RDF technology, e.g. Intellidimension [Intellidimension].

5 Conclusion

The intention of this deliverable is to allow practitioners attempting to apply ontology technology to answer questions such as:

• What types of ontology application currently have the most success?

• Where is the biggest potential for new applications?

• What is the size of the current ontology market, and what is its potential?

• Where are the markets for ontology technology moving?

• What are currently the main commercial obstacles to ontology technology deployment?

• What are the commonalities across all Business Scenarios deploying ontology technology?

• What key distinctions between different Business Scenarios exist?

This version of the deliverable is the first of a series of reports that will address these issues in more detail. Here, some initial business scenarios have been introduced and described in varying amounts of detail. In addition, the basis for more rigorous collection of information from the community has been established. The next release of this deliverable aims to achieve the following goals:

• More complete description of existing business scenarios

• Inclusion of new business scenarios where appropriate

• Better coordination with related deliverables (e.g. [OntoWeb2.1])

• More in-depth comparative study across all business scenarios

6 References

[AKT] http://www.aktors.org

[Amazon] Amazon.com, Amazon.com home pages http://www.amazon.com

[Anand et al., 2001] Anand S. (2001) “Personalising Customer Interactions using Knowledge Mined from Behavioural Logs”, in [E-work and E-commerce 2001]

[Ariadne] The Ariadne Project, Ariadne Project Home Pages http://www.isi.edu/info-agents/ariadne/index.html

[askOnce] askOnce tool home page http://www.askonce.com/

[Autonomy] Autonomy, Autonomy home pages http://www.autonomy.com/autonomy_v3/

[Bat 1995] “KPML: The KOMET-Penman (Multilingual) Development Environment, Release 0.8.” John A. Bateman: Technical Report. Institut für Integrierte Publikations- und Informationssysteme (IPSI). GMD, Darmstadt, 1995.

[Brint] The Premier Knowledge Management Portal and Global Virtual Community of Practice for the New World of Business, Brint Institute knowledge management portal http://www.brint.com/km/

[Chai et al., 2001a] Joyce Chai, Jimmy Lin, Wlodek Zadrozny, Yiming Ye, Margo Stys-Budzikowska, Veronika Horvath, Nanda Kambhatla, Catherine Wolf (2001) “The Role of a Natural Language Conversational Interface in Online Sales: A Case Study”, In International Journal of Speech Technology 4, 2001, Kluwer Academic Publishers

[Chai et al., 2001b] Joyce Chai, Veronika Horvath, Nicolas Nicolov, Margo Stys-Budzikowska, Nanda Kambhatla, Wlodek Zadrozny (2001) “Natural Language Sales Assistant – A Web-Based Dialog System for Online Sales (Deployed Application)”, In Proceedings of the 13th Annual Conference on Innovative Applications of Artificial Intelligence, Seattle, August 2001

[CognIT] Corporate-Wide Knowledge Sharing; Intelligent Search Agents; Intelligent Text Extraction Tools, CognIT home pages http://www.cognit.no

[Craven et al., 2002] M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery (2002) “Learning to Construct Knowledge Bases from the World Wide Web”, Artificial Intelligence journal. Downloadable from http://www-2.cs.cmu.edu/~webkb/

[cXML] Commerce XML Resources, cXML home pages http://www.cxml.org/

[Dabiri et al., 2002] Dabiri G., Nitzsche M., Aretoulaki M., Brown M. (2002) “A User Sensitive Spoken Dialogue System Incorporating Emotional Responsiveness”, Proceedings of 2nd Hellenic Conference on Artificial Intelligence (SETN-2002), Thessaloniki, April

[DataJunction] Data Junction - Integrating the Interconnected World, Data Junction home pages http://www.datajunction.com/

[D&B] Dun & Bradstreet home page, http://www.dnb.com/

[EasyAsk] EasyAsk home page http://www.easyask.com

[Ecl@ss] Ecl@ss Standardized Material and Service Classification, Ecl@ss home pages http://www.eclass.de

[Empolis] Welcome to Empolis, Empolis home pages http://www.empolis.co.uk/home/home.asp

[EWN001] “User Requirements and Functional Specification of the EuroWordNet Project.” Bloksma L, Diez-Orzas P, Vossen P (1996) Deliverable D001, WP1, EuroWordNet, LE2-4003. http://www.illc.uva.nl/EuroWordNet/

[eWork 2001] “Status Report on New Ways to Work in the Knowledge Economy”, IST document, 2001 http://www.eto.org.uk

[E-work and E-commerce 2001] “Novel solutions and practices for a global networked economy”, e.2001, 17-20 October 2001, Venice, Italy. Edited by Brian Stanford-Smith and Enrica Chiozza, ISBN: 1 58603 205 4. URL: http://www.ebew.net/programme.htm

[Fellbaum 1998] “WordNet: an electronic lexical database” Ed: Christiane Fellbaum. MIT Press, June 1998

[Fensel, 2001] Fensel D. (2001) Ontologies: A Silver Bullet for Knowledge Management and Electronic Commerce, Springer Verlag, 2001

[Fensel et al., 2001] Fensel D., Ding Y., Omelayenko B., Schulten E., Botquin G., Brown M., Flett A. (2001) Product Data Integration in B2B E-Commerce, in IEEE Intelligent Systems, Jul/Aug. 2001

[Fensel et al., 2002] Fensel D., Ding Y., Omelayenko B., Schulten E., Botquin G., Brown M., Flett A., Dabiri G. (2002) Intelligent Information Integration in B2B E-Commerce, Kluwer, 2002

[Frauenfelder, 2001] Frauenfelder M. (2001) A Smarter Web, MIT Enterprise Technology Review, Nov 2001, http://www.techreview.com/magazine/nov01/frauenfelder.asp

[GATE] GATE – General Architecture for Text Engineering, GATE home pages http://gate.ac.uk

[IBM] IBM Information Extraction Research Center home page http://www.research.ibm.com/IE/

[IKM] The Knowledge Management Resource Centre, home pages hosted by IKM Incorporation http://www.kmresource.com/

[Intellidimension] Intellidimension – Adding intelligence to information…, Intellidimension home pages http://www.intellidimension.com

[Invention] Invention machine – Knowledge Based Innovation, Invention Machine home pages http://www.invention-machine.com

[ITM] Intelligent Topic Manager. Mondeca. http://www.mondeca.com

[Jones et al., 1997] “Readings in Information Retrieval” Karen Sparck Jones, Peter Willett, Morgan Kaufmann Publishers, 1997

[Kaon] Kaon Portal. A simple tool for generating multi-lingual, ontology-based Web portals. Karlsruhe University. http://km.aifb.uni-karlsruhe.de/kaon2/Members/rvo/kaon_portal/view

[KM-Forum] The Knowledge Management Forum, home pages of the Knowledge Management Forum http://www.km-forum.org/

[Knoblock et al., 01] Knoblock A.C., Minton S., Ambite J.L., Ashish N., Muslea I., Philpot A.G., Tejada S. (2001) The Ariadne approach to web-based information integration, International Journal on Cooperative Information Systems (IJCIS) Special Issue on Intelligent Information Agents: Theory and Applications, 10 (1/2)

[KnowledgeWeb] http://knowledgeweb.semanticweb.org

[Landcglobal] Language & Computing home page http://www.landcglobal.com/

[languagecomputer] LanguageComputer home page http://www.languagecomputer.com

[L&C] Language & Computing – Making computers understand natural language, L&C home pages http://www.landc.be

[linguamatics] Linguamatics home page http://www.linguamatics.com/

[Lloyd, 2001] Lloyd R. (2001) Joining the Internet Ecosystem, in “Novel solutions and practices for a global networked economy”, e.2001, 17-20 October 2001, Venice, Italy

[Malhorta, 2001] Malhotra Y. (2001) BRINT Institute's Book on Knowledge Management, BRINT Institute, 2001, available online at www.kmbook.com

[Mena, 1998] Mena E. (1998) OBSERVER: An Approach for Query Processing in Global Information Systems based on Interoperation across Pre-existing Ontologies, PhD. thesis, Universidad de Zaragoza, November 1998

[Mercator] Mercator Intelligent Business Integration, Mercator home pages http://www.mercator.com/

[Mondeca] A Better Content Organisation … for an Accurate Information Retrieval, MONDECA home pages http://www.mondeca.com/fr

[MUC] Introduction to Information Extraction, Message Understanding Conferences (MUC) home pages hosted by NIST - http://www.itl.nist.gov/iaui/894.02/related_projects/muc/index.html

[ODESeW] ODESeW. Automatic Generation of Knowledge Portals for Intranets and Extranets. Universidad Politécnica de Madrid. http://babage.dia.fi.upm.es/sew

[Ontopia] Welcome to the Land of Ontopia, Ontopia home pages http://www.ontopia.net

[OntoKnow5] “Information Extraction State-of-the-Art” Robert Engels, Bernt Bremdal (CognIT) – Onto Knowledge Deliverable 5 - http://www.ontoknowledge.org/del.shtml

[OntoKnow19] “Swiss Life Application Study” – Onto Knowledge Deliverable 19 - http://www.ontoknowledge.org/del.shtml. Password Restricted!

[OntoKnow20] “Organisational Memory: Description of Case Study Prototypes” – Onto Knowledge Deliverable 20 - http://www.ontoknowledge.org/del.shtml. Password Restricted!

[OntoKnow25] “Help Desk Prototype” – Onto Knowledge Deliverable 25 - http://www.ontoknowledge.org/del.shtml. Not Available for Download!

[OntoKnow27] “On-To-Knowledge EnerSearch Virtual Organization Case Study: Requirements Analysis Document” – Onto Knowledge Deliverable 27 - http://www.ontoknowledge.org/del.shtml. Password Restricted!

[OntoKnow37] “Market and Competitor Report” – Onto Knowledge Deliverable 37 - http://www.ontoknowledge.org/del.shtml. Not Available for Download!

[OntoWeb] http://www.ontoweb.org

[OntoWeb1.1] “Technical Roadmap v1.0” Oscar Corcho (ed), Mariano Fernández-López (ed), Asunción Gómez Pérez (ed), Nathalie Aussenac-Gilles, Socorro Bernardos, Olivier Corby, Peter Crowther, Rob Engels, Miguel Esteban, Fabien Gandon, Yannis Kalfoglou, Manuel Lama, Angel López, Adolfo Lozano, Enrico Motta, Natasha Noy, York Sure, OntoWeb Deliverable D1.1, Nov. 2001

[OntoWeb1.5] “A survey of ontology learning methods and techniques” Asunción Gómez-Pérez (ed), David Manzano-Macho (ed), OntoWeb Deliverable D1.5, http://ontoweb.aifb.uni-karlsruhe.de/Members/ruben/Deliverable1.5

[OntoWeb2.1] “Successful scenarios for ontology-based applications” Alain Léger (ed) Hans Akkermans, Michael Brown, Rose Dieng, Ying Ding, Asunción Gómez-Pérez, Siegfried Handschuh, Anthony Hegarty, Andreas Persidis, Rudi Studer, York Sure, Brigitte Trousse, Valentina Tamma, OntoWeb Deliverable D2.1, Jan. 2002

[OntoWeb2.4] “Ontology-based information exchange for knowledge management and electronic commerce” Alain Léger, Andreas Persidis, Philippe Kervella, Didier Riou, Rose Dieng (eds), OntoWeb Deliverable D2.4, Jan. 2004

[Profium1] “Profium – Leader in XML and RDF Metadata Based Content Management” – Profium home pages http://www.profium.com/index.shtml.en

[Profium2] “Profium Success Story – Kauppalehti Online – Financial News Portal” – download from home pages

[RDFGateway] RDF Gateway. A platform for the semantic web. Intellidimension http://www.intellidimension.com

[RosettaNet] RosettaNet Home, RosettaNet home pages www.rosettanet.org

[SemEdge] SemanticEdge, SemanticEdge home pages www.semanticedge.com

[SIGIR2001] ACM Special Interest Group on Information Retrieval – SIGIR2001, Homepages of SIGIR 2001 conference http://www2.sis.pitt.edu/~sigir01/

[SIGIR2002] ACM Special Interest Group on Information Retrieval – SIGIR2002, Homepages of SIGIR 2002 conference http://www.sigir2002.org/

[UNSPSC1] “Universal Standard Products and Services Classification” – UNSPSC home pages http://eccma.org/unspsc/

[Vata01] Vatant B. (2001) Topic Maps tools for Knowledge Management, MONDECA, presented in frame of APQC project “Managing Content and Knowledge” downloadable via [Mondeca]

[Verbmobile] Verbmobil, Verbmobil project home pages http://verbmobil.dfki.de/

[Voquette] Voquette, Voquette home pages http://www.voquette.com

[Whizbang1] “Whizbang Labs – a focus on information extraction” – Whizbang Labs home pages http://www.whizbang.com

[Whizbang2] “Whizbang Labs – Success Stories – Flipdog.com” – Whizbang Labs home pages http://www.whizbang.com/solutions/ssflipdog.html

[Whizbang3] “Whizbang Labs – White Papers – Information Extraction & Text Classification” – Whizbang Labs home pages http://www.whizbang.com/solutions/wbwhite.html
