NLP for the Masses: Integrating GATE with Desktop Clients. René Witte, Semantic Software Lab, Concordia University, Montréal, Canada. FIG3, Montréal, 2010.

NLP for the Masses: Integrating GATE with Desktop Clients

René Witte

Semantic Software Lab, Concordia University

Montréal, Canada

FIG3, Montréal, 2010

Rene Witte NLP for the Masses


The Problem...


Outline

1 Introduction

2 Semantic Assistants: NLP Web Services

3 Desktop Plug-Ins for NLP

4 Conclusions


Motivation

Introduction

Information Overload

Web 2.0 applications lead to more user-generated content (e.g., product reviews)

News business: professionals and laypersons as content creators (e.g., Twitter, blogs, social networks)

Digitization of printed media

Information Processing

Finding information is fast; analysing it consumes a lot of time

Applies to e-mail, Web documents, intranets, ...

GATE NLP pipelines can help (e.g., to summarize product reviews), but how do we get them to the users?


Where are we today?

Status Quo in NLP

A solid decade of NLP framework development

Variety of robust NLP plugins and pipelines

Summarization, Question-Answering, Information Extraction, Opinion Mining, ...

Status Quo on Desktops

But what is available to end users on their desktop today?

No NLP in word processors, email clients, Web browsers, IDEs, ...

Why?

1 Users don't want/need NLP?

2 Lack of software engineering work covering NLP?


Supporting the “Knowledge Worker”

Typical Workflow, as of today

Receives task/request via email, text message, etc.

Searches for information via Google, Desktop Search, etc.

Note: typically involves lots of natural language documents

Information Retrieval (IR) alone is not sufficient

Reads, understands, and evaluates the results. Solves the task. Repeats.

With Semantic Assistants

Users stay within the (desktop) tool environment needed for their task

Tool (e.g., word processor) recognizes the user's need for information

Tool initiates data search in the background

Analysis tools (text/data mining) evaluate search results

Assistant offers results to the user within their own interface


Semantic Assistants

Workflow Overview

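[Figure: workflow overview. On the client side, a word processor calls an NLP service and passes parameters. On the server side, NLP Service 1, NLP Service 2, ..., NLP Service n are available (e.g., Focused Summarization). The NLP service result goes back to the client.]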

Features

Focus on the user’s needs, not the NLP tool

Avoid context-switches through application integration


1 Introduction

2 Semantic Assistants: NLP Web Services (Requirements; System Architecture; Ontology-Based Context Model; Application Integration)

3 Desktop Plug-Ins for NLP

4 Conclusions


Semantic Assistants

Support three distinct user groups

End Users: no knowledge of software engineering, NLP, or language engineering. Need easy access to semantic (NLP) services within their desktop tools.

Language Engineers: develop NLP resources, tools, and pipelines. No particular knowledge of server deployment or client (GUI) integration is presumed.

System Integrators: provide the integration of NLP services into desktop clients. No knowledge of NLP foundations or (GATE) framework details is assumed.

Semantic Assistants Approach

Provide a separation of concerns for these user groups.


Integration Architecture

Web Service-Based Architecture

Standard (W3C) Web Services, using WSDL & SOAP

Implemented using Java Web Services (Java 6)

Client-Side Abstraction Layer (CSAL) for easy integration

[Figure: four-tier integration architecture. Tier 1: Clients. Tier 2: Presentation and Interaction. Tier 3: Analysis and Retrieval. Tier 4: Resources. Component labels in the figure: OpenOffice.org Writer, Web Client, New Application, Plug-in (twice), Client-Side Abstraction Layer, Web Server, NLP Service Connector, Wiki System, Web/IS Connector, NLP Subsystem, Service Information, Service Invocation, Presentation, Navigation, Annotation, NLP Services (Question Answering, Autom. Index Generation, Information Retrieval, Information Extraction, Automatic Summarization), External Documents, Indexed Documents, NLP Service Descriptions.]


Modeling the User’s Context

Two-tiered Ontology Model

Modeling task, tools, available resources in an upper ontology

reusable across domains (e.g., software process model)

Concrete ontology models domain-specific information

user’s languages, available (text) analysis pipelines, etc.

The five main concepts

[Figure: the five main concepts Format, Artifact, Language, User, and Task, connected by the relations hasFormat, hasLanguage, knowsLanguage, and hasTask.]
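As a rough illustration only, the five concepts and four relations can be mirrored in plain Java types (Java 16+ records). The direction of each relation (an Artifact has a Format and a Language; a User knows Languages and has a Task) is an assumption read off the diagram labels, and all type and method names here are hypothetical. The actual model is an OWL ontology, not Java classes.

```java
import java.util.Set;

// Hypothetical plain-Java mirror of the five context concepts.
// The real model is an OWL ontology; relation directions are assumed.
record Format(String name) {}
record Language(String code) {}
record Task(String description) {}

// Assumed: Artifact --hasFormat--> Format, Artifact --hasLanguage--> Language
record Artifact(String name, Format hasFormat, Language hasLanguage) {}

// Assumed: User --knowsLanguage--> Language, User --hasTask--> Task
record User(String name, Set<Language> knowsLanguage, Task hasTask) {}

class ContextModelSketch {
    // True if the artifact's language is one the user knows.
    static boolean canRead(User user, Artifact artifact) {
        return user.knowsLanguage().contains(artifact.hasLanguage());
    }

    public static void main(String[] args) {
        Language en = new Language("en");
        Language de = new Language("de");
        User user = new User("alice", Set.of(en), new Task("write report"));
        Artifact doc = new Artifact("review.txt", new Format("text/plain"), de);
        System.out.println(canRead(user, doc)); // prints false: the user knows only "en"
    }
}
```

A check like `canRead` is the kind of context question the ontology lets a client ask before offering a service to a user.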


Abstract Ontology

Contains the central Artifact concept

Parent concept for programs, documents, parameters, etc.

Important relations, e.g., consumesInput, producesOutput, hasParameter, ...


Concrete Ontology

Extending the upper ontology

Domain-specific concepts, e.g., NLPTool

Some implementation-specific concepts


Steps for Integrating GATE into a (Desktop) Application

NLP Work

Develop the GATE Pipeline as usual (.gapp file)

Publish GATE Pipeline as Web Service

Write meta-information for the pipeline (input, output, language, etc.) in a corresponding OWL file

Server will publish the new pipeline to (all) clients

Develop Client Plug-In

Very easy for Java-based plug-ins (using the provided Client-Side Abstraction Layer, CSAL)

Most other languages support consuming the published Web Service (WSDL) description


Code Examples

Client connecting to the server and finding available services

// Create a factory object
SemanticServiceBrokerService service = new SemanticServiceBrokerService();

// Get a proxy object, which locally represents the service endpoint
SemanticServiceBroker broker = service.getSemanticServiceBrokerPort();

// Proxy object is ready to use. Get a list of available NLP services.
ServiceInfoForClientArray sia = broker.getAvailableServices();


Making use of the OWL ontology model

Finding appropriate NLP services

Client plug-in can query the server for NLP services that are useful for the current user, based on the context:

Language capabilities of user/pipeline

Current user client

. . .

Queries are implemented using SPARQL

Ask for services that produce English or German as output:

SELECT ?x ?name
WHERE {
  ?x sa:hasGATEName ?name .
  { ?x cu:hasFormat sa:GATECorpusPipeline_Format } .
  { { ?x sa:hasOutputNaturalLanguage cu:en } UNION
    { ?x sa:hasOutputNaturalLanguage cu:de } }
}
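The selection logic of the query (keep corpus pipelines whose output language is English or German) can be mirrored over in-memory objects. This is a minimal sketch for illustration, not how the framework evaluates the query: the `ServiceDescription` record, the `findServices` method, and the sample service names are hypothetical, and the real implementation runs SPARQL against the OWL model.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for a service description in the ontology.
record ServiceDescription(String gateName, String format, Set<String> outputLanguages) {}

class ServiceFilterSketch {
    // Mirrors the query: the format must be the corpus-pipeline format, and the
    // service must produce at least one of the wanted output languages (the UNION).
    static List<String> findServices(List<ServiceDescription> all, Set<String> wanted) {
        return all.stream()
                .filter(s -> s.format().equals("GATECorpusPipeline_Format"))
                .filter(s -> s.outputLanguages().stream().anyMatch(wanted::contains))
                .map(ServiceDescription::gateName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ServiceDescription> services = List.of(
                new ServiceDescription("Summarizer", "GATECorpusPipeline_Format", Set.of("en")),
                new ServiceDescription("IndexGenerator", "GATECorpusPipeline_Format", Set.of("de")),
                new ServiceDescription("FrenchTagger", "GATECorpusPipeline_Format", Set.of("fr")));
        System.out.println(findServices(services, Set.of("en", "de")));
        // prints [Summarizer, IndexGenerator]
    }
}
```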


1 Introduction

2 Semantic Assistants: NLP Web Services

3 Desktop Plug-Ins for NLP (Word Processing; Software Engineering)

4 Conclusions


An everyday task

Given

Access to the Web

Question

What is the role of DMSP [Dimethylsulfoniopropionate] in the Atlantic marine biology within the global climate change?

Your Task

Write a 450-word essay answering the question!


Semantic Assistants Client: OpenOffice.org Writer

Client-side plugin for word processor integration

Users get a new menu item: “Semantic Assistants”

Available services are discovered based on context through ontology queries (SPARQL)

Services executed asynchronously in the background

Results displayed based on type (new text window or browser window)
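Asynchronous background execution can be sketched with the standard `java.util.concurrent` classes. The sketch below is an assumption about the general pattern, not the plug-in's actual code: `invokeService` is a hypothetical stand-in for the remote call, and the callback stands in for opening a result window.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

class AsyncInvocationSketch {
    // Hypothetical stand-in for a remote NLP service call.
    static String invokeService(String serviceName, String text) {
        return serviceName + " result for " + text.length() + " characters";
    }

    // Runs the call on a background thread and hands the result to a callback,
    // so the caller (e.g., the editor's UI thread) is not blocked.
    static CompletableFuture<Void> invokeAsync(ExecutorService pool, String serviceName,
                                               String text, Consumer<String> onResult) {
        return CompletableFuture
                .supplyAsync(() -> invokeService(serviceName, text), pool)
                .thenAccept(onResult);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CompletableFuture<Void> done =
                invokeAsync(pool, "Summarizer", "some document text", System.out::println);
        System.out.println("editor stays responsive while the service runs");
        done.join();     // only for the demo: wait before shutting down
        pool.shutdown();
    }
}
```

The two printed lines may appear in either order, since the service call and the caller run on different threads.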


Invoking an Assistant

Example Scenario

User works on a report about “DMSP in the Atlantic marine biology” (DMSP = Dimethylsulfoniopropionate)

The “Web Retrieval Summarizer” will initiate a web search, retrieve the results, and automatically summarize the content of the top-n hits

Result can be further edited or refined by the user


Assistant Invocation – Communication Flow

[Figure: message sequence chart “Invoking a Single Language Service”. Participants: Client, Abstraction Layer, Server, Language Service, Resources. Messages as listed in the chart: Invoke service; Invoke service; Metadata lookup; Metadata; Run service; Store result; Result stored; Return; Collect result; Result; Transform result; Response Message; Refined Response.]
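The invocation flow can be approximated in a few lines of plain Java. This is a sketch under stated assumptions: which participant performs which step is inferred from the chart labels, the classes `ServerSketch` and `AbstractionLayerSketch` are hypothetical, the `<result>` wrapper stands in for the real SOAP response message, and a toy upper-casing function stands in for a GATE pipeline.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical server: looks up the service, runs it, stores the result,
// then collects it into a response message.
class ServerSketch {
    private final Map<String, UnaryOperator<String>> services = new HashMap<>();
    private final Map<String, String> resultStore = new HashMap<>();

    void register(String name, UnaryOperator<String> service) {
        services.put(name, service);
    }

    String invoke(String name, String input) {
        UnaryOperator<String> service = services.get(name);       // metadata lookup
        if (service == null) {
            throw new IllegalArgumentException("Unknown service: " + name);
        }
        resultStore.put(name, service.apply(input));               // run service, store result
        return "<result>" + resultStore.get(name) + "</result>";   // collect result, build response
    }
}

// Hypothetical client-side abstraction layer: forwards the call and
// transforms the raw response message into a form the client can display.
class AbstractionLayerSketch {
    private final ServerSketch server;

    AbstractionLayerSketch(ServerSketch server) {
        this.server = server;
    }

    String invoke(String name, String input) {
        String response = server.invoke(name, input);
        return response.replace("<result>", "").replace("</result>", ""); // transform result
    }
}

class InvocationFlowDemo {
    public static void main(String[] args) {
        ServerSketch server = new ServerSketch();
        // Toy stand-in for an NLP pipeline.
        server.register("UpperCaser", s -> s.toUpperCase(Locale.ROOT));
        AbstractionLayerSketch layer = new AbstractionLayerSketch(server);
        System.out.println(layer.invoke("UpperCaser", "dmsp in marine biology"));
        // prints DMSP IN MARINE BIOLOGY
    }
}
```

The client only ever talks to the abstraction layer, which is what keeps plug-in code independent of the message format used on the wire.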


NLP Support for Software Engineers

Software Artefacts Written in Natural Language

Requirements Documentation

Issue Tracker Messages

SVN Commit Messages

Source Code Comments

Quality Assessment

No efficient means of analysing the quality of content

Guidelines usually enforced manually

Automatic quality assessment tools needed


Automatic Quality Assessment of Source Code Comments

Analysis Heuristics

Internal (NL Quality) Comment Analysis: a set of heuristics targeting the natural language quality of the in-line documentation itself

Code/Comment Consistency Analysis: heuristics that analyse in-line documentation in relation to the source code being documented

Semantic Assistants for Software Engineers

NLP analysis of source code comments implemented as a GATE pipeline, ‘JavadocMiner’

Published as a Web service through the Semantic Assistants framework

Integrated into Eclipse through an SA plugin
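To make the idea of a code/comment consistency heuristic concrete, here is one hypothetical example: checking that every method parameter has a matching `@param` tag. The slides do not list the JavadocMiner's actual heuristics, so this check, the class name, and the regular expression are illustrative assumptions only.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParamConsistencySketch {
    // Matches "@param name" and captures the parameter name.
    private static final Pattern PARAM_TAG = Pattern.compile("@param\\s+(\\w+)");

    // Returns parameter names that appear in the signature but have no @param tag.
    static Set<String> undocumentedParams(String javadoc, List<String> paramNames) {
        Set<String> documented = new LinkedHashSet<>();
        Matcher m = PARAM_TAG.matcher(javadoc);
        while (m.find()) {
            documented.add(m.group(1));
        }
        Set<String> missing = new LinkedHashSet<>(paramNames);
        missing.removeAll(documented);
        return missing;
    }

    public static void main(String[] args) {
        String javadoc = "/**\n * Copies a file.\n * @param source the file to copy\n */";
        System.out.println(undocumentedParams(javadoc, List.of("source", "target")));
        // prints [target]
    }
}
```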


JavadocMiner Implementation


Eclipse Plug-In

Features

Execute GATE pipeline with file loaded into Eclipse (e.g., Java source code)

Display results, create markers when line numbers are available


ANNIE results in Eclipse (Table View)


1 Introduction

2 Semantic Assistants: NLP Web Services

3 Desktop Plug-Ins for NLP

4 Conclusions (Product Reviews; Heritage Data Analysis; Moving on...)


Analysing Product Reviews


Heritage Data Analysis

Large unstructured corpora

Outdated terms, style of writing, huge amount, no categorization or assessment

Comparing and evaluating with current content


Wiki/NLP Integration


Back-of-the-Book Index Generation

[Figure: back-of-the-book index generation pipeline: XML input, POS Tagger, NP Chunker, Lemmatizer, Index Generation, XML output.]

Input: ... für eine äußere Abfasung der Kanten ...

POS Tagger: für/APPR eine/ART äußere/ADJA Abfasung/NN der/ART Kanten/NN

NP Chunker: NP:[DET:eine MOD:äußere HEAD:Abfasung], NP:[DET:der HEAD:Kanten]

Lemmatizer: Abfasung [Lemma: Abfasung], Kanten [Lemma: Kante]

Index Generation: Abfasung: Page 182; - äußere: Page 182; Kante: Page 182
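The last pipeline step, turning lemmatized noun phrases into index entries, can be sketched as follows. This is a minimal illustration under assumptions: the `NounPhrase` record, the "head, modifier" subentry format, and `buildIndex` are hypothetical, and the upstream POS tagging, chunking, and lemmatization are taken as given.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// A noun phrase as produced by the upstream steps (POS tagger, NP chunker, lemmatizer).
// An empty modifier means the phrase has no modifier.
record NounPhrase(String headLemma, String modifier, int page) {}

class IndexSketch {
    // Maps each index entry to the sorted set of pages it occurs on.
    // A modified phrase yields a main entry plus a "head, modifier" subentry.
    static Map<String, SortedSet<Integer>> buildIndex(List<NounPhrase> phrases) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (NounPhrase np : phrases) {
            index.computeIfAbsent(np.headLemma(), k -> new TreeSet<>()).add(np.page());
            if (!np.modifier().isEmpty()) {
                String sub = np.headLemma() + ", " + np.modifier();
                index.computeIfAbsent(sub, k -> new TreeSet<>()).add(np.page());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<NounPhrase> phrases = List.of(
                new NounPhrase("Abfasung", "äußere", 182),
                new NounPhrase("Kante", "", 182));
        buildIndex(phrases).forEach((entry, pages) -> System.out.println(entry + ": " + pages));
    }
}
```

Indexing by lemma rather than surface form is what lets "Kanten" on one page and "Kante" on another end up under a single entry.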


Conclusions and Future Work

Semantic Assistants Architecture

Semantic “glue” makes it possible to model users, their context, and NLP services, and to rapidly integrate them to provide support to users

OWL ontologies and SPARQL queries

Integration using W3C Web services

Abstraction layer for easy client integration

Further Improvements

Enhancing the context model

Domain-specific ontology refinements

Improvement of many technical details

Development of further client plug-ins


More information

Semantic Assistants Distribution

Papers and additional documentation are available at

http://www.semanticsoftware.info

http://rene-witte.net

Source code on SourceForge:

http://sourceforge.net/projects/semantic-assist/


Acknowledgements

Semantic Assistants

Bahar Sateli, Nikolaos Papadakis, Tom Gitzinger

JavadocMiner

Ninus Khamis

Juergen Rilling

Heritage Data Analysis

Thomas Kappler

Ralf Krestel

et al.

Summarization, Product Review Analysis

Sabine Bergler, Ralf Krestel, et al.
