Worldwide Lexicon
Brian McConnell
May, 2002
WWL – Brian McConnell
Worldwide Lexicon Intro
• Automatic discovery of dictionary, semantic net and translation servers throughout the net
• Creates standard client/server interface for communicating with servers
• Creates distributed human computing grid (allows servers to poll idle users to enter data, score recent submissions)
• “GNUtella for language services”
What WWL Does
• Creates a SOAP based interface for locating and communicating with language services
• Creates mechanism for discovering WWL servers on the fly
• Allows any application to talk to language servers with a few lines of code
• Allows existing dictionaries and MT systems to expose their data via WWL
• Creates something similar to SETI@Home, except it taps idle users to contribute knowledge
• Creates a web services API for language services
What WWL Does Not Do
• Does not create a global, centrally managed dictionary (WWL is a P2P network of dictionaries and language servers)
• WWL does not provide machine translation services (although WWL can be used to talk to existing MT servers)
• WWL does not compete with existing dictionaries or translation services. It makes existing systems more accessible to applications and their users.
• WWL does not specify how dictionary and MT servers work internally
Some Example Applications
• Browser and text editor plug-ins
• Extended dictionaries for machine translation systems
• Human assisted document translation
• Lexicon@Home client (polls users to enter data when they’re not busy)
• Multilingual chat clients (poll WWL data sources as needed to assist with translations)
• Real-time translation (via Jabber or SMS)
• Teaching aids
• User supported dictionaries and translation memories
Worldwide Lexicon Protocol
• Built upon the Simple Object Access Protocol
• Applications communicate via a small set of SOAP methods
• HTTP CGI interface also used for data entry and user peer review
• Goal: allow developers to locate and query any WWL data source with a few lines of code.
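To make the "few lines of code" goal concrete, here is a minimal sketch of how a client might wrap a WWL method call in a SOAP 1.1 envelope using only the Python standard library. The method name WWLFindServers comes from these slides; the `urn:example:wwl` namespace and the parameter names `sourceLanguage` and `targetLanguage` are assumptions for illustration, not taken from the WWL spec.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
WWL_NS = "urn:example:wwl"  # hypothetical namespace, not from the WWL spec


def build_soap_call(method, params):
    """Wrap a method name and its parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{WWL_NS}}}{method}")
    for name, value in params.items():
        # Each parameter becomes a child element of the method element.
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


request_xml = build_soap_call(
    "WWLFindServers",
    {"sourceLanguage": "es", "targetLanguage": "fr"},  # assumed parameter names
)
```

A real client would POST `request_xml` to a supernode or server URL over HTTP; the transport is left out so the sketch runs without a network.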
Protocol Overview
• Three types of methods:
  • WWL server discovery and network status methods
  • WWL client/server query methods
  • Utility functions
• About a dozen methods overall
System Overview
• Four basic types of nodes:
  • Supernodes (directory servers)
  • WWL servers (dictionaries, MT servers, semantic nets)
  • Gateways (allow non-WWL servers to present a WWL front end)
  • Client apps (plug-ins, IM clients, Lexicon@Home, etc.)
WWL Server Discovery
• Client app contacts a WWL supernode
• Invokes WWLFindServers() to fetch list of active servers and gateways that can process client’s request
• Supernode replies with a list of WWL servers, as well as information about each server’s capabilities
• WWL servers and gateways announce themselves to supernodes at startup via WWLRegister() and WWLServerStatus() methods
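The discovery flow above can be sketched as a toy in-memory supernode. The three methods stand in for WWLRegister(), WWLServerStatus() and WWLFindServers(); their parameters, the `ServerRecord` fields and the 300-second timeout are assumptions for illustration, since the slides do not give the real signatures.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ServerRecord:
    url: str
    services: set          # e.g. {"dictionary", "translation"}
    language_pairs: set    # e.g. {("es", "fr")}
    last_seen: float = field(default_factory=time.time)


class Supernode:
    """Toy in-memory directory; a real supernode would expose these over SOAP."""

    def __init__(self, timeout=300.0):
        self.timeout = timeout  # assumed: seconds before a silent server is ignored
        self.servers = {}

    def register(self, url, services, language_pairs):
        # Stands in for WWLRegister(): a server announces itself at startup.
        self.servers[url] = ServerRecord(url, set(services), set(language_pairs))

    def server_status(self, url, now=None):
        # Stands in for WWLServerStatus(): a server reports that it is still alive.
        if url in self.servers:
            self.servers[url].last_seen = time.time() if now is None else now

    def find_servers(self, service, source, target, now=None):
        # Stands in for WWLFindServers(): list active servers that match the request.
        now = time.time() if now is None else now
        return [
            rec.url
            for rec in self.servers.values()
            if service in rec.services
            and (source, target) in rec.language_pairs
            and now - rec.last_seen <= self.timeout
        ]


node = Supernode()
node.register("http://mt.example.org/wwl", {"translation"}, {("es", "fr")})
node.register("http://dict.example.org/wwl", {"dictionary"}, {("es", "fr")})
matches = node.find_servers("translation", "es", "fr")
```

Servers that stop sending status messages simply age out of the results, which is one simple way a directory could track "active" servers.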
WWL Supernodes
• Track current status of WWL servers and their peers (servers send registration and status messages)
• Client apps use supernodes to locate WWL servers and gateways on the fly (e.g. locate Spanish-French full-text translation server)
• Supernodes also provide quality control (known WWL servers are listed first)
• Anyone can host a supernode (similar to GNUtella directory servers)
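As an illustration of the "known servers first" rule, the sketch below orders a candidate list so that servers the supernode already knows come ahead of unknown ones. The function name and the idea of a simple set of known URLs are assumptions; the slides do not say how a supernode decides which servers count as known.

```python
def rank_servers(candidates, known_servers):
    """Return candidates with known servers first, otherwise keeping input order."""
    # sorted() is stable, and False sorts before True, so known servers
    # move to the front while ties keep their original relative order.
    return sorted(candidates, key=lambda url: url not in known_servers)


candidates = [
    "http://new.example.net/wwl",
    "http://dict.example.org/wwl",
    "http://other.example.com/wwl",
]
known = {"http://dict.example.org/wwl"}
ranked = rank_servers(candidates, known)
```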
WWL Gateways
• Translate WWL/SOAP method calls into other formats
• Can be used to talk to DICT dictionary servers
• Can be used to talk to proprietary systems
• Can do screen scraping (e.g. send query to web based MT server via CGI, scrape results from HTML response)
• Can even cache and index static wordlists so that they appear as WWL data sources to any WWL client
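The static wordlist case can be sketched as a small adapter that indexes a tab-separated wordlist and answers lookups. The class name, the `query` method and the wordlist format are all hypothetical; a real gateway would expose the lookup through the WWL SOAP query methods, which these slides do not name.

```python
class WordlistGateway:
    """Indexes a static tab-separated wordlist and answers lookups from it."""

    def __init__(self, lines, source, target):
        self.source = source
        self.target = target
        self.index = {}
        for line in lines:
            if "\t" not in line:
                continue  # skip blank or malformed lines
            word, translation = line.rstrip("\n").split("\t", 1)
            # Lowercase keys so lookups are case-insensitive.
            self.index.setdefault(word.lower(), []).append(translation)

    def query(self, word, source, target):
        # Hypothetical query method, not a WWL method name.
        if (source, target) != (self.source, self.target):
            return []
        return self.index.get(word.lower(), [])


gateway = WordlistGateway(
    ["gato\tchat\n", "perro\tchien\n", "gato\tmatou\n"], "es", "fr"
)
results = gateway.query("Gato", "es", "fr")
```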
Client/Server Communication
• Three SOAP methods allow clients to submit queries to WWL servers via standard interface.
• WWL servers reply via SOAP; results are returned to the client app in an XML data structure
• WWL interface can co-exist with other interfaces (DICT, web/cgi, WAP, etc)
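On the client side, handling a reply mostly means pulling values out of the XML inside the SOAP body. The sketch below parses a made-up reply; the element names `QueryResponse` and `result` and the `score` attribute are placeholders, not the WWL schema.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

# Hypothetical reply; element and attribute names are placeholders.
response_xml = f"""<soap:Envelope xmlns:soap="{SOAP_ENV}">
  <soap:Body>
    <QueryResponse>
      <result score="0.9">chat</result>
      <result score="0.4">matou</result>
    </QueryResponse>
  </soap:Body>
</soap:Envelope>"""


def parse_results(xml_text):
    """Extract (text, score) pairs from the SOAP body of a reply."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    if body is None:
        raise ValueError("not a SOAP envelope")
    return [(el.text, float(el.get("score", "0"))) for el in body.iter("result")]


results = parse_results(response_xml)
```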
Typical Client Session
• Contacts WWL supernode(s) to fetch list of active WWL servers according to language, services required
• Contacts top-ranked WWL server to perform query (e.g. translate phrase from Spanish to French)
• If query fails, contacts other WWL servers to perform query
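The failover step of this session can be sketched as a loop over the ranked server list. The `send_query` callable is a stand-in for whatever SOAP transport the client uses, and `fake_send` simulates one unreachable server so the example runs offline; both names are illustrative.

```python
def query_with_failover(servers, send_query):
    """Try each server in ranked order and return the first successful result."""
    errors = []
    for url in servers:
        try:
            return url, send_query(url)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append((url, exc))
    raise RuntimeError(f"all {len(servers)} servers failed: {errors}")


def fake_send(url):
    # Stand-in transport: pretend the top-ranked server is down.
    if "down" in url:
        raise ConnectionError("server unreachable")
    return "bonjour"


used, answer = query_with_failover(
    ["http://down.example.org/wwl", "http://mt.example.org/wwl"], fake_send
)
```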
Application Development
• WWL defines a client/server interface
• Client and server apps can be developed and tested independently
• System is complex, but individual components are simple
• Perfect fit for open source development model
Server Apps & Projects
• Updating existing dictionaries and machine translation servers for WWL and Lexicon@Home
• Building gateway servers that emulate WWL while talking to non-WWL servers (DICT, HTTP, etc)
• Document translation servers based on Lexicon@Home concept
Client Applications
• Browser/text editor plug-ins
• WWL chat clients
• Lexicon@Home clients
• Teaching aids
Updating Existing Servers
• As simple as adding a few scripts to respond to SOAP calls (replying via SOAP rather than HTML)
• SOAP/WWL interface co-exists with other front ends
• WWL server can be read-only, or can allow user data entry through Lexicon@Home initiative
• Allows hundreds of existing dictionaries, encyclopedias and machine translation servers to participate in WWL with minimal effort
Example: WWL Chat Client
• Listens to incoming and outgoing messages
• When user enables translation, IM client uses WWL to contact machine translation servers as needed
• When user enables dictionary features, IM client assists user in translating words and phrases when composing messages (ideal for users who know a language but are not fluent)
Lexicon@Home
• Distributed human computing
• Users download small client program that polls WWL server(s) for jobs when user is not busy
• When a WWL server has a job, it instructs the Lexicon@Home client to open the user's browser to a CGI data entry form (the form is generated by the WWL server)
• User enters requested information (definition, translation, score for other user’s submission)
• Each user does a small amount of work; with a large population, the system learns at a rapid pace
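One polling cycle of such a client might look like the sketch below. The idle check, the job fetch and the browser launch are passed in as callables so the example runs without a network; the job dictionary and its `form_url` key are assumptions, and a real client could pass `webbrowser.open` as `open_form`.

```python
def poll_once(is_idle, fetch_job, open_form):
    """Run one polling cycle; return the job handled, or None if nothing was done."""
    if not is_idle():
        return None              # never interrupt a busy user
    job = fetch_job()            # ask the WWL server whether it has work
    if job is None:
        return None
    open_form(job["form_url"])   # send the browser to the server-generated form
    return job


# Simulated server queue and browser, so the sketch runs offline.
queue = [{"task": "translate", "form_url": "http://dict.example.org/form?id=1"}]
opened = []
handled = poll_once(
    is_idle=lambda: True,
    fetch_job=lambda: queue.pop(0) if queue else None,
    open_form=opened.append,
)
```

A full client would call `poll_once` on a timer and back off when the server has no jobs.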
Quality Control
• Editorial oversight (WWL servers can require some or all user submissions to be reviewed by editors and trusted users via private CGI form)
• Randomized peer review (WWL server asks some Lexicon@Home users to score submissions from their peers)
• Hybrid system that combines randomized peer review with editorial oversight (editors focus on submissions with ambiguous scores or from unknown users).
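The hybrid approach can be sketched as a routing rule: clear peer verdicts are applied automatically, and everything else goes to an editor. The 0..1 score scale, the thresholds and the minimum vote count are illustrative assumptions; the slides do not specify a scoring scheme.

```python
from statistics import mean


def route_submission(scores, author_known, accept=0.7, reject=0.3, min_votes=3):
    """Decide what to do with a submission given peer scores in the range 0..1."""
    if not author_known or len(scores) < min_votes:
        return "editor"      # unknown users or too few votes go to an editor
    average = mean(scores)
    if average >= accept:
        return "accept"
    if average <= reject:
        return "reject"
    return "editor"          # ambiguous score: ask a human editor


decisions = [
    route_submission([0.9, 0.8, 1.0], author_known=True),
    route_submission([0.1, 0.2, 0.0], author_known=True),
    route_submission([0.9, 0.1, 0.5], author_known=True),
    route_submission([0.9, 0.9, 0.9], author_known=False),
]
```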
Project Timeline
• WWL protocol spec is available at www.worldwidelexicon.org
• Work to develop first generation apps (supernodes, retrofit existing dictionary servers) is underway
• Work to develop Lexicon@Home client is in progress
• Looking for developers to contribute to project
Development Priorities
• Stable supernode server
• Source libraries for use by existing dictionary and translation servers
• WWL gateway servers (to talk to non-WWL sites)
• Lexicon@Home client
• Simple client apps (browser plug-in, IM client that links to MT servers)