Post on 18-Jan-2018
Cross Language Information Exploitation of Arabic
Dr. Elizabeth D. Liddy
Center for Natural Language Processing
School of Information Studies
Syracuse University
Why Cross-Language Systems Matter
• There are approximately 4,500 living languages
• 32 million Americans switch from English to another language when they get home from work (U.S. Census 1990)
• Internationally, some who wish to do harm to the US communicate in other languages
• There are too few intel analysts who know the languages of interest
Internet Language Statistics
http://global-reach.biz/globstats/
Internet Language Statistics (2)
http://www.glreach.com/globstats/evol.html
How Cross-Language Retrieval Works
• User who speaks just one language asks their question of the system in that language
• Cross-language retrieval system:
– Will have indexed documents (e.g. foreign reports, emails, message traffic) written in other languages
– Translates user query into language of the documents
– Matches translated query against document index
– Produces a ranked list of relevant documents that are automatically translated into user’s language
• User then reads documents in their own language
• User can now make more fully informed decisions
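The translate-then-match steps above can be sketched in a few lines of Python. This is only an illustrative toy, not the Syracuse system: the tiny English-to-French dictionary, the three sample documents, and the function names are all assumptions made up for the example, and scoring is plain term frequency. The final step, translating the retrieved documents back into the user's language, is left out.

```python
from collections import Counter

# Hypothetical toy bilingual dictionary (English -> French); a real system
# would rely on far richer translation resources.
EN_TO_FR = {
    "report": ["rapport"],
    "attack": ["attaque"],
    "border": ["frontière"],
}

# Hypothetical collection of documents written in the foreign language.
DOCUMENTS = {
    "doc1": "rapport sur une attaque à la frontière",
    "doc2": "rapport annuel sur le commerce",
    "doc3": "bulletin météo pour la semaine",
}


def build_index(documents):
    """Map each document id to a bag of lower-cased terms."""
    return {doc_id: Counter(text.lower().split())
            for doc_id, text in documents.items()}


def translate_query(query, dictionary):
    """Replace each query word with its translations; unknown words are dropped."""
    translated = []
    for word in query.lower().split():
        translated.extend(dictionary.get(word, []))
    return translated


def rank(translated_terms, index):
    """Score each document by the summed frequency of the translated terms."""
    scores = {}
    for doc_id, bag in index.items():
        score = sum(bag[term] for term in translated_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def search(query):
    """Translate an English query, then match it against the foreign-language index."""
    index = build_index(DOCUMENTS)
    return rank(translate_query(query, EN_TO_FR), index)


if __name__ == "__main__":
    print(search("border attack report"))
```

With the sample data, the query "border attack report" ranks doc1 ahead of doc2, because doc1 contains all three translated terms while doc2 contains only one.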
SU’s Cross-Language Retrieval Research
• Have produced systems for French, Spanish, Japanese
– DARPA, Intel, & corporate funding of $3.5 million
• Currently working in Dutch and Chinese
– 2nd demo is of cross-language English-Chinese on a patent database from China
• Today’s funding announcement will enable us to specialize our current cross-language retrieval capabilities for Arabic
• Future work on information extraction and visualization in Arabic is of keen interest
1. LIVIA – English/English IR System
• Accepts users’ natural language expressions of complex information needs
• Provides precise retrieval against government compiled documents about terrorist activities
• Core technology funded by DARPA and Syracuse Research Corporation
• Demo’d by Ozgur Yilmazel
2. English-Chinese Retrieval Demo
• Cross-Language Retrieval of English queries against a Chinese patent database
– Development funded by Unilever Corp, a multinational corporation which owns 140 companies in more than 100 countries
• Jiang Ping Chen
– PhD student in School of Information Studies
Look to the Future
• Incorporate the next level of sophistication in Information Exploitation into Arabic
• Here seen in English
– Adds Information Extraction as next step to Information Retrieval
• Seek your ongoing support for its extension into Arabic
Thank You!
Questions?
Care to try a query on LIVIA?