Asif Iqbal Sarkar
Research Programmer
BRAC University
Bangladesh
Bangla Localization of OpenOffice.org
L10n is the process of adapting the text and applications of a
product or service to enable its acceptability for a particular
cultural or linguistic market.
Localization primarily includes:
• Translating text content, software source code, web sites, or
database content
• Adjusting graphic and visual elements and examples to
make them culturally appropriate
• Handling numerous locale details (currency, national regulations
and holidays, cultural sensitivities, product or service
names, etc.)
Localization
The term "software localization"
describes the process of altering
software products for marketing to
people who speak languages other
than English.
Software Localization
I18n is planning and implementing products and services
so that they can easily be localized for specific languages
and cultures.
Internationalization includes:
• Creating illustrations for documents
• Allowing space in user interfaces
• Creating print or web site graphic images
• Ensuring that the tools and product can support international
character sets for software
• Ensuring data space so that messages can be translated from
languages with single-byte character codes (English) into languages
requiring multi-byte character codes (Japanese Kanji)
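The data-space point above can be checked with a minimal sketch. The helper name storage_bytes and the sample strings are illustrative assumptions, not part of OpenOffice.org; the sketch only shows that the same number of characters can need more bytes once multi-byte character codes are involved.

```python
def storage_bytes(text, encoding="utf-8"):
    """Return how many bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))

english = "abc"                    # three single-byte characters
japanese = "\u65e5\u672c\u8a9e"    # three Kanji characters (the word "Japanese")

for label, sample in (("English", english), ("Japanese", japanese)):
    print(label, len(sample), "characters,",
          storage_bytes(sample), "bytes in UTF-8")

# The same three Kanji in a legacy double-byte encoding
print("Shift_JIS bytes:", storage_bytes(japanese, "shift_jis"))
```

A buffer or database field sized in bytes for the English message would be too small for the translated one, which is why the space has to be planned at the internationalization stage.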
Internationalization
The whole OpenOffice.org architecture is based on a
layered approach. There are four well-defined layers,
each covering a special area of the functionality.
• System Abstraction Layer
• Infrastructure Layer
• Framework Layer
• Application Layer
OpenOffice.org architecture
The L10N and I18N project contains a framework and tools
for localization (l10n) and internationalization (i18n).
I18N
The i18n framework offers functionality which is needed to
internationalize applications like an office suite (OOo).
• The old i18n framework (i18n) supports only Western languages.
• The new i18n framework supports Western languages, Chinese, Japanese and
Korean (CJK) languages, and languages that need Complex Text Layout, such as
Arabic, Hebrew, Indic and others.
L10N and I18N Project of OpenOffice.org
• Word, line and sentence break
• Search and replace
• Paragraph numbering
• Transliteration
• Character classification
• Number Formats
• Calendar
• Collation
• Locale data (date/time/number/currency format,
calendar information etc )
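As a rough illustration of what character classification means for Bangla text, the sketch below uses Python's standard unicodedata module rather than the OpenOffice.org i18n API, so it is an analogy only. The sample characters are chosen here for illustration.

```python
import unicodedata

def classify(ch):
    """Return the Unicode general category and name of one character."""
    return unicodedata.category(ch), unicodedata.name(ch, "UNKNOWN")

samples = [
    "\u0995",  # Bangla letter KA
    "\u09be",  # Bangla vowel sign AA (a spacing mark)
    "\u09cd",  # Bangla virama / hasanta (a non-spacing mark)
    "\u09e7",  # Bangla digit one
]

for ch in samples:
    category, name = classify(ch)
    print(f"U+{ord(ch):04X} {category} {name}")

# Native digits are classified as decimal digits, so they convert directly.
print(int("\u09e7\u09e8\u09e9"))
```

Word breaking, number formats and collation all depend on this kind of per-character data being available for the script.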
I18N Framework Functionality
• Internationalization (i18n) of an application is complete only if any locale
support can be added without changing the application binary.
• Development platforms like Win32® and Java® provide i18n APIs to
internationalize applications that will run on Windows and Java platforms
only.
• The OpenOffice.org I18n framework provides a rich set of i18n APIs to
internationalize OpenOffice.org applications using the Universal Network
Objects (UNO) component model.
• This i18n framework is platform-independent and can run on any platform
on which OpenOffice.org is supported.
• The new OpenOffice.org i18n framework is Unicode based and offers a
rich set of APIs and functionality.
• The i18n framework allows localization developers to add new locales or
enhance existing locale behavior to meet regional market requirements
without modifying the OpenOffice binary.
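The idea that a locale can be added without touching the application binary can be sketched as follows. Everything here is hypothetical: the LOCALE_DATA table, its field names and the format_amount and add_locale helpers are assumptions made for illustration and do not reflect the real OpenOffice.org locale file format or API.

```python
# Digits U+09E6 .. U+09EF are the Bangla digits zero to nine.
BANGLA_DIGITS = "".join(chr(0x09E6 + i) for i in range(10))

# Hypothetical locale data table; in a real system this would be loaded
# from data files at run time.
LOCALE_DATA = {
    "en_US": {"digits": "0123456789", "decimal_sep": ".", "currency": "$"},
    "bn_BD": {"digits": BANGLA_DIGITS, "decimal_sep": ".", "currency": "\u09f3"},
}

def format_amount(value, locale_id):
    """Format a currency amount using only the data of the given locale."""
    data = LOCALE_DATA[locale_id]
    out = []
    for ch in f"{value:.2f}":
        if ch in "0123456789":
            out.append(data["digits"][ord(ch) - ord("0")])
        elif ch == ".":
            out.append(data["decimal_sep"])
        else:
            out.append(ch)  # for example a minus sign
    return data["currency"] + "".join(out)

def add_locale(locale_id, data):
    """Register a new locale; the formatting code above stays unchanged."""
    LOCALE_DATA[locale_id] = data

add_locale("bn_IN", {"digits": BANGLA_DIGITS, "decimal_sep": ".",
                     "currency": "\u20b9"})
print(format_amount(12.5, "en_US"))
print(format_amount(12.5, "bn_BD"))
print(format_amount(7, "bn_IN"))
```

The formatting function never names a specific locale, so supporting bn_IN required new data only.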
I18N Framework
• The L10N-Framework of OpenOffice.org provides an
easy-to-use environment for introducing new languages to the
system and supporting new localizations of OpenOffice.org.
• The L10N-Framework is based on the multi-platform build
environment of OpenOffice.org and allows rebuilding only
those targets that are required for localization.
L10N Framework
A second method for native language support.
The l10n module offers several localization tools which
support extraction of strings and context information
out of source code. Also merging back localized strings
is supported.
1. Adding a New Language to the Office Suite
2. Extracting and Merging Strings and Messages
The framework is built on the UNO component model,
which makes adding new localization
components easy.
L10N Tools for Translation
Localization of OpenOffice.org
Localization of OpenOffice.org involves:
• Assuring that OpenOffice.org can work with your script
in the platform in which you want to use it.
• Assuring that some changes are made in the
OpenOffice.org source, so that the program recognizes
your language as one of the languages it can work in.
• Translating OpenOffice.org into your language.
Translation has different levels.
1. The first one is translating the menus and messages of
the program itself.
2. The second one also includes translating the help
pages of OpenOffice.org, which is not a small task.
3. The third level adds the development of documentation
for OpenOffice.org in your language.
Localization of OpenOffice.org
Steps of translation
• Extracting all strings and messages out of the source
code for translation.
• Translating the extracted strings using well-known
localization tools (localize).
• Merging back the translated strings into the code.
• Rebuilding the localization targets inside the build
environment.
• A localized installation set is then created automatically.
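The extract, translate and merge steps above can be sketched in a few lines. This is not the OpenOffice.org localize tool: the key = "text" resource format, the regular expression and both function names are assumptions made purely to illustrate the workflow.

```python
import re

# Hypothetical resource line: some.key = "User-visible text"
ENTRY = re.compile(r'^(?P<key>[A-Za-z0-9_.]+)\s*=\s*"(?P<text>.*)"\s*$')

def extract_strings(source):
    """Step 1: collect every translatable string, keyed by its identifier."""
    strings = {}
    for line in source.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            strings[match.group("key")] = match.group("text")
    return strings

def merge_strings(source, translations):
    """Step 3: write translated strings back; untranslated ones are kept."""
    merged = []
    for line in source.splitlines():
        match = ENTRY.match(line.strip())
        if match and match.group("key") in translations:
            key = match.group("key")
            merged.append(f'{key} = "{translations[key]}"')
        else:
            merged.append(line)
    return "\n".join(merged)

source = 'menu.file = "File"\nmenu.edit = "Edit"\n# not a string entry'
extracted = extract_strings(source)

# Step 2 happens outside the code: a translator supplies the Bangla text.
translations = {"menu.file": "\u09ab\u09be\u0987\u09b2"}
print(merge_strings(source, translations))
```

Keeping untranslated entries in the source language means a partially translated build still works, which suits the level-by-level translation approach.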
Localization of OpenOffice.org
Status of Bangla Localization of OO
• The aim is to develop a Bangla version of the office suite.
• Developed a Bangla locale file for OpenOffice.org.
• Stable bn_BD and bn_IN locale files.
• Bangla is not in the supported languages list in the
Localization framework project of OpenOffice.org.
OpenOffice.org support for Bangla
Bangla Script Support
Bangla Rendering Support
Rendering Problem
Unsynchronized forward and backward
movement of the cursor causes problems for
replacement and deletion operations in
Bangla text processing.
No Bangla script support in dialogue boxes.
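One plausible way to think about the cursor problem is that a cursor stepping by code point lands inside a rendered Bangla unit, while the display moves by whole units. The sketch below is an assumption-laden illustration, not the OpenOffice.org implementation: it groups a base letter with its following marks and joins consonants linked by the virama, which is a simplification of full grapheme cluster rules.

```python
import unicodedata

VIRAMA = "\u09cd"  # Bangla hasanta, links consonants into a conjunct

def clusters(text):
    """Split text into simplified display units (base + marks + conjuncts)."""
    result = []
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category in ("Mn", "Mc")
        joins_conjunct = (bool(result) and result[-1].endswith(VIRAMA)
                          and category == "Lo")
        if result and (is_mark or joins_conjunct):
            result[-1] += ch
        else:
            result.append(ch)
    return result

def cursor_stops(text):
    """Code point offsets at which a cursor may rest without splitting a unit."""
    stops = [0]
    for unit in clusters(text):
        stops.append(stops[-1] + len(unit))
    return stops

word = "\u09ac\u09be\u0982\u09b2\u09be"   # the word "Bangla", 5 code points
conjunct = "\u0995\u09cd\u09b7"           # KA + virama + SSA, one conjunct

print(len(word), "code points,", len(clusters(word)), "display units")
print("cursor stops:", cursor_stops(word))
print(len(conjunct), "code points,", len(clusters(conjunct)), "display unit")
```

If deletion or replacement uses code point offsets while cursor movement uses display units, the two drift apart, which matches the kind of symptom described above.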
Lingucomponent Project
The Lingucomponent Project provides the writing aid
features: spell checking, hyphenation, and thesaurus.
One of the goals of the Lingucomponent project is to
develop dictionaries and affix files to support spell
checking in different languages.
OpenOffice.org does not provide a dictionary for Bangla.
MySpell supports only 8-bit encodings, so there is no Unicode
support for Bangla script.
Ispell's dictionary and affix files are converted to conform
to MySpell, and the licensed dictionary files are then sent
to the project authority, which includes the language's
dictionary in the next release of OpenOffice.org.
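The encoding limitation can be demonstrated with a small check. ISO-8859-1 is used here only as a familiar example of an 8-bit encoding; the helper name fits_in_encoding is an assumption for illustration and has nothing to do with MySpell's own code.

```python
def fits_in_encoding(word, encoding):
    """Return True if every character of the word exists in the encoding."""
    try:
        word.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

bangla_word = "\u09ac\u09be\u0982\u09b2\u09be"  # the word "Bangla"

print(fits_in_encoding("colour", "iso-8859-1"))     # Western word list entry
print(fits_in_encoding(bangla_word, "iso-8859-1"))  # Bangla word list entry
print(fits_in_encoding(bangla_word, "utf-8"))
```

A dictionary file declared in such an 8-bit encoding cannot hold Bangla words as Unicode text, so a Unicode-capable spell checker or a custom mapping is needed.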
Lingucomponent Project Spellchecker
Bangla Computing Integration into OpenOffice.org
OpenOffice.org provides flexibility for adding components
through a component-based API called UNO (Universal
Network Objects).
• UNO is the base component technology for
OpenOffice.org.
• It is used to write components that interact across
languages, component technologies, computer platforms,
and networks.
• Currently UNO is available on Linux, Solaris, and
Windows for Java, C++ and OpenOffice.org Basic.
UNO Features
• UNO is used to access OpenOffice.org using its Application Programming
Interface (API).
• The OpenOffice.org API is the comprehensive specification that describes the
programmable features of OpenOffice.org.
• It is possible to connect to a local or remote instance of OpenOffice.org from
C++, Java and COM/DCOM using UNO.
• C++ and Java desktop applications, Java servlets, JavaServer Pages, JScript
and VBScript, and languages such as Delphi, Visual Basic and many others can
use OpenOffice.org to work with Office documents.
• It is possible to develop UNO Components in C++ or Java that can be
instantiated by the office process and add new capabilities to OpenOffice.org.
For example, Chart Add-ins or Calc Add-ins, linguistic extensions, new file
filters, database drivers and even complete applications, such as a groupware
client.
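Connecting to a running office instance can be sketched as below. The slides list Java, C++ and OpenOffice.org Basic; this sketch assumes the Python-UNO bridge (the uno module shipped with the office installation) instead, and assumes the office was started listening on a socket, for example with soffice "-accept=socket,host=localhost,port=2002;urp;". The host, port and function names are illustrative choices.

```python
def uno_url(host="localhost", port=2002):
    """Build the UNO connection string for a socket-based bridge."""
    return f"uno:socket,host={host},port={port};urp;StarOffice.ComponentContext"

def connect(host="localhost", port=2002):
    """Return the remote Desktop service of a listening office instance."""
    # Imported lazily: the uno module exists only where the office's
    # Python-UNO bridge is installed (assumption of this sketch).
    import uno

    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    remote_ctx = resolver.resolve(uno_url(host, port))
    return remote_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", remote_ctx)

print(uno_url())
```

The same connection string is used from any client language, which is what makes the local and remote cases look identical to the client.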
UNO Features
Remote Connectivity to OpenOffice.org
OpenOffice.org
Server program
SERVICES
Client program
JAVA
Client program
C++
Client program
VB
UNO OpenOffice API
• Open Documents
• Write Components
• Add Components
This is the normalized form of the loaded file. A Bangla string is then searched
for and replaced with another Bangla string, and the replaced character is highlighted.
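A normalize-then-replace step of this kind might look like the sketch below. It assumes Unicode NFC normalization, which the slide does not specify, and the function name and sample characters are chosen for illustration. The returned offsets stand in for the positions a user interface would highlight.

```python
import unicodedata

def normalize_and_replace(text, old, new, form="NFC"):
    """Normalize all inputs, replace old with new, and report where."""
    text, old, new = (unicodedata.normalize(form, s) for s in (text, old, new))
    if not old:
        return text, []
    pieces = text.split(old)
    result, offsets = pieces[0], []
    for piece in pieces[1:]:
        offsets.append(len(result))   # start of the replacement in the result
        result += new + piece
    return result, offsets

# KA followed by the two-part vowel sign O typed as E + AA (decomposed form)
loaded = "\u0995\u09c7\u09be"
# Search for the single-code-point vowel sign O, replace it with vowel sign AU
result, highlight_at = normalize_and_replace(loaded, "\u09cb", "\u09cc")
print([f"U+{ord(c):04X}" for c in result], highlight_at)
```

Without the normalization step the search would miss the decomposed spelling, even though both spellings render identically.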
Remote Connectivity to OpenOffice.org
• The OCR client program accesses the OpenOffice.org server to use its
services. This client-server approach is platform independent.
• After receiving the scanned Bangla document, the Bangla OCR
program simply recognizes the characters and sends their Unicode
code points to the OpenOffice.org server with a
request to open a window that displays the recognized characters, where
a user can edit or modify the document.
• Here a simple Bangla OCR program is used as a sample to
demonstrate the approach that could be extensively used to develop
useful programs or utility components as client programs in a
distributed system by getting the service from OpenOffice.org.
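The hand-off from the OCR client to the office can be sketched as follows. The OCR step itself is not shown; the code point list is a made-up sample, the function names are illustrative, and the desktop argument is assumed to be a com.sun.star.frame.Desktop object already obtained over a UNO bridge (here via the Python-UNO binding, which the slides do not mention).

```python
def code_points_to_text(code_points):
    """Turn the recognized Unicode code points into a text string."""
    return "".join(chr(cp) for cp in code_points)

def show_in_writer(desktop, code_points):
    """Open a new Writer window on the server and insert the OCR result."""
    # "private:factory/swriter" asks the office for an empty text document.
    document = desktop.loadComponentFromURL(
        "private:factory/swriter", "_blank", 0, ())
    text = document.getText()
    cursor = text.createTextCursor()
    text.insertString(cursor, code_points_to_text(code_points), False)
    return document

# Sample OCR output: the code points of the word "Bangla"
recognized = [0x09AC, 0x09BE, 0x0982, 0x09B2, 0x09BE]
print(code_points_to_text(recognized))
```

Because the client only sends code points and a request, the recognition program can run on any platform and leave editing and display to the office.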
Simple Bangla OCR and OpenOffice Integration
References
OpenOffice Homepage
http://www.openoffice.org
OpenOffice Localization and Internationalization Project
http://l10n.openoffice.org
I18n API
http://api.openoffice.org
UNO Home page
http://udk.openoffice.org