Global Software: Why You Should Care
Richard Gillam, IBM Unicode Technology Center
1027113036, Richard Gillam 2
What Is Internationalization?
• Internationalization is the process of developing a piece of software from the ground up so that it can be translated for a new user community without recompilation
• An internationalized application:
  – Displays messages, data, text, etc. according to local conventions
  – Doesn’t need recompilation to run in a new location
The Importance of Internationalization
• INTERNATIONALIZATION IS NOT A FEATURE!
  – People expect your product to “just work”
  – Many users cannot or will not use a program that doesn’t interact with them in their native language
  – A program is useless to a user if it can’t handle the data he needs to process (which may also be in his native language)
World Population Distribution
[Chart: population share by region. Regions shown: China, Asia/Pacific, India, Africa, Latin America, North America, EU, Middle East, Japan, Europe]
GDP By Region
[Chart: GDP share by region. Regions shown: China, Asia/Pacific, Latin America, North America, European Union, Japan, Europe]
The Programming Community
• 1,340 books on Java™ technology have been published:
  English      761    Finnish        7
  German       172    Russian        6
  Japanese      89    Swedish        5
  French        68    Czech          2
  Spanish       58    Polish         2
  Chinese       52    Croatian       1
  Italian       39    Danish         1
  Dutch         29    Hebrew         1
  Portuguese    23    Indonesian     1
  Korean        22    Norwegian      1
Internet Use in India
[Chart: values 130,000; 1,500,000; 5,000,000 for the years 1998, 1999, 2000]
It Pays to Think Ahead
• If you produce computer software, two thirds of your potential market is outside the English-speaking world
• If you produce something else, the market may be even bigger, which will affect you as you move to eBusiness
• If you have a Web site, the number of hits you get from foreign countries will increase
A Stitch in Time Saves Nine
• Retrofitting an existing application to be internationalized can be extremely painful
• The Java platform is internationalized
• Java APIs supply extensive facilities to help programmers internationalize their programs
• Advantages of using built-in facilities:
  – Far less development effort than ad-hoc solutions
  – Inherit new languages and features for free
Internationalizing Your Program
• Separate program code from user interface
  – Avoid hard-coded character strings in program code (unless you’re sure the strings aren’t user-visible)
  – Allow for customization of icons and other pictorial elements
  – Avoid making assumptions about window layout
Translatable Messages
Heute ist Freitag, 2. April 1999
Today is Friday, April 2, 1999
Today is Freitag, 2. April 1999
  – static text: “Today is”
  – dynamic text: “Freitag, 2. April 1999”
The Problem of Internationalization
[DEMO]
Watch Out for Hidden Assumptions
• In internal processing:
  – Date and time arithmetic
  – String comparison
  – Case mapping
  – Character-property tests
• When manipulating text:
  – Counting and indexing characters
  – What’s a “word”?
  – Not always 1-1 mapping: character, code point, glyph, keystroke
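Two of the hidden assumptions listed above, case mapping and character counting, can be illustrated with a minimal sketch. The class and method names here are hypothetical, and the calls used (Locale.forLanguageTag, String.codePointCount) come from later Java releases than the one these slides describe.

```java
import java.util.Locale;

public class HiddenAssumptions {
    // Case mapping is locale-sensitive: Turkish lowercases 'I' to dotless 'ı' (U+0131).
    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    // Counts Unicode code points, which can be fewer than the UTF-16 units in length().
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        System.out.println(lower("TITLE", Locale.ENGLISH)); // title
        System.out.println(lower("TITLE", turkish));        // tıtle (dotless i)

        String clef = "\uD834\uDD1E";             // U+1D11E, one character stored as two chars
        System.out.println(clef.length());        // 2
        System.out.println(codePoints(clef));     // 1

        String accented = "e\u0301";              // 'e' + combining acute: one glyph, two code points
        System.out.println(codePoints(accented)); // 2
    }
}
```

Code that lowercases with the default locale, or indexes text by char, silently bakes in the English-only assumptions the slide warns about.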
To Sum Up…
• An internationalized application:
  – Displays messages, data and text according to local conventions
  – Doesn’t need recompilation to run in a new location
  – Allows you to enter global markets
• Java APIs supply internationalization facilities
• Advantages of using built-in facilities:
  – Less development effort than ad-hoc solutions
  – Inherit new languages and features for free
Internationalization in the Java 2™ Platform
John Raley, IBM Unicode Technology Center
Internationalization in the Java 2 Platform
• Language-sensitive text analysis
  – Acquiring localized messages
  – Formatting numbers, dates, and messages
  – Text boundary location
  – String comparison
  – Calendar
• Complex text
  – Text display
  – User interaction with complex scripts (e.g. Arabic)
  – Paragraph layout
Localized Resources
• ResourceBundle stores messages which vary across languages
• Appropriate ResourceBundle for the user’s language will be loaded
    ResourceBundle b = ResourceBundle.getBundle("mypkg.UI");
    new Button(b.getString("OK"));
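A self-contained sketch of the bundle lookup described above. It assumes the bundles are written as ListResourceBundle subclasses nested in a hypothetical BundleDemo class (a stand-in for the slide's "mypkg.UI"); real projects more often use .properties files, and the German string is illustrative only.

```java
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleDemo {
    // Root (fallback) bundle, used when no more specific bundle matches.
    public static class UI extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] { {"OK", "OK"}, {"Cancel", "Cancel"} };
        }
    }

    // German bundle: keys missing here are inherited from the root bundle.
    public static class UI_de extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] { {"Cancel", "Abbrechen"} };
        }
    }

    static String label(String key, Locale locale) {
        // Nested classes are found by binary name, hence the '$' in the base name.
        String baseName = BundleDemo.class.getName() + "$UI";
        ResourceBundle bundle = ResourceBundle.getBundle(baseName, locale);
        return bundle.getString(key);
    }

    public static void main(String[] args) {
        System.out.println(label("Cancel", Locale.GERMANY)); // Abbrechen
        System.out.println(label("OK", Locale.GERMANY));     // OK (inherited from root)
        System.out.println(label("Cancel", Locale.ROOT));    // Cancel
    }
}
```

The calling code never names a language; adding a UI_fr class later would add French without touching label().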
Data Formatting
• Format classes conform to local language and conventions
• Dates
• Numbers
  – 1234: “1,234” or “1.234”
• Currency
  – 9700: “$9,700.00” or “L. 9.700”
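The formatting behavior above can be sketched with java.text.NumberFormat and java.text.DateFormat. FormatDemo and its helpers are hypothetical names, and exact output depends on the locale data shipped with the runtime (for example, a current JDK formats Italian currency in euros rather than the lira shown on the slide).

```java
import java.text.DateFormat;
import java.text.NumberFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class FormatDemo {
    // Formats an integer with the locale's grouping separator.
    static String number(long value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Formats an amount with the locale's currency symbol and conventions.
    static String currency(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    // Formats a date in the locale's "full" style (weekday, day, month, year).
    static String fullDate(Date date, Locale locale) {
        return DateFormat.getDateInstance(DateFormat.FULL, locale).format(date);
    }

    public static void main(String[] args) {
        Date date = new GregorianCalendar(1999, Calendar.APRIL, 2).getTime();
        System.out.println(number(1234, Locale.US));        // 1,234
        System.out.println(number(1234, Locale.GERMANY));   // 1.234
        System.out.println(currency(9700, Locale.US));      // $9,700.00
        System.out.println(currency(9700, Locale.ITALY));   // depends on runtime locale data
        System.out.println(fullDate(date, Locale.US));
        System.out.println(fullDate(date, Locale.GERMANY));
    }
}
```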
Message Formatting
• Order of message parameters can vary across languages
• Don’t form messages with concatenation
• Use MessageFormat instead
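A minimal sketch of why MessageFormat is preferred over concatenation: the translator controls both the static text and the position of each parameter. The patterns and the MessageDemo class are hypothetical; in practice the patterns would come from a ResourceBundle.

```java
import java.text.MessageFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class MessageDemo {
    // Fills the numbered placeholders of a pattern using the given locale's formats.
    static String render(String pattern, Locale locale, Object... args) {
        MessageFormat format = new MessageFormat(pattern, locale);
        return format.format(args);
    }

    public static void main(String[] args) {
        Date date = new GregorianCalendar(1999, Calendar.APRIL, 2).getTime();

        // Static text and the date format are both localized, so no mixed-language output.
        System.out.println(render("Today is {0,date,full}.", Locale.US, date));
        System.out.println(render("Heute ist {0,date,full}.", Locale.GERMANY, date));

        // A translation is free to reorder parameters without any code change.
        System.out.println(render("Page {0} of {1}", Locale.US, 3, 10));        // Page 3 of 10
        System.out.println(render("{1} pages, now on {0}", Locale.US, 3, 10));  // 10 pages, now on 3
    }
}
```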
Text Boundary Analysis
• BreakIterator finds boundaries such as word, line, and sentence breaks
• Accounts for language differences
• Disambiguates punctuation
  Well, I think $1,234.56 is the amount.
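The sample sentence above can be run through java.text.BreakIterator to see how punctuation inside a number is kept apart from punctuation that ends a sentence. This is a sketch; BreakDemo and segments() are hypothetical names, and the second sentence in the example is added for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakDemo {
    // Splits text into the pieces that lie between consecutive boundaries.
    static List<String> segments(BreakIterator iterator, String text) {
        iterator.setText(text);
        List<String> pieces = new ArrayList<>();
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            pieces.add(text.substring(start, end));
        }
        return pieces;
    }

    public static void main(String[] args) {
        String text = "Well, I think $1,234.56 is the amount. It is due today.";

        // Word boundaries: the comma and period inside the number do not split it.
        System.out.println(segments(BreakIterator.getWordInstance(Locale.US), text));

        // Sentence boundaries: only the periods that end a sentence count.
        System.out.println(segments(BreakIterator.getSentenceInstance(Locale.US), text));
    }
}
```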
String Comparison
• String comparison is language-dependent
  – Traditional Spanish: c < ch < d
  – Traditional German: ö => oe
• Collator class compares correctly
    Collator col = Collator.getInstance();
    if (col.compare(str1, str2) < 0) {
        // str1 sorts before str2 in the default locale
    }
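A sketch of locale-aware comparison with java.text.Collator, including a tailored rule for the traditional Spanish "ch" mentioned above. CollateDemo and its helpers are hypothetical, and the tailoring rule "& c < ch" is a simplified assumption that ignores capitalized variants.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CollateDemo {
    // Returns a copy of the list sorted by the locale's collation rules.
    static List<String> sorted(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    // True if the strings differ only by accents or case (primary strength).
    static boolean sameLetters(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(a, b) == 0;
    }

    // Builds a collator that treats "ch" as a letter sorting between c and d.
    static Collator traditionalSpanish() throws ParseException {
        RuleBasedCollator base = (RuleBasedCollator) Collator.getInstance(Locale.ENGLISH);
        return new RuleBasedCollator(base.getRules() + "& c < ch");
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(sorted(List.of("cherry", "Banana", "apple"), Locale.ENGLISH)); // [apple, Banana, cherry]
        System.out.println(sameLetters("resume", "r\u00E9sum\u00E9", Locale.ENGLISH));   // true
        System.out.println(traditionalSpanish().compare("cosa", "chico") < 0);           // true
    }
}
```

A plain String.compareTo would place "Banana" before "apple" because it compares raw code values.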
Calendar
• Calendars vary across countries
• Abstract Calendar class enables calculations on dates and times
    Calendar cal = Calendar.getInstance();
    cal.add(Calendar.MONTH, 1);
    Date monthFromNow = cal.getTime();
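A small sketch of the date arithmetic above, showing a case where hand-written "add 30 days" logic would go wrong. CalendarDemo and addMonths() are hypothetical; the calendar is requested for Locale.US so that the example is Gregorian regardless of the default locale.

```java
import java.util.Calendar;
import java.util.Locale;

public class CalendarDemo {
    // Adds months to a date and returns {year, month, day}; month is zero-based as in Calendar.
    static int[] addMonths(int year, int month, int day, int months) {
        Calendar cal = Calendar.getInstance(Locale.US);
        cal.clear();
        cal.set(year, month, day);
        cal.add(Calendar.MONTH, months); // day-of-month is pinned to the new month's length
        return new int[] { cal.get(Calendar.YEAR), cal.get(Calendar.MONTH), cal.get(Calendar.DAY_OF_MONTH) };
    }

    public static void main(String[] args) {
        int[] result = addMonths(1999, Calendar.JANUARY, 31, 1);
        // One month after January 31, 1999 is February 28, 1999.
        System.out.println(result[0] + "-" + (result[1] + 1) + "-" + result[2]); // 1999-2-28
    }
}
```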
Complex Text
• Most scripts are more complicated than Latin (English)
• Some scripts:
  – Are not strictly left to right
  – Have multiple sizes / shapes for a single character
  – Require complex positioning of characters
• The Java 2 platform adds support for complex text
  – Display
  – Interaction
Bidirectional Text
• Text is stored in reading order, not based on appearance
• Arabic and Hebrew read right to left
[Figure labels: memory, reading order]
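To see how stored (logical) order relates to display direction, the following sketch uses java.text.Bidi, a class added to the platform after the release these slides cover. BidiDemo and runLevels() are hypothetical names. Even levels denote left-to-right runs and odd levels right-to-left runs.

```java
import java.text.Bidi;
import java.util.Arrays;

public class BidiDemo {
    // Returns the embedding level of each directional run, in logical (stored) order.
    static int[] runLevels(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        int[] levels = new int[bidi.getRunCount()];
        for (int i = 0; i < levels.length; i++) {
            levels[i] = bidi.getRunLevel(i);
        }
        return levels;
    }

    public static void main(String[] args) {
        // Latin letters followed by three Hebrew letters (alef, bet, gimel), stored in reading order.
        String mixed = "abc \u05D0\u05D1\u05D2";
        System.out.println(Arrays.toString(runLevels("abc")));  // [0]
        System.out.println(Arrays.toString(runLevels(mixed)));  // [0, 1]
    }
}
```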
Character Shaping
• Arabic character shapes change to connect adjacent characters
Ligatures
• Arabic and Devanagari represent some character sequences with ligatures
Character Positioning
• Thai (and other scripts) require characters to reposition
User Interaction
• More than just drawing lines of text
• User Interaction
  – Mapping from graphical point into text (hit-testing)
  – Assigning positions and sizes to characters
• Line Break: displaying paragraph as lines within an area
Working with Complex Text
• When working with complex text, you cannot:
  – Assume a uniform text direction
  – Measure or draw one character at a time
  – Rely on default character positions
• Instead, use the Complex Text capabilities in the Java 2 platform
Graphical Text in the Java 2 Platform
• Static text
  – Swing JLabel class
  – Graphics.drawString()
• Text editing components
  – Swing JTextComponent subclasses
• Supporting user interaction with text
  – Complex Text APIs
Complex Text APIs
• In the java.awt.font package
• Part of Java 2D™ API
• Styled text
• Line operations:
  – Hit-testing
  – Caret
  – Selection
• Paragraph break
Styled Text
• Styled text is accessed with java.text.AttributedCharacterIterator
• Attributes are key-value pairs in a Map
• TextAttribute defines the standard keys
• All Complex Text APIs can use styled text
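A minimal sketch of styled text as described above: an AttributedString holds the attributes, and an AttributedCharacterIterator exposes them as runs. StyledTextDemo and boldPrefix() are hypothetical names.

```java
import java.awt.font.TextAttribute;
import java.text.AttributedCharacterIterator;
import java.text.AttributedString;

public class StyledTextDemo {
    // Marks the first 'end' characters of the text as bold.
    static AttributedString boldPrefix(String text, int end) {
        AttributedString styled = new AttributedString(text);
        styled.addAttribute(TextAttribute.WEIGHT, TextAttribute.WEIGHT_BOLD, 0, end);
        return styled;
    }

    public static void main(String[] args) {
        AttributedCharacterIterator it = boldPrefix("Hello, world", 5).getIterator();

        // At index 0 the WEIGHT key maps to bold, and the bold run ends at index 5.
        System.out.println(it.getAttribute(TextAttribute.WEIGHT) == TextAttribute.WEIGHT_BOLD); // true
        System.out.println(it.getRunLimit(TextAttribute.WEIGHT));                               // 5

        // Past the bold run, the key is absent from the attribute Map.
        it.setIndex(7);
        System.out.println(it.getAttributes().containsKey(TextAttribute.WEIGHT));               // false
    }
}
```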
TextLayout
• TextLayout represents one line or segment of text
• TextLayout provides
  – Drawing
  – Hit-testing
  – Caret drawing and movement
  – Selection
TextLayout: Hit-testing
• Hit-testing is mapping from a graphical point to a text location
• Used when responding to mouse clicks
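Hit-testing can be sketched without a window by building a TextLayout against a default FontRenderContext. HitTestDemo and its helpers are hypothetical, the logical font Font.SERIF is an assumption, and the example needs a runtime with at least one font installed.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.TextHitInfo;
import java.awt.font.TextLayout;

public class HitTestDemo {
    // Lays out one line of plain text in a 12-point logical serif font.
    static TextLayout layout(String text) {
        Font font = new Font(Font.SERIF, Font.PLAIN, 12);
        FontRenderContext context = new FontRenderContext(null, true, true);
        return new TextLayout(text, font, context);
    }

    // Maps a point (relative to the layout's origin) to a caret position in the text.
    static int insertionIndexAt(TextLayout layout, float x, float y) {
        TextHitInfo hit = layout.hitTestChar(x, y);
        return hit.getInsertionIndex();
    }

    public static void main(String[] args) {
        TextLayout line = layout("Hello");
        // Points left of the line map to the start, points right of it to the end.
        System.out.println(insertionIndexAt(line, -5f, 0f));                    // 0
        System.out.println(insertionIndexAt(line, line.getAdvance() + 5f, 0f)); // 5
        // A point inside the line maps to whichever character edge is nearest.
        System.out.println(insertionIndexAt(line, line.getAdvance() / 2f, 0f));
    }
}
```

In a mouse handler, x and y would be the click coordinates minus the position where the layout was drawn.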
TextLayout: Caret Display
[Figure: text sample labeled with character offsets]
• Caret shows position between characters
• Can have “dual” carets in bidi text
• Single-caret only is also supported
TextLayout: Caret Movement
[Figure labels: Offset 25, Offset 15]
• Arrow-key response should be “visual”
• No predictable relationship to offset
• TextLayout calculates correct offset for visual caret movement
TextLayout: Selection
[Figure: text sample labeled with character offsets]
• Selection shows range of characters
• In bidi text, single character range may have multiple highlight regions
• TextLayout generates selections - clients can be unaware of discontinuities
LineBreakMeasurer
• LineBreakMeasurer formats paragraphs into lines of text
• Lines are TextLayout instances
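A sketch of paragraph wrapping with LineBreakMeasurer, under the same assumptions as any off-screen text measurement: a default FontRenderContext and an installed font. WrapDemo and wrap() are hypothetical names, and the text must be non-empty.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.LineBreakMeasurer;
import java.awt.font.TextAttribute;
import java.awt.font.TextLayout;
import java.text.AttributedString;
import java.util.ArrayList;
import java.util.List;

public class WrapDemo {
    // Breaks a non-empty paragraph into lines no wider than 'width' (in user-space units).
    static List<TextLayout> wrap(String text, float width) {
        AttributedString styled = new AttributedString(text);
        styled.addAttribute(TextAttribute.FONT, new Font(Font.SERIF, Font.PLAIN, 12));
        FontRenderContext context = new FontRenderContext(null, true, true);
        LineBreakMeasurer measurer = new LineBreakMeasurer(styled.getIterator(), context);

        List<TextLayout> lines = new ArrayList<>();
        while (measurer.getPosition() < text.length()) {
            lines.add(measurer.nextLayout(width)); // each line is a TextLayout
        }
        return lines;
    }

    public static void main(String[] args) {
        String paragraph = "Internationalization is the process of developing software "
                + "so that it can be translated for a new user community without recompilation.";
        for (TextLayout line : wrap(paragraph, 150f)) {
            System.out.println(line.getCharacterCount() + " characters, advance " + line.getAdvance());
        }
    }
}
```

To draw the result, each TextLayout would be rendered at a y position advanced by its ascent, descent, and leading.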
Global Software: Why You Should Care
Laura Werner, Manager, IBM Unicode Technology Center
IBM’s Advanced International Support
• Complex Text: Hindi and Thai
• “Spellout” NumberFormat
• Rule-based BreakIterator
• StringSearch
• 1.1 Rich Edit Control, with BiDi
• NumberFormat enhancements
• International Calendars
Rule-based BreakIterator
• Where do you break lines of text?
• Problem: BreakIterator uses fixed rule set
  – Must compromise between Japanese and Chinese
  – Thai is not supported at all
• Solution: RuleBasedBreakIterator
  – Build customized tables with regular expressions
  – Supports dictionary-based breaking for Thai
• alphaWorks.ibm.com/tech/rbbi
Unicode String Searching
• Multilingual sorting & searching is difficult
  – Minor variants: “e” vs. “é” vs. “e´”
  – In traditional German, “ä” = “ae”
  – In traditional Spanish, “ch” is one letter
  – In German, “ß” = “ss”
• java.text.Collator is for sorting
    Collator c = Collator.getInstance();
    c.compare("foo", "bar");
• CollationElementIterator is for searching
  – Search through “elements”, not characters
  – Very low-level interface
• alphaWorks.ibm.com/tech/StringSearch
  – Makes efficient searching easy
    StringSearch s = new StringSearch("the", "Now is the time for all...");
    int pos = s.first();
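To illustrate what searching through "elements" means, here is a deliberately naive sketch that compares primary collation orders instead of characters, so that case and accent differences are ignored. It is not the StringSearch package named above, it does not report match offsets, and ElementSearchDemo with its helpers is hypothetical.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class ElementSearchDemo {
    // Converts a string to its sequence of primary collation orders, skipping ignorable elements.
    static List<Integer> primaryKeys(RuleBasedCollator collator, String s) {
        CollationElementIterator elements = collator.getCollationElementIterator(s);
        List<Integer> keys = new ArrayList<>();
        for (int e = elements.next(); e != CollationElementIterator.NULLORDER; e = elements.next()) {
            int primary = CollationElementIterator.primaryOrder(e);
            if (primary != 0) {
                keys.add(primary);
            }
        }
        return keys;
    }

    // True if the pattern's primary keys occur as a contiguous run within the text's keys.
    static boolean contains(String text, String pattern) {
        RuleBasedCollator collator = (RuleBasedCollator) Collator.getInstance(Locale.ENGLISH);
        return Collections.indexOfSubList(primaryKeys(collator, text), primaryKeys(collator, pattern)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(contains("Now is the time for all...", "THE"));    // true
        System.out.println(contains("Le r\u00E9sum\u00E9 est ici", "resume")); // true
        System.out.println(contains("Now is the time", "xyz"));               // false
    }
}
```

A production search would also map element positions back to character offsets and use a faster matching algorithm, which is the gap the slide's StringSearch class is meant to fill.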
BiDi Edit Control for JDK™ 1.1 API
• Bidirectional text (Hebrew and Arabic) was first supported in the Java 2 platform
• What about JDK 1.1 API users?
• BiDi edit control
• alphaWorks.ibm.com/tech/bidi
NumberFormat Enhancements
• Space Padding: “$1234.56”, “  $34.56”
• Nickel Rounding: 1.233 -> 1.25 SFr
• Scientific Notation: 1234567 -> 1.234E+06
• alphaWorks.ibm.com/tech/numberformat
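The three behaviors above can be approximated with standard classes, as a rough sketch rather than the enhanced NumberFormat itself. NumberTricksDemo and its helpers are hypothetical. Note that DecimalFormat rounds the mantissa and writes the exponent without a plus sign, so its output differs from the slide's sample.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class NumberTricksDemo {
    // Right-aligns a formatted value in a field of the given width.
    static String padLeft(String s, int width) {
        return String.format("%" + width + "s", s);
    }

    // Rounds to the nearest multiple of an increment, e.g. 0.05 for "nickel rounding".
    static BigDecimal roundToIncrement(BigDecimal value, BigDecimal increment) {
        return value.divide(increment, 0, RoundingMode.HALF_UP).multiply(increment);
    }

    // Formats with one integer digit, three fraction digits, and a two-digit exponent.
    static String scientific(double value) {
        DecimalFormat format = new DecimalFormat("0.000E00", DecimalFormatSymbols.getInstance(Locale.US));
        return format.format(value);
    }

    public static void main(String[] args) {
        System.out.println("[" + padLeft("$34.56", 8) + "]");                                   // [  $34.56]
        System.out.println(roundToIncrement(new BigDecimal("1.233"), new BigDecimal("0.05"))); // 1.25
        System.out.println(scientific(1234567));                                                // 1.235E06
    }
}
```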
Unicode Normalizer
• Two ways to represent accented letters
  – Precomposed character: é (00E9)
  – Combining sequence: e + ´ (0065 0301)
• Search engines need a canonical form
• Unicode TR #15 specifies two:
  – Composed é (00E9)
  – Decomposed e + ´ (0065 0301)
• alphaWorks.ibm.com/tech/unicodenormalizer
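The two canonical forms can be demonstrated with java.text.Normalizer, which later releases of the platform provide as a standard class; this sketch uses it in place of the downloadable normalizer named above. NormalizeDemo and its helpers are hypothetical names.

```java
import java.text.Normalizer;

public class NormalizeDemo {
    // Canonical composition (NFC): prefers precomposed characters.
    static String composed(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Canonical decomposition (NFD): base letters followed by combining marks.
    static String decomposed(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9";  // é as a single code point
        String combining = "e\u0301";   // e followed by a combining acute accent

        System.out.println(precomposed.equals(combining));                     // false
        System.out.println(composed(combining).equals(precomposed));           // true
        System.out.println(decomposed(precomposed).equals(combining));         // true
        System.out.println(composed(precomposed).equals(composed(combining))); // true
    }
}
```

Normalizing both the index and the query to the same form is what lets a search engine treat the two spellings as equal.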
International Calendars
• java.util.Calendar is abstract, but only GregorianCalendar is provided
• Traditional calendars still in use
  – Buddhist (Thai): June 18, 2542
  – Hebrew: Tamuz 4, 5759
  – Hijri (Islamic): Rabi’ I 5, 1420
  – Japanese Imperial: June 18, 11 Heisei
• alphaWorks.ibm.com/tech/calendars
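Three of the sample dates above can be checked against the java.time.chrono package that later Java releases added; this sketch uses those classes rather than the downloadable calendars named on the slide. WorldCalendarsDemo is a hypothetical name, the standard library has no Hebrew chronology, and its Hijri calendar is the Umm al-Qura variant, so day numbers may differ from other Islamic calendar conventions.

```java
import java.time.LocalDate;
import java.time.chrono.HijrahDate;
import java.time.chrono.JapaneseDate;
import java.time.chrono.JapaneseEra;
import java.time.chrono.ThaiBuddhistDate;
import java.time.temporal.ChronoField;

public class WorldCalendarsDemo {
    static int thaiBuddhistYear(LocalDate date) {
        return ThaiBuddhistDate.from(date).get(ChronoField.YEAR_OF_ERA);
    }

    static int japaneseEraYear(LocalDate date) {
        return JapaneseDate.from(date).get(ChronoField.YEAR_OF_ERA);
    }

    static int hijriYear(LocalDate date) {
        return HijrahDate.from(date).get(ChronoField.YEAR_OF_ERA);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(1999, 6, 18); // the Gregorian date behind the slide's examples

        System.out.println(thaiBuddhistYear(date));                                   // 2542
        System.out.println(japaneseEraYear(date));                                    // 11
        System.out.println(JapaneseDate.from(date).getEra() == JapaneseEra.HEISEI);   // true
        System.out.println(hijriYear(date));                                          // 1420
    }
}
```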
Conclusion
• The Java 2 platform provides internationalization
  – Text Analysis
  – Complex Text display and input
• Avoid “ad-hoc” solutions
  – Extra development effort
  – No internationalization
• Use internationalization facilities
  – Get into global markets
  – Take advantage of new capabilities
Resources
• IBM’s Java Developer Hub: www.ibm.com/java
• Downloadable Software: www.alphaWorks.ibm.com
• Unicode Consortium: www.unicode.org
• Internationalization Documentation: java.sun.com/products/jdk/1.2/docs/guide/internat