Posted on 26-Dec-2015
An Introduction
Part II: Enabling Internationalization
License
This presentation and its associated materials are licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 2.5 License. You may use these materials without obtaining permission from the author. Any materials used or redistributed must contain this notice. [Derivative works may be permitted with permission of the author.] This work is copyright © 2008-2011 by Addison P. Phillips
Who is this guy?
• Globalization Architect, Lab126 (we make the technology behind the Kindle)
• Chair, W3C Internationalization WG
Internationalization is:
• the design and development of a product that is enabled for target audiences that vary in culture, region, or language. [W3C]
• a fundamental architectural approach to software development
Related Concepts
• Localization: creation of a product tailored to a particular target market
• Translation: process of converting text from one language to another
• Globalization: unified approach to creating global products, especially those that support multiple geographies simultaneously
Mystic Numbering (M4C N7G)
Opinions differ on capitalization (C12N); choose from: i18n, I18n, i18N, I18N
Very geeky; not very internationalized (I19G?)
I N T E R N A T I O N A L I Z A T I O N
I 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 N
I18N
Localization = L10N
Globalization = G11N
Canonicalization = C14N
Accessibility = A11Y
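The numeronym rule is mechanical: keep the first and last letters and replace everything in between with its letter count. A minimal Java sketch (the class name `Numeronym` is ours, not from any library); it counts code points rather than `char`s, so a letter outside the Basic Multilingual Plane counts once:

```java
final class Numeronym {
    /** Abbreviates a word as: first letter + number of interior letters + last letter. */
    static String of(String word) {
        int n = word.codePointCount(0, word.length());
        if (n <= 3) {
            return word; // too short to abbreviate meaningfully
        }
        int first = word.codePointAt(0);
        int last = word.codePointBefore(word.length());
        return new StringBuilder()
                .appendCodePoint(first)
                .append(n - 2)          // interior letter count
                .appendCodePoint(last)
                .toString();
    }

    public static void main(String[] args) {
        String[] words = {"internationalization", "localization", "globalization",
                "canonicalization", "accessibility"};
        for (String w : words) {
            System.out.println(w + " -> " + of(w));
        }
    }
}
```

Running it prints i18n, l10n, g11n, c14n, and a11y (lowercase; the slides capitalize them).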
The Internationalization Approach
• Gather requirements globally
• Enable
• Externalize
• Customize
• Test and support globally
• Localize
The Internationalization Approach
• Enabling—the same code supports multiple regions or cultures. Sometimes called a “global binary”.
• Externalization—plan for localizability by separating “content” from code. This makes localization for specific languages, regions, or cultures easy, fast, and cheap.
• Customization—add culturally specific functionality, presentation, or content to an application.
A Global Approach
• Internationalization turns technical problems into business decisions
• Balance priorities based on real user distribution/requirements
  – Consider global user population as a whole
  – Consider specific market requirements on an equal footing
  – Potential markets for the product
Internationalization Myths
• We (wrote it in Java/C#, used Unicode, etc.), so it is internationalized.
• We made the assumption that the product would only ever have English screens: all our users understand it anyway.
• A localized product is internationalized.
• An internationalized product is slow/slower.
• It takes longer to write internationalized code.
• We can’t read the screens/it is too hard to test.
• We have no intention of localizing, so no need to internationalize.
• We don’t have any customers there.
• The users in (some country) never complained, so it must work.
• This product is 100% fully internationalized.
• We need special experts.
• We need an extra development cycle.
• We need six more months to build it.
• We need people who speak (language).
Internationalization Truths: “Well, it depends…”
• Generalize designs
  – Locale-independent data structures
  – Locale-sensitive display
• Externalize cultural or linguistic variations
• Customize as a last resort
Buy In: The Key to Success
• For internationalization to be a success over time, there must be commitment:
  – Management
  – Product Team
  – Development Team
• All developers, not a splinter group
Addressable Market:
Why Do Internationalization?
Globalized Product Development
Internationalization turns technical problems into business decisions.
  – Localization: Choose which markets to translate user interface or documentation for, with no engineering.
  – Deployment: Choose whether to serve applications from a single site, a cluster of sites, or in each target market.
  – Development: Add content and features to products as necessary in each target market.
  – Integration and Interoperability: Servers and products can work together around the world, so customers can truly create “Enterprise” solutions.
Development Methodologies
• Independent of development methodology: Agile? Waterfall? You make the choice.
• Encompasses the full development cycle: Design, Development, QC, Release, Support
[Cycle diagram: Develop Roadmap (global deployment) → Develop Requirements & Architecture → Design (internationalized) → Code (enable, externalize, customizable) → Test (non-English/non-ASCII) → RTM/GA (by market) → Develop Requirements (all customers)]
The Customization Approach
• Let’s do it in a separate release.
• Let’s make a branch for the international customers.
• Let’s get a special team of people to work on the international release.
How That Model Really Looks
[Timeline diagram: the Main Line ships 1.0, 1.0a, and 2.0 with sexy new features and bug fixes. An International Branch needs merges and fixes, plus lots more people and cost, to reach International Release 1.0 (1.0i). Functionality gaps: international users are waiting for 2.0i now. Lost $ and opportunity; lots of cost to get there.]
The Problem with Customization
• Code forks (double, triple coding)
• Lag time for international releases
• Non-adoption of localized release
• Full regression of every language
• Quality or commitment perception
• Lack of data exchange between language versions
• Difficult to repeat (every version is a repeat)
• Proliferation of bugs and of support problems
• International features are cancelled
• Core product still doesn’t work/can’t address similar markets
• Loss of market share
ANALYZING AND DEVELOPING A DESIGN
Large Animal Pictures
[Diagram: Your Application, surrounded by dates, numbers, images, colors, addresses, local rules, and strings]
Local rules, regulatory requirements, postal addresses, default bookmark lists, your company’s customer service phone numbers
The Problem / The Solution
Large Animal Pictures
[Diagram: a software component is split into a locale-independent global binary (global code) and locale-dependent resources (which can include code); input and output (I/O) pass through the component.]
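Enterprise Animal Pictures
[Diagram: Your System (Front End, Business Logic, Data Store, Operating Env.) works in Unicode behind a Unicode interface/API. Input in Unicode or a legacy encoding is detected and converted at the boundary, and its encoding is captured; data exchanged with the Unicode cloud stays Unicode; output to a legacy partner/content provider is converted to the legacy encoding.]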
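The “capture encoding / detect-convert / convert to legacy” boundaries can be sketched with the JDK’s `java.nio.charset` API. This is a minimal illustration that assumes the legacy encoding is known (captured alongside the data) rather than guessed; ISO-8859-1 stands in for whatever legacy encoding a real partner uses, and the class and method names are ours. Decoding with `new String(bytes, charset)` silently substitutes malformed input, whereas the encoder below reports unmappable characters instead of writing `?`:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncodingBoundary {
    /** Inbound: decode legacy bytes with the encoding captured alongside the data. */
    static String toUnicode(byte[] bytes, Charset declared) {
        return new String(bytes, declared);
    }

    /** Outbound: encode for a legacy partner, failing loudly on characters it cannot hold. */
    static byte[] toLegacy(String text, Charset legacy) throws CharacterCodingException {
        CharsetEncoder encoder = legacy.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer out = encoder.encode(CharBuffer.wrap(text));
        byte[] result = new byte[out.remaining()];
        out.get(result);
        return result;
    }

    public static void main(String[] args) {
        byte[] latin1 = {0x43, 0x61, 0x66, (byte) 0xE9}; // "Café" in ISO-8859-1
        String text = toUnicode(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(text);
        try {
            System.out.println(toLegacy(text, StandardCharsets.ISO_8859_1).length); // 4
            toLegacy("\u6771\u4eac", StandardCharsets.ISO_8859_1); // Tokyo in kanji
        } catch (CharacterCodingException e) {
            System.out.println("Not representable in the legacy encoding: " + e);
        }
    }
}
```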
Internationalization Issues
• Text Processing
  – Character encodings, including Unicode; spelling, word breaks, collation, and so on
• Language
  – Of the software (localization)
  – Of solutions built using the software (localizability, data)
• Locale-affected formats
  – dates, numbers, and the like
• Regionally affected formats
  – names, addresses, currency, and the like
• Time-related issues
  – time zone, calendar, holidays, work rules, and the like
• Cultural adaptation
  – presentation, style, position, color use, and the like
• Legal requirements
  – accessibility, SOX, DRM, moderation, security, content, and the like
Levels of Enablement
• Not Enabled
• Single-Language-at-a-Time (SLAAT): all components run correctly in the same language and encoding environment.
• Multi-Locale: Unicode support; components run in different locales, languages, encodings, and time zones.
Test Your Assumptions
[Screenshots: a form field “Gender: Male / Female”; a “Choose Your Language” selector; a “How is this company doing?” prompt]
ENABLING
Making Code Aware of Culture
What is “enabling”?
• Enabled software adapts the display, processing, validation, storage, and transmission of data according to the cultural, linguistic, and regional needs of the users:
  – Text, Characters, and Encodings
  – Locale Awareness
  – Times and Time Zones
A “global binary” is a single object-code version that is used in all markets, regardless of localization.
Don’t Code What You Think You Know
• 5/2/7: sometime in February? sometime in May? sometime in 2005?
• 1.234: more than 1000? less than 2?
• 4.32.MD: number, time, currency? morning or afternoon?
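To see how much the field order matters, the sketch below parses the same string, 5/2/7, under three assumed field orders using `java.time`’s `DateTimeFormatterBuilder`. The helper names, and the choice of 2000 as the base for a one-digit year, are assumptions for illustration; real code should take the pattern from the user’s locale rather than pick an order:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

final class AmbiguousDate {
    /** Builds a slash-separated numeric date parser for a field order such as "MDY". */
    static DateTimeFormatter numericDate(String order) {
        DateTimeFormatterBuilder b = new DateTimeFormatterBuilder();
        for (int i = 0; i < order.length(); i++) {
            if (i > 0) {
                b.appendLiteral('/');
            }
            char field = order.charAt(i);
            if (field == 'M') {
                b.appendValue(ChronoField.MONTH_OF_YEAR);
            } else if (field == 'D') {
                b.appendValue(ChronoField.DAY_OF_MONTH);
            } else if (field == 'Y') {
                // A one-digit year is read relative to 2000, so "7" means 2007.
                b.appendValueReduced(ChronoField.YEAR, 1, 2, 2000);
            } else {
                throw new IllegalArgumentException("Unknown field order: " + order);
            }
        }
        return b.toFormatter();
    }

    static LocalDate parse(String text, String order) {
        return LocalDate.parse(text, numericDate(order));
    }

    public static void main(String[] args) {
        for (String order : new String[] {"MDY", "DMY", "YMD"}) {
            System.out.println(order + ": " + parse("5/2/7", order));
        }
    }
}
```

The three orders yield 2007-05-02, 2007-02-05, and 2005-02-07: May, February, and 2005, exactly the three readings listed above.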
Date Formats
Culture | Format | Example
U.S.A. | mdy, / | 2/16/05
France | dmy, . | 16.2.05
France | dmy, - | 16-2-05
CJKT | ymd, / | 2005/2/16
CJKT | ymd, 年月日 | 2005年2月16日
Japan | era + ymd, 年月日 | 平成17年2月16日
Japan | ymd (era year), / | 17/2/16
Time Formats
• U.S.A.: 4:00 p.m.
• France: 16.00
• Japan: 1600
• Japan: ごご4:00
• Korea: 오후 4:32
• Thai: 16:32 น.
• Albanian: 4.32.MD
• Arabic: م 04:32
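Rather than hard-coding patterns like those in the tables above, a minimal sketch can ask the platform for each locale’s short date style. This assumes Java 8+ `java.time`; the exact output depends on the locale data shipped with the JDK (recent JDKs use CLDR), so treat the printed strings as illustrative:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

final class LocalizedDates {
    /** The pattern (field order, separators, year width) comes from the locale, not the code. */
    static String shortDate(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)
                .withLocale(locale)
                .format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2005, 2, 16);
        Locale[] locales = {Locale.US, Locale.FRANCE, Locale.GERMANY, Locale.JAPAN};
        for (Locale locale : locales) {
            System.out.println(locale.toLanguageTag() + ": " + shortDate(date, locale));
        }
    }
}
```

`DateTimeFormatter.ofLocalizedTime(FormatStyle.SHORT)` works the same way for times, which covers differences such as 4:00 p.m. versus 16.00.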
More Examples
Assumptions about date tokens:
• USA: Sun, Mon, Tue (3 positions, titlecase)
• French: lun. mar. mer. (four positions, lowercase)
• Russian: Пн Вт Ср (two positions, Cyrillic)
• USA: Jan, Feb, Mar (3 positions, titlecase)
• French: janv. févr. mars avr. (variable: 4 or 5 positions, lowercase)
• Spanish (Spain): ene, feb, mar (not titlecase)
• Spanish (Americas): Ene, Feb, Mar (titlecase)
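Abbreviated day and month names should likewise come from locale data rather than from a fixed three-letter table. A short sketch using `java.time`’s `TextStyle` (class and method names are ours); the exact abbreviations printed depend on the JDK’s locale data:

```java
import java.time.DayOfWeek;
import java.time.Month;
import java.time.format.TextStyle;
import java.util.Locale;

final class DateTokens {
    static String shortDay(DayOfWeek day, Locale locale) {
        return day.getDisplayName(TextStyle.SHORT, locale);
    }

    static String shortMonth(Month month, Locale locale) {
        return month.getDisplayName(TextStyle.SHORT, locale);
    }

    public static void main(String[] args) {
        Locale[] locales = {
            Locale.US, Locale.FRANCE,
            Locale.forLanguageTag("ru-RU"), Locale.forLanguageTag("es-ES"), Locale.forLanguageTag("es-MX")
        };
        for (Locale locale : locales) {
            // Width, case, and punctuation all come from the locale data.
            System.out.println(locale.toLanguageTag() + ": "
                    + shortDay(DayOfWeek.MONDAY, locale) + " / "
                    + shortMonth(Month.JANUARY, locale));
        }
    }
}
```

Some languages also distinguish a “standalone” month name from the form used inside a full date; `TextStyle.SHORT_STANDALONE` and `TextStyle.FULL_STANDALONE` request those variants.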
Calendars: What Year Is It?
• Legal, ceremonial, or popular requirement
  – Gregorian: 2012
  – Japan (Emperor): Heisei 24 (平成24年)
  – Thailand (Buddhist): 2555 (Gregorian + 543)
  – Chinese (traditional): 4704 (lunisolar)
  – Hebrew (lunisolar): 5767 (תשסו)
  – Hijri (Islamic): 1428 (lunar)
  – Armenian: 1461 (ԹՎ ՌՆԾԶ)
  – etc., etc., etc.
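Java 8’s `java.time.chrono` package covers a few of these calendars directly. The sketch converts one Gregorian date (1 June 2012, chosen arbitrarily) into the Japanese, Thai Buddhist, and Hijri (Umm al-Qura) calendars. `java.time` has no Hebrew or Chinese calendar; a library such as ICU4J would be needed for those:

```java
import java.time.LocalDate;
import java.time.chrono.HijrahDate;
import java.time.chrono.JapaneseDate;
import java.time.chrono.ThaiBuddhistDate;
import java.time.temporal.ChronoField;

final class WhatYearIsIt {
    public static void main(String[] args) {
        LocalDate iso = LocalDate.of(2012, 6, 1);
        // Same day, different calendars: each computes its own era and year fields.
        JapaneseDate japanese = JapaneseDate.from(iso);
        ThaiBuddhistDate thai = ThaiBuddhistDate.from(iso);
        HijrahDate hijrah = HijrahDate.from(iso);

        System.out.println("Gregorian: " + iso.getYear());
        System.out.println("Japanese:  " + japanese.getEra() + " " + japanese.get(ChronoField.YEAR_OF_ERA));
        System.out.println("Thai:      " + thai.get(ChronoField.YEAR_OF_ERA));
        System.out.println("Hijri:     " + hijrah.get(ChronoField.YEAR_OF_ERA));
    }
}
```

For this date the Japanese year is Heisei 24 and the Thai year is 2555, matching the list above; the Hijri year computed is 1433.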
Weekends and Holidays
• When is the weekend?
  – Friday is part of the weekend in some countries.
• Both official and unofficial holidays vary widely in number. Here are a few to watch for:
  – USA: July 4, MLK Day, Presidents’ Day, Veterans Day, Flag Day, Columbus Day, Thanksgiving…
  – Japan: Golden Week
  – China: New Year’s
  – Britain: Guy Fawkes Day, Boxing Day
  – France: Bastille Day
  – Spain: Reyes Magos
Calendar Display
Numbers
Grouping and decimal separators:
• England: 12,345.67
• Germany: 12.345,67
• Switzerland: 12’345,67
• Swiss money: 12’345.67
• France: 12 345,67
• India: 12,34,567.89
France uses a non-breaking space! In India, the number of digits in each grouping changes!
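A minimal sketch of letting `java.text.NumberFormat` choose the separators per locale (class and method names are ours). It also prints the French grouping separator, which is a non-breaking space character (U+00A0 or, in newer CLDR data, U+202F) rather than an ordinary space. As far as we know, `java.text.DecimalFormat` supports only a single grouping size, so for en-IN it may not produce the lakh-style 12,34,567.89; ICU4J’s number formatting handles variable grouping:

```java
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

final class LocalizedNumbers {
    /** Grouping and decimal separators come from the locale, not from the code. */
    static String format(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 12345.67;
        Locale[] locales = {Locale.UK, Locale.GERMANY, Locale.forLanguageTag("de-CH"), Locale.FRANCE};
        for (Locale locale : locales) {
            System.out.println(locale.toLanguageTag() + ": " + format(value, locale));
        }
        char frenchGrouping = DecimalFormatSymbols.getInstance(Locale.FRANCE).getGroupingSeparator();
        System.out.printf("French grouping separator: U+%04X%n", (int) frenchGrouping);
    }
}
```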
Lists
List delimiters and separators can conflict. French example:
2 345,67, 1 012,34, 45,67 hard to read
2 345,67 ; 1 012,34 ; 45,67 easier to read
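    import java.text.NumberFormat;
    import java.util.Iterator;
    import java.util.List;

    class ListPrinter {
        // The bug this slide illustrates: the ", " separator is hard-coded and
        // collides with locales (such as French) that use a decimal comma.
        static void printList(List<? extends Number> myNumberList) {
            NumberFormat nf = NumberFormat.getInstance();
            StringBuilder buf = new StringBuilder();
            Iterator<? extends Number> iter = myNumberList.iterator();
            while (iter.hasNext()) {
                buf.append(nf.format(iter.next().doubleValue()));
                if (iter.hasNext()) {
                    buf.append(", "); // locale-blind separator
                }
            }
            System.out.println(buf.toString());
        }
    }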
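One way to avoid the clash is to pick the list separator from the locale’s number symbols. The rule used here (switch to a semicolon when the decimal separator is a comma) is a hypothetical heuristic for illustration, not a standard; the class and method names are ours:

```java
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.List;
import java.util.Locale;
import java.util.StringJoiner;

final class NumberLists {
    /** Hypothetical heuristic: use "; " when the locale writes decimals with a comma. */
    static String listSeparator(Locale locale) {
        char decimal = DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
        return decimal == ',' ? "; " : ", ";
    }

    static String formatList(List<? extends Number> numbers, Locale locale) {
        NumberFormat nf = NumberFormat.getInstance(locale);
        StringJoiner joiner = new StringJoiner(listSeparator(locale)); // no trailing separator
        for (Number n : numbers) {
            joiner.add(nf.format(n.doubleValue()));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<Double> values = List.of(2345.67, 1012.34, 45.67);
        System.out.println(formatList(values, Locale.US));
        System.out.println(formatList(values, Locale.FRANCE));
    }
}
```

With `Locale.FRANCE` the numbers come out separated by semicolons, so the decimal commas stay unambiguous.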
Collation (a fancy word for “sorting”)
• English: ABC...RSTUVWXYZ
• German: AÄB...NOÖ...SßTUÜV...YZ
• Swedish/Finnish: AB...STUVWXYZÅÄÖ
• Norwegian: AB...VWXYÜZÆØÅ
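In Java, `java.text.Collator` supplies locale-specific ordering. The sketch contrasts a plain `String.compareTo` sort (UTF-16 code-unit order, which puts every lowercase word after every uppercase one) with German and Swedish collators; the word list is invented for illustration. With the JDK’s collation data, Swedish is expected to place “Öl” after “Zebra”, while German files it with O:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class CollationDemo {
    /** Returns a copy of the words sorted with the given locale's collation rules. */
    static List<String> sorted(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Zebra", "\u00D6l", "apple", "Apfel"); // \u00D6l is "Öl"
        List<String> binary = new ArrayList<>(words);
        binary.sort(null); // natural String order: compares UTF-16 code units
        System.out.println("binary:  " + binary);
        System.out.println("German:  " + sorted(words, Locale.GERMAN));
        System.out.println("Swedish: " + sorted(words, Locale.forLanguageTag("sv-SE")));
    }
}
```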
Organizing Information
• “Alphabet” differences
• Additional information
  – for example: yomi
• ASCII vs. the world
• Mixed information sets
“Should I be writing all of this down…”
• Wide range of variation
• Obscure formats
• Difficult to obtain reliable information on formats
• Lots of work to implement and maintain
Enabling means not having to know (m)any of the details
Supporting International Formats
• Use neutral data structures
  – Makes code independent of locale
  – Most data types are locale-neutral:
    • Boolean
    • String, char
    • Number classes
    • Date, Calendar
• Encapsulate formatting/validation in a function
  – Format style chosen dynamically at runtime
  – Format details don’t have to be specified or researched
  – APIs know the gory details
Essence of Enabling
• Object to Presentation, Presentation to Object
  – Integers
  – Floats
  – Percents
  – Currencies
  – Dates
  – Times
  – Durations
  – Collation (lists)
  – Weights/measures/sizes
  – Resources (user interface strings)
[Diagram: object ↔ locale ↔ user presentation]
Locale
• an identifier or data structure that allows programmers to access culturally and linguistically affected functionality in a system.
• Many systems are now based on IETF BCP 47: for example, JavaScript, Java 7, and CLDR.
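Since Java 7, `java.util.Locale` can be built from and converted back to a BCP 47 language tag. A small sketch (the class name is ours) showing how a tag such as zh-Hans-SG breaks into language, script, and region subtags:

```java
import java.util.Locale;

final class LocaleTags {
    public static void main(String[] args) {
        Locale locale = Locale.forLanguageTag("zh-Hans-SG");
        System.out.println("language: " + locale.getLanguage());   // zh
        System.out.println("script:   " + locale.getScript());     // Hans
        System.out.println("region:   " + locale.getCountry());    // SG
        System.out.println("tag:      " + locale.toLanguageTag()); // zh-Hans-SG
        // A human-readable name, itself localized into the requested display language.
        System.out.println("display:  " + locale.getDisplayName(Locale.ENGLISH));
    }
}
```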
Complex Types
• Data structures, APIs, or classes built from basic types must include similar capabilities:
  – Store data in a locale-neutral or locale-independent format.
  – Display in a language-, region-, or culture-sensitive manner.
  – Convert from the locale format to the locale-neutral or locale-independent storage format.
Design Time and Data Structures
• Identify your own “locale bias”
  – Field names matter!
    • “Postal Code”, not “ZIP code”.
    • Family Name/Given Name, not First Name/Last Name
  – Avoid problematic fields
    • Postal address parsing? Area code? Etc.
Currency
• Currency formatting is usually similar to number formatting, but things can vary widely here, too:
  – $1,100.00 [USA]
  – €1 100,00 [France-Euro]
  – ¥1,100 [Japan]
  – 1.100$00 Esc. [Portugal, obsolete]
  – SFr. 1’000.00 [Switzerland]
• The currency associated with the locale doesn’t always apply. Store the currency type with the value.
  – Use ISO 4217 standard codes (USD, JPY, EUR, RUB)
  – Not always one symbol
  – Not always two decimal places
  – $100 + ¥100 = $101
• Consider neutral displays!
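A minimal sketch of “store the currency type with the value”: a record pairing a `BigDecimal` amount with a `java.util.Currency` (ISO 4217), formatted for whichever locale the user prefers. The `Money` record is ours; `Currency.getDefaultFractionDigits()` shows that not every currency uses two decimal places. `NumberFormat.setCurrency` does not adjust the fraction digits, so the sketch sets them explicitly (records require Java 16+):

```java
import java.math.BigDecimal;
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

final class MoneyDemo {
    /** Amount plus its own currency; the display locale is chosen separately. */
    record Money(BigDecimal amount, Currency currency) {
        String format(Locale displayLocale) {
            NumberFormat nf = NumberFormat.getCurrencyInstance(displayLocale);
            nf.setCurrency(currency);
            nf.setMinimumFractionDigits(currency.getDefaultFractionDigits());
            nf.setMaximumFractionDigits(currency.getDefaultFractionDigits());
            return nf.format(amount);
        }
    }

    public static void main(String[] args) {
        Money yen = new Money(new BigDecimal("1100"), Currency.getInstance("JPY"));
        Money dollars = new Money(new BigDecimal("1100.00"), Currency.getInstance("USD"));
        System.out.println(yen.format(Locale.US));      // yen, shown to a U.S. user
        System.out.println(dollars.format(Locale.JAPAN)); // dollars, shown to a Japanese user
        for (String code : new String[] {"USD", "JPY", "BHD"}) {
            System.out.println(code + " fraction digits: "
                    + Currency.getInstance(code).getDefaultFractionDigits());
        }
    }
}
```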
Being Locale Neutral
• Avoid or reduce locale-affected display to increase portability
  – Use unambiguous formats, such as ISO 8601-like dates, especially in log files and the like
    • 2005-04-01 14:17:00 UTC
  – Use consistent formats (‘user locale’), especially in columns or collections of data

Consistent (user locale):
Amount       Currency
351,234.56   USD
102,556.78   EUR
65,336.00    JPY
212,345.00   INR

Inconsistent (each value in its own locale’s format):
Amount       Currency
351,234.56   USD
102 556,78   EUR
65336        JPY
2,12,345.00  INR
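For log files, a minimal sketch of the ISO 8601-like format above: the pattern, locale, and zone are all fixed, so the output does not depend on the server’s default locale or time zone. The class name is ours:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class LogTimestamps {
    // Fixed pattern, Locale.ROOT, and UTC: the same instant always logs the same text.
    private static final DateTimeFormatter LOG_FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss 'UTC'", Locale.ROOT)
                    .withZone(ZoneOffset.UTC);

    static String stamp(Instant instant) {
        return LOG_FORMAT.format(instant);
    }

    public static void main(String[] args) {
        System.out.println(stamp(Instant.parse("2005-04-01T14:17:00Z"))); // 2005-04-01 14:17:00 UTC
        System.out.println(stamp(Instant.now()));
    }
}
```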
“The String is the Thing”
• Text doesn’t get translated on the fly.
• Don’t use text as an identifier or foreign key.
  – Use ID numbers or non-human-readable values instead of requiring text fields to match.
  – “Intrinsic” data value versus “display” data value.
    • Enumerated values displayed as strings.
    • Use display strings.
Enumerated: ACCOUNTS_PAYABLE
Displayed: “Accounts Payable”, “pagável de clientes”
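A sketch of the intrinsic/display split: the enum constant is what gets stored and compared, and the display string is looked up per user. The in-memory label map is a hypothetical stand-in for resource files, and the two display strings are the slide’s own examples:

```java
import java.util.Locale;
import java.util.Map;

final class IntrinsicValues {
    /** The intrinsic value: stored, compared, and used as a key. Other types omitted. */
    enum AccountType { ACCOUNTS_PAYABLE }

    // Hypothetical display strings keyed by language; in practice they live in resource files.
    private static final Map<String, Map<AccountType, String>> DISPLAY = Map.of(
            "en", Map.of(AccountType.ACCOUNTS_PAYABLE, "Accounts Payable"),
            "pt", Map.of(AccountType.ACCOUNTS_PAYABLE, "pag\u00e1vel de clientes"));

    /** The display value: looked up for each user, never stored or compared. */
    static String displayName(AccountType type, Locale locale) {
        Map<AccountType, String> labels =
                DISPLAY.getOrDefault(locale.getLanguage(), DISPLAY.get("en"));
        return labels.get(type);
    }

    /** What goes into the database: the stable constant name, not a translated label. */
    static String storageKey(AccountType type) {
        return type.name();
    }

    static AccountType fromStorageKey(String key) {
        return AccountType.valueOf(key);
    }

    public static void main(String[] args) {
        String stored = storageKey(AccountType.ACCOUNTS_PAYABLE);
        System.out.println(stored); // ACCOUNTS_PAYABLE
        AccountType loaded = fromStorageKey(stored);
        System.out.println(displayName(loaded, Locale.ENGLISH));
        System.out.println(displayName(loaded, Locale.forLanguageTag("pt-BR")));
    }
}
```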
English-like Construction
• Concatenation
  – string1 + string2
• Pluralization
  – Dog + “s” = “dogs”
This topic will be covered in greater depth in the section on localization.
Databases
• Most databases can only handle one collation sequence per instance or one collation per index.
  – Remove reliance on alphalists.
  – Self-collate short lists.
  – Pre-collate long lists?
• Example: NLS_SORT controls the way Oracle returns data (collation sequence).
  – Global environment variable.
  – Not necessarily under your control.
  – Indices are built on a predetermined or binary sort.
Enabling Summary
• Understand Encodings and Unicode
  – All text has an encoding!
• Be Locale-Aware
  – Create locale-neutral data structures
  – Separate display from storage
IT’S ABOUT TIME
Dates, Times, Durations, Calendars: a little aside…
Observed Time
Incremental Time
• Computed time based on “clock ticks” in an “epoch”
  – The epochal date is arbitrary. The UNIX epoch is midnight, January 1, 1970, UTC.
Field Based Time
• Time based on calendric fields (day, month, year, hour, minute, second)
• Some systems have data types for “field based” time also.
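In `java.time`, `Instant` models incremental time (a count from the UNIX epoch) and `LocalDateTime` models field-based time; converting between them always requires a zone or offset. A small sketch (the class and method names are ours):

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class TwoKindsOfTime {
    /** Incremental to field-based: the same instant has different fields in different zones. */
    static LocalDateTime fieldsAt(Instant instant, ZoneId zone) {
        return LocalDateTime.ofInstant(instant, zone);
    }

    /** Field-based to incremental: only possible once an offset is supplied. */
    static long epochMillis(LocalDateTime fields, ZoneOffset offset) {
        return fields.toInstant(offset).toEpochMilli();
    }

    public static void main(String[] args) {
        Instant epoch = Instant.ofEpochMilli(0L);
        System.out.println(epoch);                                    // 1970-01-01T00:00:00Z
        System.out.println(fieldsAt(epoch, ZoneOffset.UTC));          // 1970-01-01T00:00
        System.out.println(fieldsAt(epoch, ZoneId.of("Asia/Tokyo"))); // 1970-01-01T09:00
        System.out.println(epochMillis(LocalDateTime.of(1970, 1, 1, 9, 0), ZoneOffset.ofHours(9))); // 0
    }
}
```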
What is a Time Zone
• A time zone is a geographical region or area that has common rules for determining the local observed time as it relates to monotonic (computer) time.
• Distinctions include:
  – Offset from UTC
  – Daylight Saving Time (Summer Time) behavior
  – Historic changes in offset or DST behavior
  – Political control
Durations and Repeating Events
• Wall time: this meeting is at 2 PM Pacific time every Tuesday
  – the interval between meetings may vary in number of seconds:
    • Daylight time transitions
    • Changes in DST rules
• Fixed duration: run the virus scanner every 57 minutes
  – the interval is always 3,420,000 milliseconds
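The two kinds of repetition map onto two different `java.time` operations: `plusWeeks` keeps the local wall time, while adding a `Duration` moves along the instant timeline. A minimal sketch; the class and method names are ours, and the dates were chosen around the U.S. daylight time change of Sunday, 13 March 2016:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class RepeatingEvents {
    static final ZoneId PACIFIC = ZoneId.of("America/Los_Angeles");

    /** Wall-time recurrence: same local time each week, so elapsed seconds may vary. */
    static ZonedDateTime nextWeekly(ZonedDateTime meeting) {
        return meeting.plusWeeks(1);
    }

    /** Fixed-duration recurrence: always the same number of elapsed milliseconds. */
    static ZonedDateTime nextScan(ZonedDateTime lastRun) {
        return lastRun.plus(Duration.ofMinutes(57));
    }

    public static void main(String[] args) {
        ZonedDateTime meeting = ZonedDateTime.of(LocalDateTime.of(2016, 3, 8, 14, 0), PACIFIC);
        ZonedDateTime next = nextWeekly(meeting);
        System.out.println(next);                                      // 2016-03-15T14:00-07:00[America/Los_Angeles]
        System.out.println(Duration.between(meeting, next).toHours()); // 167, not 168

        ZonedDateTime scan = ZonedDateTime.of(LocalDateTime.of(2016, 3, 13, 1, 30), PACIFIC);
        System.out.println(nextScan(scan));                     // 2016-03-13T03:27-07:00[America/Los_Angeles]
        System.out.println(Duration.ofMinutes(57).toMillis());  // 3420000
    }
}
```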
Time Zone Affected Scenarios
• Zone independent
  – only “incremental” times are necessary
• Local time, past only
  – future changes to time zone rules not applicable
  – example: logging system
• Local time, both past and future
  – time zone rule changes may affect some time values
  – example: calendar program
• Floating times
  – events not tied to a specific time zone
  – example: birthdate, start date, definition of “night” for phone usage
• Recurring events
  – events that recur, sometimes during and sometimes outside daylight saving time
  – example: weekly status meeting
Time Zone Scenarios
• Zone Independent: generally timestamps that don’t refer to a specific time zone.
  – Record local offset or (better) use UTC
  – May want wall time for analysis
Time Zone Scenarios
• Local Time (Past Only): times that cannot change their relationship to DST
  – Store zone ID and time value [may store offset instead of zone ID]
• Local Time (Past+Future): time values may need to change if DST rules change
  – Store original offset along with zone ID and time value
  – May require a database crawl if DST rules change
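For the “Local Time (Past+Future)” case, storing the original offset next to the zone ID makes a rule change detectable later: re-evaluate the wall time under the current tz rules and compare. A minimal sketch; the `Stored` record is a hypothetical storage shape, and `ZoneRules.getValidOffsets` is the `java.time` call that returns the offsets valid for a local date-time (none in a DST gap, two in an overlap):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

final class StoredLocalTime {
    /** Hypothetical persisted form: wall time, zone ID, and the offset in force when saved. */
    record Stored(LocalDateTime wallTime, ZoneId zone, ZoneOffset originalOffset) {
        static Stored of(ZonedDateTime zdt) {
            return new Stored(zdt.toLocalDateTime(), zdt.getZone(), zdt.getOffset());
        }

        /** True if today's tz rules no longer produce the saved offset for this wall time. */
        boolean rulesChanged() {
            return !zone.getRules().getValidOffsets(wallTime).contains(originalOffset);
        }
    }

    public static void main(String[] args) {
        ZoneId la = ZoneId.of("America/Los_Angeles");
        Stored saved = Stored.of(ZonedDateTime.of(LocalDateTime.of(2030, 7, 1, 10, 0), la));
        System.out.println(saved + " changed? " + saved.rulesChanged());

        // Simulate a row written under different rules: the stored offset no longer matches.
        Stored stale = new Stored(saved.wallTime(), la, ZoneOffset.ofHours(-8));
        System.out.println(stale + " changed? " + stale.rulesChanged());
    }
}
```

A database crawl would run `rulesChanged()` over stored rows after each tz data update and flag the ones that need review.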
Time Zone Scenarios
• Floating Times: times that don’t change regardless of where you are in the world.
  – Publication dates
  – Birth dates (or any anniversary date)
  – Etc.
• Handle using UTC and avoiding zone casting
Time Zone Scenarios
• Recurring Events: time values that occur in both DST and non-DST time
  – Store time, recurrence period, zone ID, original offset, and whether to tie recurrence to DST
Time Zone Identifiers
• Often based on the IANA time zone database (tzdata) [formerly “Olson IDs”]
  – Offset: Etc/UTC, Etc/GMT+1
  – Continent/City: America/Los_Angeles, Europe/Paris, Asia/Tokyo, Antarctica/DumontDUrville
  – Ocean/Island (City): Atlantic/Canary, Pacific/Auckland, Pacific/Pago_Pago
  – Continent/Region/City: America/Indiana/Indianapolis
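`java.time.ZoneId` accepts these IANA identifiers directly. One trap worth showing: the Etc/GMT±n IDs use the POSIX sign convention, so Etc/GMT+1 is one hour *behind* UTC. A small sketch (the class name and sample instant are ours):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class ZoneIds {
    /** Offset that the zone's rules give at a particular instant. */
    static ZoneOffset offsetAt(String zoneId, Instant instant) {
        return ZoneId.of(zoneId).getRules().getOffset(instant);
    }

    public static void main(String[] args) {
        Instant january = Instant.parse("2016-01-15T00:00:00Z");
        String[] ids = {
            "Etc/UTC", "Etc/GMT+1", "America/Los_Angeles", "Europe/Paris", "Asia/Tokyo",
            "Antarctica/DumontDUrville", "Atlantic/Canary", "Pacific/Auckland",
            "Pacific/Pago_Pago", "America/Indiana/Indianapolis"
        };
        for (String id : ids) {
            System.out.println(id + " -> " + offsetAt(id, january));
        }
    }
}
```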
Time Zone Hints
• Only 21 countries have more than one time zone (if you know the country, you often know the time zone)
• Argentina, Australia, Brazil, Canada, Chile, Democratic Republic of the Congo, Ecuador, France, Greenland, Indonesia, Kazakhstan, Kiribati, Mexico, Micronesia, Mongolia, New Zealand, Portugal, Russia, Spain, and the United States.
– Of these, most have maritime or overseas regions. Examples:
  • Ecuador: Galapagos Islands
  • Chile: Easter Island
  • Portugal: Azores
Locale-Neutral Formats
• Use locale-neutral formats for interchange:
  – ISO 8601
  – Incremental time values (e.g. time_t)
  – Distinguish time zone if necessary for interpretation
    • Offset is not the same as time zone
At any given time, in UTC, it is the same time everywhere that time is measured.
SQL data types and XML formats are often field-based, while programming languages are usually incremental.
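The difference between an offset and a time zone shows up as soon as you do arithmetic. In this sketch (class name ours), a winter time in Los Angeles is kept once as a `ZonedDateTime` (zone ID) and once as an `OffsetDateTime` (bare ISO 8601 offset); six months later only the zoned value follows the daylight-time rules, while both denote the same instant as the Tokyo view:

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class OffsetVersusZone {
    static final ZoneId PACIFIC = ZoneId.of("America/Los_Angeles");

    public static void main(String[] args) {
        ZonedDateTime zoned = ZonedDateTime.of(LocalDateTime.of(2016, 1, 15, 10, 0), PACIFIC);
        OffsetDateTime offsetOnly = zoned.toOffsetDateTime();
        System.out.println(offsetOnly);                // 2016-01-15T10:00-08:00

        // Six months later the zone applies summer-time rules; the bare offset cannot.
        System.out.println(zoned.plusMonths(6));       // 2016-07-15T10:00-07:00[America/Los_Angeles]
        System.out.println(offsetOnly.plusMonths(6));  // 2016-07-15T10:00-08:00

        // Same instant, different wall times.
        ZonedDateTime tokyo = zoned.withZoneSameInstant(ZoneId.of("Asia/Tokyo"));
        System.out.println(tokyo);                     // 2016-01-16T03:00+09:00[Asia/Tokyo]
        System.out.println(tokyo.toInstant().equals(zoned.toInstant())); // true
    }
}
```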
Formatting Dates and Times
Requires more than just a locale!
• datetime: the value being formatted (e.g. 1034197545321L)
• zone: defines the relation to “wall time” (e.g. Asia/Tokyo)
• calendar: defines the rules for calculating field values (e.g. Japanese Imperial)
Result: October 10, 14H 6:05:45 AM JST
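The same three inputs can be sketched with `java.time`: the millisecond value plus the Asia/Tokyo zone fix the wall time, and the Japanese chronology recomputes the era and year. The class and method names are ours; the localized string printed at the end depends on the JDK’s locale data, so it is left without an expected value:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.chrono.JapaneseChronology;
import java.time.chrono.JapaneseDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.time.temporal.ChronoField;
import java.util.Locale;

final class FormatNeedsMore {
    /** Value + zone: fixes the wall time. */
    static ZonedDateTime wallTime(long epochMillis, ZoneId zone) {
        return Instant.ofEpochMilli(epochMillis).atZone(zone);
    }

    /** Calendar: recomputes the date fields (era, year) from the wall time. */
    static JapaneseDate imperial(ZonedDateTime wall) {
        return JapaneseDate.from(wall);
    }

    public static void main(String[] args) {
        ZonedDateTime wall = wallTime(1034197545321L, ZoneId.of("Asia/Tokyo"));
        System.out.println(wall); // 2002-10-10T06:05:45.321+09:00[Asia/Tokyo]
        JapaneseDate date = imperial(wall);
        System.out.println(date.getEra() + " " + date.get(ChronoField.YEAR_OF_ERA));

        // Locale-specific layout combined with Japanese Imperial calendar fields.
        DateTimeFormatter fmt = DateTimeFormatter.ofLocalizedDateTime(FormatStyle.MEDIUM)
                .withLocale(Locale.JAPAN)
                .withChronology(JapaneseChronology.INSTANCE);
        System.out.println(fmt.format(wall));
    }
}
```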
Externalization
Making software localizable
What is localization?
“What is localization?” Zula asked. Peter sighed, letting her know it was a stupid question. “Translating foreign software into Hungarian, making things work correctly in the special environment of Hungary,” Csongor explained, and Zula thought that she could glimpse, here, in the way that he contentedly explained things, Csongor’s father the school-teacher.
Reamde by Neal Stephenson
What is Localization?
• The process of tailoring a product to a specific target market.
  – Translation of messages
  – Adaptation to local preferences
  – Addition (or subtraction) of content or features
Localization is Obvious
… but it isn’t “internationalization”
• Localizability is internationalization.
  – Externalize text
  – Externalize presentation
  – Dynamic composition
  – Distribution of language content
  – “Plug-in” features
What is a ‘Resource’?
any application component loaded dynamically at runtime, rather than compiled into the application
In localization: source code files containing language, region, or culturally-affected materials
– Text
– Error messages
– Icons
– Pictures
– Fonts
– Colors
– Graphics
– Sizes
– Positions
– Magic Numbers
– Mnemonics (“Alt+G”, “F4”, etc.)
– File Locations
– Dictionaries
– Glossaries
– Grammar Rules
– Code
Why Resources?
Text, error messages, icons, pictures, fonts, colors, graphics, sizes, positions, magic numbers, mnemonics, dictionaries, glossaries, grammar rules, culturally specific code
[Before/After diagram: these items move out of the code and into resources]
Avoiding Forks
[Diagram: one global binary plus English resources makes the English version; the same global binary plus other resources makes each additional language (“Language +1”) version]
Forked Code Woes
• Hard to fix and maintain
• Different versions in the field
• Delays in releasing localized product
• Different functionality by region
• Confusing for customers/users
• Versions are not interoperable and might not be able to exchange data!
More Benefits
• Rename or re-brand product
• Fix spelling or grammar mistakes
• Fix usability
• Make terminology consistent
• Test drive new customer experiences, try new designs, etc.
… all without a rebuild!

    "Project-Id-Version: blanket 1.0\n"
    "Report-Msgid-Bugs-To: \n"
    "POT-Creation-Date: 2011-03-23 15:43-0700\n"
    "PO-Revision-Date: 2011-03-23 15:43-0700\n"
    "Last-Translator: Richard Gillam <gillam (a] lab126.com>\n"
    "Language-Team: en <kindle-i18n-team (a] lab126.com>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"

    # font
    msgid "my.font.name"
    msgstr "dialog"

    #: progress bar in point based
    msgid "progress_bar.rect"
    msgstr "43.11,64.67,172.45,12.93"

    msgid "progress_bar.border"
    msgstr "2"

    # bounding box: x_pos,y_pos,width,height
    msgid "shutdown.cust_service.header.rect"
    msgstr "0,14.65,258.68,12.07"

    msgid "shutdown.cust_service.header"
    msgstr "Repair Needed"
What’s wrong here?
String1 = There are
String2 = no
String3 = tables in
String4 = files
String5 = .
Messages I Could Build:
• There are files.
• There are no files.
• There are 50 files.
• There are tables in files.
• There are no tables in files.
• There are 50 tables in no files.
• There are tables in.
Let’s Google Translate That!
Messages I Could Build → machine translation:
• There are files. → Il ya des fichiers.
• There are no files. → Il n'y a pas les fichiers.
• There are 50 files. → Il ya 50 fichiers.
• There are tables in files. → Il ya des tables dans des fichiers.
• There are no tables in files. → Il n'ya pas de tables dans des fichiers.
• There are 50 tables in no files. → Il ya 50 tableaux dans aucun fichier.
• There are tables in. → Il ya des tables po.
Don’t Build Text From Fragments
• Text fragments are hard to translate
  – Fragments may not follow grammar rules
  – Cannot know which parts go together
  – Parts can be reused in incompatible ways
• Internationalization APIs offer “patterns” to fix this:
  – [] files out of [] were deleted.
  – An error occurred at [] on [].
  – Page [] of []
  – Processing: []% complete.
Example: MessageFormat (Java)
There were {0} tables on {1}.
There were {0,number,integer} tables on {1,date,short}.
{1,date} に {0,number,integer} のテーブルがあった。
• Number the replacement variables ({0}, {1}, …).
• Provide typing and formatting information where possible.
• Externalize the message as a single, unitary string.
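A minimal sketch of using `java.text.MessageFormat` with the patterns above (the `Messages.format` helper is ours). Because each argument is numbered, the Japanese pattern can put the date first without any change to the calling code:

```java
import java.text.MessageFormat;
import java.util.Date;
import java.util.Locale;

final class Messages {
    /** Formats a whole externalized pattern; translators may reorder {0}, {1}, ... freely. */
    static String format(String pattern, Locale locale, Object... args) {
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        Date when = new Date(1034197545321L);
        System.out.println(format("There were {0,number,integer} tables on {1,date,short}.",
                Locale.US, 1234, when));
        // Same arguments, different order in the translated pattern.
        System.out.println(format("{1,date} に {0,number,integer} のテーブルがあった。",
                Locale.JAPAN, 1234, when));
    }
}
```

One `MessageFormat` gotcha: a single quote starts quoted text, so an apostrophe in a pattern must be written twice (`It''s`).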
What’s My Gender
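“Documenti del Chris”
“Documenti della Chris”
“Documenti - Chris”
(Italian for “Chris’s documents”: the article “del” or “della” depends on Chris’s gender; the last form sidesteps it.)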
More Issues With Text Composition
– There were one errors found.
– You have earned your 22th set of bonus points.
Sentence Parts Must Agree
• Endings, Gender, Plurality, Case
  – e.g. Japanese counting uses different words for different kinds of objects
  – e.g. Slavic languages use different endings for singular, few, many…
Complex Message Formatting
There were no errors.
There was 1 error.
There were 2 errors.
“Choice format” APIs allow different resources to be used based on runtime values.
0: There were no errors.
1: There was {0} error.
2: There were {0} errors.
0: не было ошибок
1: была {0} ошибка
2: были {0} ошибки
5: были {0} ошибок
The number of resources may need to vary by locale or language.
Examples: ordinal numbers (1st, 2nd, 3rd, 4th, etc.); complex messages, such as “27 seconds ago” vs. “10 minutes ago”
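In Java, the choice format above can be embedded in a `MessageFormat` pattern. In this sketch (class name ours), `0#` matches zero, `1#` matches one, and `1<` matches anything greater than one; the nested `{0,number,integer}` is formatted with the same arguments:

```java
import java.text.MessageFormat;
import java.util.Locale;

final class ErrorCount {
    private static final String PATTERN =
            "{0,choice,0#There were no errors."
            + "|1#There was {0,number,integer} error."
            + "|1<There were {0,number,integer} errors.}";

    static String message(int count) {
        return new MessageFormat(PATTERN, Locale.US).format(new Object[] {count});
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 2, 1000}) {
            System.out.println(message(n));
        }
    }
}
```

`ChoiceFormat` selects by numeric range only, so it cannot express Russian rules in which 21 takes the singular ending while 11 does not; CLDR plural rules (for example ICU4J’s `PluralFormat`) handle those cases.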
Images and Icons
• Avoid metaphors
• Avoid cultural sensitivities
• Avoid body parts
• Replace as necessary
• Avoid putting text into graphics
  – Graphic: $20
  – Text: $0.06
Images and Culture
• Beware your biases—even “good” ones.
“Meet your friends on our new social website” (for India)
Isn’t it Swell?
• English is very succinct.
  – Words in other languages are longer
  – Sentences are longer
  – Characters may be larger
More Swollen Text
• 30% in length (alphabetics, abjads, etc.)
• 30% in height (ideographics)
• But… a rule of thumb, not a “fact”
  – Measure your results with care.
A Cautionary Tale
GUI Layout
Dereferencing
• Minimize sentence building
• Minimize arguments per string
• Use subject:predicate wherever possible
When you can do this: “Balance: $100.00”
Don’t do this: “Your balance is $100.00.”
Dynamic vs. Static Layout
• Magic numbers
• Externalized layouts
• Mnemonics
• Colors
Localizing Styles
• Bolding is not universal for emphasis
  – Italicization, capitalization, etc. are also not universal (some scripts don’t have these attributes)
• Use logical, not presentational, names
  – Describe the function, not the appearance. For example, use “emphasis” instead of “italics”.
中国 Amikake Wakiten
Use of Color
“Going Down”
“Going Up”
Non-Translatable Resources
• Some content should be externalized but not translated
  – Sometimes referred to as “DNT” for “do not translate”
• Externalize? Yes…
  – Segregate DNT material from translated material if possible (by using separate resource files or separate resource blocks within a file).
  – Developers can’t always tell when something should or should not be DNT… and neither can translators (context is missing)
The “Locale” in “Localization”
• Resources “fall back” to find the best match
[Diagram: the global binary loads resources, falling back from zh-Hans-SG (Chinese, Simplified script, Singapore) → zh-Hans (Chinese, Simplified script) → zh (Chinese) → (root)]
Sparse Population
• A given language resource may not contain a complete set of resources.
  – Some resource systems fall back separately for each sub-resource (such as a particular value)
Root: “appName” = “Demo”, “maxRows” = 57, “dialogTitle” = “Hello World”
French: “appName” = “Démo”, “dialogTitle” = “Bonjour monde”
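Per-key fallback over a sparse bundle can be sketched in a few lines. The in-memory bundles below are hypothetical stand-ins holding the root and French values from the example, and the chain simply drops one subtag at a time; real resource systems (for instance Java’s `ResourceBundle.Control.getCandidateLocales`) add special cases such as script inference for Chinese and handling of extension subtags:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SparseResources {
    // Hypothetical bundles keyed by BCP 47 tag; "" is the root bundle.
    static final Map<String, Map<String, String>> BUNDLES = Map.of(
            "", Map.of("appName", "Demo", "maxRows", "57", "dialogTitle", "Hello World"),
            "fr", Map.of("appName", "D\u00e9mo", "dialogTitle", "Bonjour monde"));

    /** "zh-Hans-SG" gives [zh-Hans-SG, zh-Hans, zh, ""]: drop one subtag at a time. */
    static List<String> fallbackChain(String tag) {
        List<String> chain = new ArrayList<>();
        String current = tag;
        while (!current.isEmpty()) {
            chain.add(current);
            int dash = current.lastIndexOf('-');
            current = dash < 0 ? "" : current.substring(0, dash);
        }
        chain.add(""); // root
        return chain;
    }

    /** Looks up each key separately, so a sparse bundle inherits any missing values. */
    static String lookup(String tag, String key) {
        for (String candidate : fallbackChain(tag)) {
            Map<String, String> bundle = BUNDLES.get(candidate);
            if (bundle != null && bundle.containsKey(key)) {
                return bundle.get(key);
            }
        }
        throw new IllegalArgumentException("No resource for key: " + key);
    }

    public static void main(String[] args) {
        System.out.println(lookup("fr-CA", "dialogTitle")); // Bonjour monde (from "fr")
        System.out.println(lookup("fr-CA", "maxRows"));     // 57 (from root)
        System.out.println(fallbackChain("zh-Hans-SG"));
    }
}
```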
Getting the Right Locale
[Diagram: a client talks to a system made of Front End, Business Logic, and Data Store running in an Operating Env.; the locales in play include the Client Locale, Server Locale, API Request Locale, and System Mgmt Locale]
One request might serve multiple purposes or be seen in multiple contexts
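For the API request locale, Java 8 can match an HTTP-style `Accept-Language` value against the languages a product actually ships, using RFC 4647 lookup via `Locale.LanguageRange` and `Locale.lookup`. A minimal sketch; the supported list and the English default are assumptions for illustration:

```java
import java.util.List;
import java.util.Locale;

final class RequestLocale {
    // Hypothetical set of languages this product ships.
    static final List<Locale> SUPPORTED = List.of(Locale.ENGLISH, Locale.FRENCH, Locale.GERMAN);

    /** Picks the best supported locale for an Accept-Language value (RFC 4647 lookup). */
    static Locale negotiate(String acceptLanguage) {
        Locale match = Locale.lookup(Locale.LanguageRange.parse(acceptLanguage), SUPPORTED);
        return match != null ? match : Locale.ENGLISH; // hypothetical default
    }

    public static void main(String[] args) {
        System.out.println(negotiate("fr-CH,fr;q=0.9,en;q=0.8")); // fr
        System.out.println(negotiate("ja"));                      // en (no match, default)
    }
}
```

Because one request can serve several purposes, the negotiated locale is best passed along explicitly rather than stored in a server-wide default.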
Resources and Translation
“key”, “display string”
“dialogTitle”, “Dialog Title”
“aMessage”, “This is a message.”
Pseudo-Translation
“key”, “ðìsplàÿ stríñg”
“dialogTitle”, “Ðîálòg Tïtlé”
“aMessage”, “Thìß ís â Mésßãgê.”
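A pseudo-translation pass can be sketched as a string transform: swap letters for accented look-alikes (exposing encoding and font problems), pad by about 30% (exposing layout that cannot swell), and bracket the result (exposing truncation and concatenated fragments). The character table, padding character, and brackets are our choices, not a standard:

```java
final class PseudoLocalizer {
    // Plain letters and accented stand-ins: à é î ö ü À É Î Ö Ü ç ñ ÿ Ç Ñ Ý
    private static final String PLAIN = "aeiouAEIOUcnyCNY";
    private static final String ACCENTED =
            "\u00e0\u00e9\u00ee\u00f6\u00fc\u00c0\u00c9\u00ce\u00d6\u00dc\u00e7\u00f1\u00ff\u00c7\u00d1\u00dd";

    /** Accents letters, pads by about 30%, and brackets the result so truncation shows. */
    static String pseudo(String source) {
        StringBuilder sb = new StringBuilder("[");
        for (char c : source.toCharArray()) {
            int i = PLAIN.indexOf(c);
            sb.append(i >= 0 ? ACCENTED.charAt(i) : c);
        }
        int padding = (source.length() * 3 + 9) / 10; // ceiling of 30% of the length
        sb.append("~".repeat(padding));
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(pseudo("Dialog Title"));
        System.out.println(pseudo("This is a message."));
    }
}
```

Because the output stays readable, testers can exercise the product before any real translation exists (Java 11+ for `String.repeat`).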
Pseudotranslation
Keyboards
Input Method Editors
• Some languages require software to assemble keystrokes into characters:
  – Asian languages with very large character sets
  – Complex scripts with vowel-killers and other contextual editing requirements
• Applications that interact directly with key-pressed events can disable or disrupt IME input.
• On- and over-the-spot editing
Customization
When is it okay?
• Content should be highly localized or have locale-specific requirements:
  – customization lets you address this requirement in the most localized possible manner
[Diagram: dates, numbers, images, colors, addresses, local rules, etc.]
Externalization again
[Diagram: Your Application, surrounded by local rules, regulatory requirements, postal addresses, default bookmark lists, your company’s customer service phone numbers]
Externalization again
[Diagram: a software component is split into a locale-independent global binary (global code) and locale-dependent resources (which can include code); input and output pass through the component.]
Code can be a resource!
Customization Examples
• Postal address validation
• Postal code validation
• Telephone number formatter
• “Personality” questions
  – blood type vs. sun sign
• Personal name formatter
  – first/last position, space, highlighting, formality, etc.
• Tax codes and shipping schedules
[Diagram: a Generic API with a Generic Implementation and per-market implementations: US Implementation, DE Implementation, ?? Implementation]
Example: Postal Addresses
US-centric schema:
address1 varchar(32)
address2 varchar(32)
city varchar(16)
state char(2)
zip char(5)
country char(2)

Internationalized (i18n) schema:
address1 varchar(64)
address2 varchar(64)
city varchar(64)
province varchar(64)
postcode varchar(64)
country=US, postcode=‘WC2 1GH’ // error
country=UK, postcode=‘95111’ // error
country=DE, postcode=‘1A4 喪’ // okay?
    public interface Address { /* ... */ }
    public class genericAddress implements Address { /* ... */ }
    public class USAddress extends genericAddress { /* ... */ }
    public class UKAddress extends genericAddress { /* ... */ }
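Filling in the class skeleton above, a minimal sketch of “generic API, per-country implementations” for postcode validation. Assumptions to note: the class is spelled `GenericAddress` (Java casing), the factory keys on ISO 3166-1 codes (where the United Kingdom is `GB`, not `UK`), and the UK pattern is a simplified shape check rather than the full postcode rules:

```java
import java.util.regex.Pattern;

final class PostalAddresses {
    interface Address {
        String postcode();
        boolean isValidPostcode();
    }

    /** Generic rules: accept any non-blank postcode (no country-specific knowledge). */
    static class GenericAddress implements Address {
        private final String postcode;
        GenericAddress(String postcode) { this.postcode = postcode; }
        public String postcode() { return postcode; }
        public boolean isValidPostcode() { return !postcode.isBlank(); }
    }

    /** US customization: 5-digit ZIP with optional ZIP+4. */
    static class USAddress extends GenericAddress {
        private static final Pattern ZIP = Pattern.compile("\\d{5}(-\\d{4})?");
        USAddress(String postcode) { super(postcode); }
        @Override public boolean isValidPostcode() { return ZIP.matcher(postcode()).matches(); }
    }

    /** UK customization: simplified outward/inward shape, not the complete rules. */
    static class UKAddress extends GenericAddress {
        private static final Pattern POSTCODE = Pattern.compile("[A-Z]{1,2}\\d[A-Z\\d]? \\d[A-Z]{2}");
        UKAddress(String postcode) { super(postcode); }
        @Override public boolean isValidPostcode() { return POSTCODE.matcher(postcode()).matches(); }
    }

    /** Picks the customization by ISO 3166-1 country code; unknown countries get generic rules. */
    static Address create(String country, String postcode) {
        switch (country) {
            case "US": return new USAddress(postcode);
            case "GB": return new UKAddress(postcode);
            default:   return new GenericAddress(postcode);
        }
    }

    public static void main(String[] args) {
        System.out.println(create("US", "WC2 1GH").isValidPostcode()); // false
        System.out.println(create("GB", "95111").isValidPostcode());   // false
        System.out.println(create("DE", "1A4 喪").isValidPostcode());  // true: generic rules only
    }
}
```

Adding a market means adding one class and one factory entry; the callers keep using the generic `Address` interface.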
Building Global Software
Beyond Just Coding: Localization, QA, and all that
The Internationalization Cycle
• Encompasses the full development cycle:
  – Requirements
  – Design
  – Development
  – QC
  – Release
  – Support
[Cycle diagram: Develop Roadmap (where is the product going?) → Develop Requirements & Architecture → Design (internationalized) → Code (enable, externalize, modularize) → Test (non-English/non-ASCII) → RTM/GA (by market) → Support Issues and Requests (all customers)]
What is “internationalization QA”?
• Does the enabled product work correctly?
  – Non-English configurations
  – Non-ASCII data and encoding support
  – Cross-time-zone support
  – Market-specific features or customizations
• Does localization appear correctly?
  – Is the product localizable?
What makes this different from “regular” QA?
Growing (and Pruning) the Matrix
Include non-English configurations in your test matrix; include non-ASCII data in your tests.
Be prepared to prune the test matrix.
What to Test With
– Test non-English configurations
  • Non-English locales (lying to your machine)
  • Native configurations (when does it make sense?)
– Test non-ASCII data
  • Encodings, encodings, everywhere
  • Non-ASCII character values
– Test across time zones
  • Two or more time zones; consider the international date line (“it’s tomorrow in Japan”) and DST issues
Planning Testing
Initially
• Get tools that are enabled!
  – Automation allows greater coverage, but only if it works.
• Plan encodings and locales as part of the test matrix.
• Acquire third-party products as necessary.
Increasing Maturity
• Use test-driven development practices.
• Get developers to write unit tests that are internationalized.
• Put the ‘i18n’ bugs into the regression suite.
Configuring Machines
Create both native and simulated environments:
– Native operating systems may have minor but sometimes critical differences (folder names, keywords, localized registry entries)
– Most features don’t run into native differences (easier to work with English-localized machines)
– Don’t buy physical keyboards (use software keyboards) unless your application relies on scan codes from keys
Incorporate Localization
Localization is part of the release process too.
– Changes to the user interface cost the localization team time and money.
– (Changes to the product cost the documentation and QA folks too)
• May need to institute change control or a UI freeze
Simultaneous Shipment (Simship)
Ideally, to maximize opportunity, ship the target languages the same day as the source language.
– It might not make sense for your product.
– But it might not be as difficult as you think it is. It might even be good for you.
Distribution of Content
• How does the localized text get into the running product?
  – Satellite assemblies, DLLs, shared libraries
  – Message catalogs
  – Special directory
  – Database
  – Etc.
More Distribution
• “Specific Language” (per-language)
• “Language Included” (one or more languages)
• “Language Pack” (product plus something)
[Diagram: each model shown with English, German, and French; the language-pack model pairs the global binary with separately shipped languages]
Completing the Product
• Static content is often under source control and can be localized “normally”
• Dynamic content may include the initial set of data or other items which need to be localized beyond software:
  – Demos and demo data
  – Dictionary, language add-ons
  – Local offers, links to Web store, etc.
  – Packaging
  – Regulatory
Quality Checking and Development Methodologies
• Translation is a human-oriented task.
  – Translation timelines are linear with volume.
• Localized product should be tested for functionality
  – translation can break things
  – usually the first language finds most of the bugs
• Translations should be checked for quality
• Development cycle has to include time for translators and quality assurance to catch up.
  – This does not mean “no agile” or “no changes”
  – Do pilot language(s) or moving-target translation; do better UI design and usability reviews; etc.
Summary
Internationalization
… is a fundamental architectural approach: it is how software is built.
  – Design
  – Enabling
  – Externalization
  – Customization
  – Testing and Support
  – Lifecycle
Q&A
Would you write the code for I18N on the whiteboard before you go?
#define UNICODE
#import I18N.h