An Introduction Part II: Enabling Internationalization.

Post on 26-Dec-2015



License

This presentation and its associated materials are licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 2.5 License. You may use these materials without obtaining permission from the author. Any materials used or redistributed must contain this notice. [Derivative works may be permitted with permission of the author.] This work is copyright © 2008-2011 by Addison P. Phillips.

Who is this guy?

• Globalization Architect, Lab126 (we make the technology behind the Kindle)

• Chair, W3C Internationalization WG

Internationalization is:

• the design and development of a product that is enabled for target audiences that vary in culture, region, or language. [W3C]

• a fundamental architectural approach to software development

Related Concepts

• Localization: creation of a product tailored to a particular target market

• Translation: process of converting text from one language to another

• Globalization: unified approach to creating global products, especially those that support multiple geographies simultaneously

Mystic Numbering (M4C N7G)

Opinions differ on capitalization (C12N); choose from: i18N I18n I18n I18N
Very geeky; not very internationalized (I19G?)

I N T E R N A T I O N A L I Z A T I O N

I 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 N

I18N

Localization = L10N
Globalization = G11N
Canonicalization = C14N
Accessibility = A11Y
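The numbering rule above (first letter, count of the letters in between, last letter) can be sketched in a few lines of Java. The class and method names here are hypothetical, chosen only for illustration.

```java
import java.util.Locale;

// Hypothetical helper: builds a numeronym such as I18N from a word.
class Numeronym {
    static String of(String word) {
        if (word.length() < 3) {
            // Nothing to abbreviate between the first and last letter.
            return word.toUpperCase(Locale.ROOT);
        }
        int interior = word.length() - 2; // letters between first and last
        String result = word.substring(0, 1) + interior + word.substring(word.length() - 1);
        // Locale.ROOT avoids locale-specific case mappings (e.g. Turkish dotless i).
        return result.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(of("internationalization"));
        System.out.println(of("localization"));
        System.out.println(of("globalization"));
    }
}
```

Note the use of `Locale.ROOT` for case conversion: even a toy like this can pick up a locale bug if it relies on the default locale.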

The Internationalization Approach

• Gather requirements globally
• Enable
• Externalize
• Customize
• Test and support globally
• Localize

The Internationalization Approach

• Enabling—the same code supports multiple regions or cultures. Sometimes called a “global binary”.

• Externalization—plan for localizability by separating “content” from code. This makes localization for specific languages, regions, or cultures easy, fast, and cheap.

• Customization—add culturally specific functionality, presentation, or content to an application.

A Global Approach

• Internationalization turns technical problems into business decisions

• Balance priorities based on real user distribution/requirements
  – Consider global user population as a whole
  – Consider specific market requirements on an equal footing
  – Potential markets for the product

Internationalization Myths

• We (wrote it in Java/C#, used Unicode, etc.), so it is internationalized.

• We made the assumption that the product would only ever have English screens: all our users understand it anyway.

• A localized product is internationalized.
• An internationalized product is slow/slower.
• It takes longer to write internationalized code.
• We can’t read the screens/it is too hard to test.
• We have no intention of localizing, so no need to internationalize.
• We don’t have any customers there.
• The users in (some country) never complained, so it must work.
• This product is 100% fully internationalized.

• We need special experts.
• We need an extra development cycle.
• We need six more months to build it.
• We need people who speak (language).

Internationalization Truths: “Well, it depends…”

• Generalize designs
  – Locale independent data structures
  – Locale sensitive display
• Externalize cultural or linguistic variations
• Customize as a last resort

Buy In: The Key to Success

• For internationalization to be a success over time, there must be commitment:
  – Management
  – Product Team
  – Development Team

• All developers, not a splinter group

Why Do Internationalization?

Addressable Market:

Globalized Product Development

Internationalization turns technical problems into business decisions.
  – Localization: Choose which markets to translate user interface or documentation for with no engineering.
  – Deployment: Choose whether to serve applications from a single site, cluster of sites, or in each target market.
  – Development: Add content and features to products as necessary in each target market.
  – Integration and Interoperability: Servers and products can work together around the world, so customers can truly create “Enterprise” solutions.

Development Methodologies

• Independent of development methodology
  – Agile? Waterfall? You make the choice.
• Encompasses the full development cycle: Design, Development, QC, Release, Support

[Diagram, development cycle: Develop Roadmap (global deployment) → Develop Requirements & Architecture → Design (internationalized) → Code (enable, externalize, customizable) → Test (non-English/non-ASCII) → RTM/GA (by market) → Develop Requirements (all customers)]

The Customization Approach

• Let’s do it in a separate release.
• Let’s make a branch for the international customers.
• Let’s get a special team of people to work on the international release.

How That Model Really Looks

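[Diagram, timeline: the main line ships 1.0, 1.0a, and 2.0, accumulating sexy new features and bug fixes. An international branch forks off and needs merges and fixes, lots more people, and cost before International Release 1.0 (1.0i) ships. Result: functionality gaps (international users are now waiting for 2.0i), lost money and opportunity, and lots of cost to get there.]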

The Problem with Customization

• Code forks (double, triple coding).
• Lag time for international releases.
• Non-adoption of localized release.
• Full regression of every language.
• Quality or commitment perception.
• Lack of data exchange between language versions.
• Difficult to repeat (every version is a repeat).
• Proliferation of bugs and of support problems.
• International features are cancelled.
• Core product still doesn’t work/can’t address similar markets.
• Loss of market share.

ANALYZING AND DEVELOPING A DESIGN

Large Animal Pictures

The Problem: dates, numbers, images, colors, addresses, local rules, and strings are all embedded in Your Application (local rules, regulatory requirements, postal addresses, default bookmark lists, your company’s customer service phone numbers).

The Solution: a locale-independent global binary plus locale-dependent resources (includes code).

Large Animal Pictures

[Diagram: a software component built from global code plus resources, with input and output passing through an I/O layer]

Enterprise Animal Pictures

[Diagram: Your System (front end, business logic, data store, operating environment, API) forms a “Unicode cloud” behind a Unicode interface. Legacy-encoded input is detected and converted on the way in, capturing the encoding; output for a partner/content provider is converted to the legacy encoding on the way out.]

Internationalization Issues

• Text Processing
  – Character encodings, including Unicode, spelling, word breaks, collation, and so on
• Language
  – Of the software (localization)
  – Of solutions built using the software (localizability, data)
• Locale-affected formats
  – dates, numbers and the like
• Regionally-affected formats
  – names, addresses, currency, and the like
• Time-related issues
  – time zone, calendar, holidays, work rules and the like
• Cultural adaptation
  – presentation, style, position, color use, and the like
• Legal requirements
  – accessibility, SOX, DRM, moderation, security, content, and the like

Levels of Enablement

• Not Enabled
• Single-Language-at-a-Time (SLAAT)
  All components run in the same language and encoding environment correctly.
• Multi-Locale
  Unicode support; components run in different locales, languages, encodings, and time zones

Test Your Assumptions

Gender: Male× Female

Choose Your Language

How is this company doing?

ENABLING: Making Code Aware of Culture

What is “enabling”?

• Enabled software: adapts the display, processing, validation, storage, and transmission of data according to the cultural, linguistic, and regional needs of the users
  – Text, Characters, and Encodings
  – Locale Awareness
  – Times and Time Zones

A “global binary” is a single object-code version that is used in all markets, regardless of localization.

Don’t Code What You Think You Know

5/2/7: sometime in February? sometime in May? sometime in 2005?

1.234: more than 1000? less than 2?

4.32.MD: number, time, currency? morning or afternoon?

Date Formats

Culture Format Example

U. S. A. mdy, / 2/16/05

France dmy, . 16.2.05

France dmy, - 16-2-05

CJKT ymd, / 2005/2/16

CJKT ymd, 年月日 2005 年 2 月 16日

Japan e¥md, 平成 17 年 2 月16 日

Japan ¥md, / 17/2/16

Time Formats

• U.S.A.: 4:00 p.m.
• France: 16.00
• Japan: 1600
• Japan: ごご4:00
• Korea: 오후 4:32
• Thai: 16:32 น.
• Albanian: 4.32.MD
• Arabic: م 04:32
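Rather than hard-coding any of the patterns above, a formatter can be asked for the locale's own short style. The sketch below uses the standard `java.time` API; the class and method names are hypothetical, and the exact output depends on the locale data shipped with the JDK, so none is shown.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

// Hypothetical demo: the same values, formatted by locale rather than by a fixed pattern.
class LocalizedDateDemo {
    static String shortDate(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT).withLocale(locale).format(date);
    }

    static String shortTime(LocalTime time, Locale locale) {
        return DateTimeFormatter.ofLocalizedTime(FormatStyle.SHORT).withLocale(locale).format(time);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2005, 2, 16);
        LocalTime time = LocalTime.of(16, 32);
        Locale[] locales = { Locale.US, Locale.FRANCE, Locale.JAPAN, Locale.KOREA };
        for (Locale locale : locales) {
            // The code never mentions field order, separators, or AM/PM markers.
            System.out.println(locale.toLanguageTag() + ": "
                    + shortDate(date, locale) + " " + shortTime(time, locale));
        }
    }
}
```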

More Examples

Assumptions about date tokens:
• USA: Sun, Mon, Tue (3 positions, titlecase)
• French: lun. mar. mer. (four positions, lowercase)
• Russian: Пн Вт Ср (two positions, Cyrillic)
• USA: Jan, Feb, Mar (3 positions, titlecase)
• French: janv. févr. mars avr. (variable, 4 or 5 positions, lowercase)
• Spanish (Spain): ene, feb, mar (not titlecase)
• Spanish (Americas): Ene, Feb, Mar (titlecase)

Calendars: What Year Is It?

• Legal, ceremonial, or popular requirement
  – Gregorian: 2012
  – Japan Emperor: 24 Heisei (平成 24 年)
  – Thailand (Buddhist): 2555 (Gregorian + 543)
  – Chinese (traditional): 4704 (lunar)
  – Hebrew (lunar): 5767 (תשסו)
  – Hijri (Islamic): 1428 (lunar)
  – Armenian: 1461 (ԹՎ ՌՆԾԶ)

etc. etc. etc.

Weekends and Holidays

• When is the weekend?
  – Friday is part of the weekend in some countries.
• Both official and unofficial holidays vary widely in number. Here are a few to watch for:
  – USA: July 4, MLK, Presidents’ Day, Veterans Day, Flag Day, Columbus Day, Thanksgiving…
  – Japan: Golden Week
  – China: New Year’s
  – Britain: Guy Fawkes Day, Boxing Day
  – France: Bastille Day
  – Spain: Reyes Magos

Calendar Display

Numbers

Grouping and decimal separators:
• England: 12,345.67
• Germany: 12.345,67
• Switzerland: 12’345,67
• Swiss money: 12’345.67
• France: 12 345,67
• India: 12,34,567.89

France uses a non-breaking space!
India: number of digits in groupings changes!
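The separators above do not have to be known by the programmer. A minimal sketch using the standard `java.text.NumberFormat` follows; the class name and the choice of locales are only illustrative, and the French and Indian results vary with the JDK's locale data, so the comments make no claim about them.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical demo: one value, formatted with each locale's own separators.
class NumberFormatDemo {
    static String format(double value, Locale locale) {
        NumberFormat nf = NumberFormat.getNumberInstance(locale);
        nf.setMinimumFractionDigits(2);
        nf.setMaximumFractionDigits(2);
        return nf.format(value);
    }

    public static void main(String[] args) {
        Locale[] locales = {
            Locale.US, Locale.GERMANY, Locale.FRANCE,
            Locale.forLanguageTag("de-CH"), Locale.forLanguageTag("en-IN")
        };
        for (Locale locale : locales) {
            // Grouping size, grouping character, and decimal character all come from the locale.
            System.out.println(locale.toLanguageTag() + ": " + format(1234567.89, locale));
        }
    }
}
```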

Lists

List delimiters & separators can conflict. French example:

2 345,67, 1 012,34, 45,67 (hard to read)

2 345,67 ; 1 012,34 ; 45,67 (easier to read)

import java.text.NumberFormat;
import java.util.Iterator;
import java.util.List;

List myNumberList = getList();
NumberFormat nf = NumberFormat.getInstance();   // formats for the default locale
StringBuffer buf = new StringBuffer();
Iterator iter = myNumberList.listIterator();
while (iter.hasNext()) {
    buf.append(nf.format(((Number) iter.next()).doubleValue()));
    buf.append(", ");   // hard-coded list separator: collides with a decimal comma (e.g. French)
}
System.out.println(buf.toString());

Collation (a fancy word for “sorting”)

• English: ABC...RSTUVWXYZ
• German: AÄB...NOÖ...SßTUÜV…YZ
• Swedish/Finnish: AB...STUVWXYZÅÄÖ
• Norwegian: AB...VWXYÜZÆØÅ
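In Java, locale-sensitive ordering like this is available through `java.text.Collator`. The sketch below is illustrative only (the class and method names are hypothetical) and contrasts it with the binary, code-point order that `String.compareTo` gives.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical demo: sorting with a locale's collation rules instead of code-point order.
class CollationDemo {
    static List<String> sorted(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        // Collator implements Comparator, so it can be passed straight to sort().
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cherry", "Banana", "apple");

        List<String> binary = new ArrayList<>(words);
        Collections.sort(binary); // code-point order: every uppercase letter sorts before lowercase
        System.out.println("binary:  " + binary);

        System.out.println("English: " + sorted(words, Locale.ENGLISH));
        // The same call with another locale applies that locale's alphabet rules.
        System.out.println("Swedish: " + sorted(Arrays.asList("z", "a", "\u00e4"), Locale.forLanguageTag("sv")));
    }
}
```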

Organizing Information

• “Alphabet” differences
• Additional information
  – for example: yomi
• ASCII vs. the world
• Mixed information sets

“Should I be writing all of this down…”

• Wide range of variation
• Obscure formats
• Difficult to obtain reliable information on formats
• Lots of work to implement and maintain

Enabling means not having to know (m)any of the details

Supporting International Formats

• Use neutral data structures
  – Makes code independent of locale
  – Most data types are locale-neutral:
    • Boolean
    • String, char
    • Number classes
    • Date, Calendar
• Encapsulate formatting/validation in a function
  – Format style chosen dynamically at runtime
  – Format details don’t have to be specified or researched
  – APIs know the gory details

Essence of Enabling

• Object to Presentation, Presentation to Object
  – Integers
  – Floats
  – Percents
  – Currencies
  – Dates
  – Times
  – Durations
  – Collation (lists)
  – Weights/measures/sizes
  – Resources (user interface strings)

[Diagram: the locale sits between the object and its user presentation]

Locale

• an identifier or data structure that allows programmers to access culturally and linguistically affected functionality in a system.

• Many systems now based on IETF BCP 47; for example JavaScript, Java 7, and CLDR

Complex Types

• Data structures, APIs, or classes built from basic types must include similar capabilities.
  – Store data in a locale-neutral or independent format.
  – Display in a language/regional/culturally sensitive manner.
  – Convert from locale format to locale-neutral or locale-independent storage format.

Design Time and Data Structures

• Identify your own “locale bias”
  – Field names matter!
    • “Postal Code”, not “ZIP code”.
    • Family Name/Given Name, not First Name/Last Name
  – Avoid problematic fields
    • Postal address parsing? Area code? Etc.

Currency

• Currency formatting is usually similar to number formatting. But things can vary widely here, too:
  – $1,100.00 [USA]
  – €1 100,00 [France-Euro]
  – ¥1,100 [Japan]
  – 1.100$00 Esc. [Portugal, obsolete]
  – SFr. 1’000.00 [Switzerland]
• Currency associated with the locale doesn’t always apply. Store the currency type with value.
  – Use ISO 4217 std. codes (USD, JPY, EUR, RUB)
    • Not always one symbol.
    • Not always two decimal places.
    • $100 + ¥100 = $101
• Consider neutral displays!
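One way to “store the currency type with value” is a small value class that carries an ISO 4217 code alongside the amount. The sketch below is an assumption about how that might look, not the deck's own code: `Money`, `neutral`, and `display` are hypothetical names, while `java.util.Currency` and `NumberFormat.setCurrency` are standard.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

// Hypothetical value type: the amount never travels without its ISO 4217 currency.
class Money {
    final BigDecimal amount;
    final Currency currency;

    Money(String amount, String isoCode) {
        this.currency = Currency.getInstance(isoCode); // throws for unknown codes
        // Not every currency has two decimal places (JPY has none).
        int digits = Math.max(0, currency.getDefaultFractionDigits());
        this.amount = new BigDecimal(amount).setScale(digits, RoundingMode.HALF_EVEN);
    }

    /** Locale-neutral display: plain digits plus the currency code. */
    String neutral() {
        return amount.toPlainString() + " " + currency.getCurrencyCode();
    }

    /** Locale-sensitive display: the viewer's separators, but this value's currency. */
    String display(Locale viewerLocale) {
        NumberFormat nf = NumberFormat.getCurrencyInstance(viewerLocale);
        nf.setCurrency(currency);
        nf.setMinimumFractionDigits(amount.scale());
        nf.setMaximumFractionDigits(amount.scale());
        return nf.format(amount);
    }

    public static void main(String[] args) {
        Money yen = new Money("1100", "JPY");
        Money dollars = new Money("1100", "USD");
        System.out.println(yen.neutral() + " / " + yen.display(Locale.US));
        System.out.println(dollars.neutral() + " / " + dollars.display(Locale.FRANCE));
    }
}
```

Because the currency is part of the value, adding a `Money` in USD to one in JPY can be rejected or converted explicitly instead of silently summing the digits.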

Being Locale Neutral

• Avoid or reduce locale-affected display to increase portability
  – Use unambiguous formats, such as ISO 8601-like dates, especially in log files and the like
    • 2005-04-01 14:17:00 UTC
  – Use consistent formats (‘user locale’), especially in columns or collections of data

Consistent format:

Amount       Currency
351,234.56   USD
102,556.78   EUR
65,336.00    JPY
212,345.00   INR

Each value in its own locale’s format:

Amount       Currency
351,234.56   USD
102 556,78   EUR
65336        JPY
2,12,345.00  INR
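The ISO 8601-like log timestamp mentioned above can be produced with a fixed pattern pinned to UTC, so the result does not depend on the machine's locale or time zone. This is a minimal sketch; `LogTimestamp` is a hypothetical name.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper: unambiguous, locale-neutral timestamps for log files.
class LogTimestamp {
    // Locale.ROOT and an explicit zone keep the output identical on every machine.
    private static final DateTimeFormatter LOG_FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss 'UTC'", Locale.ROOT)
                    .withZone(ZoneOffset.UTC);

    static String format(Instant instant) {
        return LOG_FORMAT.format(instant);
    }

    public static void main(String[] args) {
        System.out.println(format(Instant.parse("2005-04-01T14:17:00Z")));
        // prints "2005-04-01 14:17:00 UTC"
    }
}
```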

“The String is the Thing”

• Text doesn’t get translated on the fly.
• Don’t use text as an identifier or foreign key.
  – Use ID Numbers or not-human-readable values instead of requiring text fields to match.
  – “Intrinsic” data value versus “display” data value.
    • Enumerated values displayed as strings.
    • Use display strings.

Enumerated: ACCOUNTS_PAYABLE
Displayed: “Accounts Payable”, “pagável de clientes”

English-like Construction

• Concatenation
  – string1 + string2
• Pluralization
  – Dog + “s” = “dogs”

This topic will be covered in greater depth in the section on localization.

Databases

• Most databases can only handle one collation sequence per instance or one collation per index.
  – Remove reliance on alphalists.
  – Self-collate short lists.
  – Pre-collate long lists?
• Example: NLS_SORT controls the way Oracle returns data (collation sequence).
  – Global environment variable.
  – Not necessarily under your control.
  – Indices are built on a predetermined or binary sort.

Enabling Summary

• Understand Encodings and Unicode
  – All text has an encoding!
• Be Locale-Aware
  – Create locale-neutral data structures
  – Separate display from storage

IT’S ABOUT TIME

Dates, Times, Durations, Calendars a little aside…

Observed Time

Incremental Time

• Computed time based on “clock ticks” in an “epoch”
  – The epochal date is arbitrary. The UNIX epoch is midnight, January 1, 1970, UTC.

Field Based Time

• Time based on calendric fields (day, month, year, hour, minute, second)

• Some systems have data types for “field based” time also.
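In `java.time` terms, `Instant` is an incremental time and `LocalDateTime` is a field-based time; converting between them requires an offset or zone. The sketch below illustrates the distinction, with hypothetical class and method names.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical demo: incremental ("clock ticks") versus field-based time values.
class TimeRepresentations {
    /** Incremental time: milliseconds since the UNIX epoch, no fields, no zone. */
    static Instant fromTicks(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    /** Field-based time only becomes a tick count once an offset is supplied. */
    static long toTicks(LocalDateTime fields, ZoneOffset offset) {
        return fields.toInstant(offset).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(fromTicks(0L)); // prints 1970-01-01T00:00:00Z

        LocalDateTime midnight = LocalDateTime.of(1970, 1, 1, 0, 0);
        // The same fields map to different ticks depending on the offset.
        System.out.println(toTicks(midnight, ZoneOffset.UTC));        // prints 0
        System.out.println(toTicks(midnight, ZoneOffset.ofHours(9))); // prints -32400000
    }
}
```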

What is a Time Zone

• A time zone is a geographical region or area that has common rules for determining the local observed time as it relates to monotonic (computer) time.

• Distinctions include:
  – Offset from UTC
  – Daylight Saving Time (Summer Time) behavior
  – Historic changes in offset or DST behavior
  – Political control

Durations and Repeating Events

Wall-time: this meeting is at 2 PM Pacific time every Tuesday
  – interval between meetings may vary in number of seconds
    • Daylight time transitions
    • Changes in DST rules

Fixed-duration: run the virus scanner every 57 minutes
  – interval is always 3,420,000 milliseconds
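The difference between the two kinds of recurrence shows up when an interval crosses a daylight time transition. The sketch below uses `java.time`, where adding a calendar period keeps the wall-clock time and adding a `Duration` keeps the elapsed time; the class name and the sample meeting date are assumptions made for illustration.

```java
import java.time.Duration;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical demo: wall-time recurrence versus fixed-duration recurrence.
class RecurrenceDemo {
    static final ZoneId PACIFIC = ZoneId.of("America/Los_Angeles");

    /** Wall-time: same local time one week later, however many seconds that takes. */
    static ZonedDateTime nextWallTime(ZonedDateTime t) {
        return t.plusWeeks(1);
    }

    /** Fixed-duration: exactly this much elapsed time later, whatever the clock says. */
    static ZonedDateTime nextFixed(ZonedDateTime t, Duration interval) {
        return t.plus(interval);
    }

    public static void main(String[] args) {
        // Tuesday 2011-03-08, 2 PM Pacific; US daylight time began on 2011-03-13.
        ZonedDateTime meeting = ZonedDateTime.of(2011, 3, 8, 14, 0, 0, 0, PACIFIC);

        ZonedDateTime wall = nextWallTime(meeting);
        System.out.println(wall + " after " + Duration.between(meeting, wall).toHours() + " hours");
        // still 14:00 local time, but only 167 hours later

        ZonedDateTime fixed = nextFixed(meeting, Duration.ofDays(7));
        System.out.println(fixed);
        // 168 hours later, which is 15:00 local time
    }
}
```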

Time Zone Affected Scenarios

• Zone independent
  – only “incremental” times are necessary
• Local time, past only
  – future changes to time zone rules not applicable
  – example: logging system
• Local time, both past and future
  – time zone rule changes may affect some time values
  – example: calendar program
• Floating times
  – events not tied to a specific time zone
  – example: birthdate, start date, definition of “night” for phone usage
• Recurring events
  – events that recur, sometimes during and sometimes not during daylight saving time
  – example: weekly status meeting

Time Zone Scenarios

• Zone Independent: generally timestamps that don’t refer to a specific time zone.
  – Record local offset or (better) use UTC
  – May want wall time for analysis

Time Zone Scenarios

• Local Time (Past Only): times that cannot change their relationship to DST
  – Store zone ID and time value [may store offset instead of zone ID]
• Local Time (Past+Future): time values may need to change if DST rules change
  – Store original offset along with zone ID and time value
  – May require a database crawl if DST rules change

Time Zone Scenarios

• Floating Times: times that don’t change regardless of where you are in the world.
  – Publication dates
  – Birth dates (or any anniversary date)
  – Etc.
• Handle using UTC and avoiding zone casting

Time Zone Scenarios

• Recurring Events: time values that occur in both DST and non-DST time
  – Store time, recurrence period, zone ID, original offset, and whether to tie recurrence to DST

Time Zone Identifiers

• Often based on the IANA time zone database (tzinfo) [formerly “Olson IDs”]

Offset: Etc/UTC, Etc/GMT+1
Continent/City: America/Los_Angeles, Europe/Paris, Asia/Tokyo, Antarctica/DumontDUrville
Ocean/Island (City): Atlantic/Canary, Pacific/Auckland, Pacific/Pago_Pago
Continent/Region/City: America/Indiana/Indianapolis

Time Zone Hints

• Only 21 countries have more than one time zone (if you know the country, you often know the time zone)

• Argentina, Australia, Brazil, Canada, Chile, Democratic Republic of the Congo, Ecuador, France, Greenland, Indonesia, Kazakhstan, Kiribati, Mexico, Micronesia, Mongolia, New Zealand, Portugal, Russia, Spain, and the United States.

  – Of these, most have maritime or overseas regions. Examples:
    • Ecuador: Galapagos Islands
    • Chile: Easter Island
    • Portugal: Azores

Locale-Neutral Formats

• Use locale-neutral formats for interchange:
  – ISO 8601
  – Incremental time values (e.g. time_t)
  – Distinguish time zone if necessary for interpretation
    • Offset is not the same as time zone

At any given time, in UTC, it is the same time everywhere that time is measured.

SQL data types and XML formats are often field-based, while programming languages are usually incremental.

Formatting Dates and Times

Requires more than just a locale!
• date: value being formatted (1034197545321L)
• time zone: defines relation to “wall time” (Asia/Tokyo)
• calendar: defines rules for calculating field values (Japanese Imperial)

Result: October 10, 14H 6:05:45 AM JST
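The three inputs on this slide map onto `java.time` fairly directly: an epoch value, a `ZoneId`, and a `Chronology`. The sketch below recomputes the slide's example; the class and method names are hypothetical, and the localized string printed by the formatter depends on the JDK's locale data, so it is not shown.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.chrono.JapaneseChronology;
import java.time.chrono.JapaneseDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.time.temporal.ChronoField;
import java.util.Locale;

// Hypothetical demo: value + time zone + calendar, then a locale for the final string.
class ImperialDateDemo {
    /** The time zone relates the incremental value to wall time. */
    static ZonedDateTime wallTime(long epochMillis, String zoneId) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId));
    }

    /** The calendar defines how fields such as era and year are calculated. */
    static JapaneseDate imperial(ZonedDateTime wallTime) {
        return JapaneseDate.from(wallTime.toLocalDate());
    }

    public static void main(String[] args) {
        ZonedDateTime tokyo = wallTime(1034197545321L, "Asia/Tokyo");
        JapaneseDate date = imperial(tokyo);
        System.out.println(tokyo);  // 2002-10-10, 06:05:45 in Tokyo
        System.out.println(date.getEra() + " " + date.get(ChronoField.YEAR_OF_ERA)); // Heisei 14

        // The locale only chooses words and layout for the already-computed fields.
        DateTimeFormatter fmt = DateTimeFormatter.ofLocalizedDateTime(FormatStyle.FULL)
                .withLocale(Locale.US)
                .withChronology(JapaneseChronology.INSTANCE);
        System.out.println(fmt.format(tokyo));
    }
}
```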

Externalization

Making software localizable

What is localization?

“What is localization?” Zula asked. Peter sighed, letting her know it was a stupid question. “Translating foreign software into Hungarian, making things work correctly in the special environment of Hungary,” Csongor explained, and Zula thought that she could glimpse, here, in the way that he contentedly explained things, Csongor’s father the school-teacher.

Reamde by Neal Stephenson

What is Localization?

• The process of tailoring a product to a specific target market.
  – Translation of messages
  – Adaptation to local preferences
  – Addition (or subtraction) of content or features

Localization is Obvious

… but it isn’t “internationalization”
• Localizability is internationalization.
  – Externalize text
  – Externalize presentation
  – Dynamic composition
  – Distribution of language content
  – “Plug-in” features

What is a ‘Resource’?

any application component loaded dynamically at runtime, rather than compiled into the application

In localization: source code files containing language, region, or culturally-affected materials

  – Text
  – Error messages
  – Icons
  – Pictures
  – Fonts
  – Colors
  – Graphics
  – Sizes
  – Positions
  – Magic Numbers
  – Mnemonics (“Alt+G”, “F4”, etc.)
  – File Locations
  – Dictionaries
  – Glossaries
  – Grammar Rules
  – Code

Why Resources?

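[Diagram, before and after externalization: text, error messages, icons, pictures, fonts, colors, graphics, sizes, positions, magic numbers, mnemonics, dictionaries, glossaries, grammar rules, and culturally specific code move out of the application and into resources]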

Avoiding Forks

[Diagram: one global binary loads a separate set of resources for the English version and for each additional language version]

Forked Code Woes

• Hard to fix and maintain
• Different versions in the field
• Delays in releasing localized product
• Different functionality by region
• Confusing for customers/users
• Versions are not interoperable and might not be able to exchange data!

More Benefits

• Rename or re-brand product
• Fix spelling or grammar mistakes
• Fix usability
• Make terminology consistent
• Test drive new customer experiences, try new designs, etc.

… all without a rebuild!

"Project-Id-Version: blanket 1.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2011-03-23 15:43-0700\n"
"PO-Revision-Date: 2011-03-23 15:43-0700\n"
"Last-Translator: Richard Gillam <gillam (a] lab126.com>\n"
"Language-Team: en <kindle-i18n-team (a] lab126.com>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

# font
msgid "my.font.name"
msgstr "dialog"

#: progress bar in point based
msgid "progress_bar.rect"
msgstr "43.11,64.67,172.45,12.93"

msgid "progress_bar.border"
msgstr "2"

# bounding box: x_pos,y_pos,width,height
msgid "shutdown.cust_service.header.rect"
msgstr "0,14.65,258.68,12.07"

msgid "shutdown.cust_service.header"
msgstr "Repair Needed"

What’s wrong here?

String1 = There are
String2 = no
String3 = tables in
String4 = files
String5 = .

Messages I Could Build:
• There are files.
• There are no files.
• There are 50 files.
• There are tables in files.
• There are no tables in files.
• There are 50 tables in no files.
• There are tables in.

Let’s Google Translate That!

• There are files. → Il ya des fichiers.
• There are no files. → Il n'y a pas les fichiers.
• There are 50 files. → Il ya 50 fichiers.
• There are tables in files. → Il ya des tables dans des fichiers.
• There are no tables in files. → Il n'ya pas de tables dans des fichiers.
• There are 50 tables in no files. → Il ya 50 tableaux dans aucun fichier.
• There are tables in. → Il ya des tables po.

Don’t Build Text From Fragments

• Text fragments are hard to translate
  – Fragments may not follow grammar rules
  – Cannot know which parts go together
  – Parts can be reused in incompatible ways
• Internationalization APIs offer “patterns” to fix:
  – [] files out of [] were deleted.
  – An error occurred at [] on [].
  – Page [] of []
  – Processing: []% complete.

Example: MessageFormat (Java)

There were {0} tables on {1}.

There were {0,number,integer} tables on {1,date,short}.

{1,date} に {0,number,integer} のテーブルがあった。

• Number replacement variables.
• Provide typing and formatting information where possible.
• Externalize as a single unitary string.
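A short sketch of how such patterns are used with `java.text.MessageFormat` follows. The wrapper class and method are hypothetical; in a real application the pattern strings would come from a resource file rather than from the source code.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical demo: one whole-sentence pattern per message, with numbered, typed variables.
class MessageFormatDemo {
    static String render(Locale locale, String pattern, Object... args) {
        // The locale controls how typed arguments (numbers, dates) are formatted.
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        // Numbered variables let a translator reorder the arguments freely.
        String source = "There were {0,number,integer} tables on {1}.";
        String reordered = "On {1}, there were {0,number,integer} tables.";

        System.out.println(render(Locale.US, source, 5, "Tuesday"));
        // prints "There were 5 tables on Tuesday."
        System.out.println(render(Locale.US, reordered, 5, "Tuesday"));
        // prints "On Tuesday, there were 5 tables."
    }
}
```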

What’s My Gender

“Documenti del Chris”
“Documenti della Chris”

“Documenti - Chris”

More Issues With Text Composition

– There were one errors found.– You have earned your 22th set of bonus points.

Sentence Parts Must Agree

• Endings, Gender, Plurality, Case
  – e.g. Japanese counting uses different words for different kinds of objects
  – e.g. Slavic languages use different endings for singular, few, many…

Complex Message Formatting

There were no errors.
There was 1 error.
There were 2 errors.

“Choice format” APIs allow for different resources to be used based on runtime values.

0: There were no errors.
1: There was {0} error.
2: There were {0} errors.

0: не было ошибок
1: была {0} ошибка
2: были {0} ошибки
5: были {0} ошибок

The number of resources may need to vary by locale or language.

Examples:
• ordinal numbers (1st, 2nd, 3rd, 4th, etc.)
• complex messages, such as “27 seconds ago” vs. “10 minutes ago”
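In Java, one “choice format” API is the `choice` argument type of `java.text.MessageFormat`. The sketch below encodes the English error messages above in a single externalizable pattern; the class name is hypothetical. Note that a choice pattern selects by numeric range, so languages whose plural categories are not simple ranges may need a plural-rule-aware API (ICU's `PluralFormat` is one such) rather than this approach.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical demo: the message variant is selected by the runtime value.
class ChoiceDemo {
    // One externalized string; the translator controls how many variants exist.
    static final String ERRORS_EN =
            "{0,choice,0#There were no errors.|1#There was {0} error.|1<There were {0} errors.}";

    static String errors(int count) {
        return new MessageFormat(ERRORS_EN, Locale.US).format(new Object[] { count });
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 2; n++) {
            System.out.println(errors(n));
        }
        // prints:
        // There were no errors.
        // There was 1 error.
        // There were 2 errors.
    }
}
```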

Images and Icons

• Avoid metaphors
• Avoid cultural sensitivities
• Avoid body parts
• Replace as necessary
• Avoid putting text into graphics

Graphic: $20
Text: $0.06

Images and Culture

• Beware your biases—even “good” ones.

Meet your friends on our new social website

for India

Isn’t it Swell?

English is very succinct.
  – Words in other languages are longer
  – Sentences are longer
  – Characters may be larger

More Swollen Text

• 30% in length (alphabetics, abjads, etc.)
• 30% in height (ideographics)
• But… a rule of thumb, not a “fact”
  – Measure your results with care.

A Cautionary Tale

GUI Layout

Dereferencing

• Minimize sentence building
• Minimize arguments per string
• Use subject:predicate wherever possible

When you can do this:
Balance: $100.00

Don’t do this:
Your balance is $100.00.

Dynamic vs. Static Layout

• Magic numbers
• Externalized layouts
• Mnemonics
• Colors

Localizing Styles

• Bolding is not universal for emphasis
  – Italicization, Capitalization, etc. are also not universal (some scripts don’t have these attributes)
• Use Logical not Presentational names
  – Describe the function not the appearance. For example, use “emphasis” instead of “italics”.

中国 Amikake Wakiten

Use of Color

“Going Down”

“Going Up”

Non-Translatable Resources

• Some content should be externalized but not translated
  – Sometimes referred to as “DNT” for “do not translate”
• Externalize? Yes…
  – Segregate DNT material from translated material if possible (by using separate resource files or separate resource blocks within a file).
  – Developers can’t always tell when something should or should not be DNT… and neither can translators (context is missing)

The “Locale” in “Localization”

• Resources “fall back” to find the best match

Global Binary → Resources, falling back in this order:
1. zh-Hans-SG (Chinese, Simplified script, Singapore)
2. zh-Hans (Chinese, Simplified script)
3. zh (Chinese)
4. (root)

Sparse Population

• A given language resource may not contain a complete set of resources.
  – Resources fall back individually for each sub-resource (such as a particular value)

Base resources:
“appName” “Demo”
“maxRows” 57
“dialogTitle” “Hello World”

French resources (sparse):
“appName” “Démo”
“dialogTitle” “Bonjour monde”
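The fallback chain and the per-value lookup can be sketched with plain maps. This is not how any particular resource API is implemented; it is a simplified illustration that truncates a BCP 47 language tag one subtag at a time, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: truncation fallback over sparsely populated resource sets.
class ResourceFallback {
    /** "zh-Hans-SG" -> [zh-Hans-SG, zh-Hans, zh, ""], where "" stands for the root. */
    static List<String> chain(String languageTag) {
        List<String> tags = new ArrayList<>();
        String tag = languageTag;
        while (!tag.isEmpty()) {
            tags.add(tag);
            int dash = tag.lastIndexOf('-');
            tag = dash < 0 ? "" : tag.substring(0, dash);
        }
        tags.add(""); // root resources are always the last resort
        return tags;
    }

    /** Each key falls back on its own, so a sparse bundle only overrides what it translates. */
    static String lookup(Map<String, Map<String, String>> bundles, String languageTag, String key) {
        for (String tag : chain(languageTag)) {
            Map<String, String> bundle = bundles.get(tag);
            if (bundle != null && bundle.containsKey(key)) {
                return bundle.get(key);
            }
        }
        return null; // missing everywhere, including the root
    }

    /** Sample data mirroring the slide: a complete root set and a sparse French set. */
    static Map<String, Map<String, String>> demoBundles() {
        Map<String, String> root = new HashMap<>();
        root.put("appName", "Demo");
        root.put("maxRows", "57");
        root.put("dialogTitle", "Hello World");

        Map<String, String> french = new HashMap<>();
        french.put("appName", "D\u00e9mo");
        french.put("dialogTitle", "Bonjour monde");

        Map<String, Map<String, String>> bundles = new HashMap<>();
        bundles.put("", root);
        bundles.put("fr", french);
        return bundles;
    }

    public static void main(String[] args) {
        System.out.println(chain("zh-Hans-SG"));
        System.out.println(lookup(demoBundles(), "fr-CA", "dialogTitle")); // found in fr
        System.out.println(lookup(demoBundles(), "fr-CA", "maxRows"));     // falls back to root
    }
}
```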

Getting the Right Locale

[Diagram: a client talks to your system (front end, business logic, data store, operating environment). Locales in play: client locale, server locale, API request locale, system management locale.]

One request might serve multiple purposes or be seen in multiple contexts

Resources and Translation

“key”, “display string”
“dialogTitle”, “Dialog Title”
“aMessage”, “This is a message.”

Pseudo-Translation

“key”, “ðìsplàÿ stríñg”
“dialogTitle”, “Ðîálòg Tïtlè”
“aMessage”, “Thìß ís â Mésßãgê.”

Pseudotranslation
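A pseudo-translator can be very small. The sketch below is one possible approach, not a description of any specific tool: it swaps some ASCII letters for accented look-alikes, pads the string by roughly 30% to simulate text expansion, and brackets it so truncation is visible. All names are hypothetical, and a real tool would also need to leave format placeholders untouched.

```java
// Hypothetical sketch: machine-made "translation" for testing, still readable by English speakers.
class PseudoTranslator {
    private static final String PLAIN = "aeiouAEIOUnNsy";
    // Accented look-alikes, written as escapes so the source file encoding does not matter.
    private static final String ACCENTED =
            "\u00e0\u00e9\u00ee\u00f6\u00fc\u00c0\u00c9\u00ce\u00d6\u00dc\u00f1\u00d1\u00df\u00ff";

    static String pseudo(String text) {
        StringBuilder out = new StringBuilder("[");
        for (char c : text.toCharArray()) {
            int i = PLAIN.indexOf(c);
            out.append(i >= 0 ? ACCENTED.charAt(i) : c); // non-ASCII data exercises encodings
        }
        int padding = (text.length() * 3 + 9) / 10; // about 30% expansion, rounded up
        for (int i = 0; i < padding; i++) {
            out.append('~');
        }
        return out.append(']').toString(); // a missing bracket on screen reveals truncation
    }

    public static void main(String[] args) {
        System.out.println(pseudo("Dialog Title"));
        System.out.println(pseudo("This is a message."));
    }
}
```

Any string that still appears on screen in plain ASCII after a pseudo-translated build was probably hard-coded rather than externalized.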

Keyboards

Input Method Editors

Some languages require software to assemble keystrokes into characters:
• Asian languages with very large character sets
• Complex scripts with vowel-killers and other contextual editing requirements

Applications that interact directly with key-pressed events can disable or disrupt IME input.

On- and over-the-spot editing

Customization

When is it okay?

• Content should be highly localized or have locale-specific requirements:
  – customization lets you address this requirement in the most localized possible manner
  – dates, numbers, images, colors, addresses, local rules, etc.

Externalization again

Your Application: local rules, regulatory requirements, postal addresses, default bookmark lists, your company’s customer service phone numbers

Externalization again

Locale-independent global binary plus locale-dependent resources (includes code)

Large Animal Pictures

[Diagram: a software component built from global code plus resources, with input and output passing through an I/O layer]

Code can be a resource!

Customization Examples

• Postal address validation
• Postal code validation
• Telephone number formatter
• “Personality” questions: blood type vs. sun sign
• Personal name formatter: first/last position, space, highlighting, formality, etc.
• Tax codes and shipping schedules

[Diagram: a Generic API backed by a Generic Implementation, with US, DE, and other (??) implementations plugged in]

Example: Postal Addresses

Before:

address1 varchar(32)
address2 varchar(32)
city varchar(16)
state char(2)
zip char(5)
country char(2)

After (i18n):

address1 varchar(64)
address2 varchar(64)
city varchar(64)
province varchar(64)
postcode varchar(64)

country=US, postcode=‘WC2 1GH’ // error
country=UK, postcode=‘95111’ // error
country=DE, postcode=‘1A4 喪’ // okay?

public interface Address {

public class USAddress extends genericAddress {

public class UKAddress extends genericAddress {

public class genericAddress implements Address {
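The class outline above could be fleshed out along the following lines. This is a guess at the intent, not the presenter's code: the method names, the `Addresses.of` factory, and the validation rules are all hypothetical, and the US rule (five digits, optionally followed by a hyphen and four more) is deliberately simplistic.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: a generic API with country-specific implementations plugged in.
interface Address {
    String country();
    String postcode();
    boolean isValid();
}

/** Generic implementation: knows nothing about any country, so it accepts any non-blank value. */
class GenericAddress implements Address {
    private final String country;
    private final String postcode;

    GenericAddress(String country, String postcode) {
        this.country = country;
        this.postcode = postcode;
    }

    public String country() { return country; }
    public String postcode() { return postcode; }
    public boolean isValid() { return postcode != null && !postcode.trim().isEmpty(); }
}

/** US customization: adds a market-specific rule on top of the generic behavior. */
class USAddress extends GenericAddress {
    private static final Pattern ZIP = Pattern.compile("\\d{5}(-\\d{4})?");

    USAddress(String postcode) { super("US", postcode); }

    @Override
    public boolean isValid() { return super.isValid() && ZIP.matcher(postcode()).matches(); }
}

class Addresses {
    /** Picks a country-specific implementation when one exists, else the generic one. */
    static Address of(String country, String postcode) {
        if ("US".equals(country)) {
            return new USAddress(postcode);
        }
        return new GenericAddress(country, postcode);
    }

    public static void main(String[] args) {
        System.out.println(of("US", "WC2 1GH").isValid()); // false
        System.out.println(of("US", "95111").isValid());   // true
        System.out.println(of("DE", "1A4").isValid());     // true: no DE rule plugged in yet
    }
}
```

The last line shows the cost of relying on the generic implementation: without a DE-specific rule, an implausible German postcode is accepted.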

Building Global Software

Beyond Just Coding: Localization, QA, and all that

The Internationalization Cycle

• Encompasses the full development cycle:
  – Requirements
  – Design
  – Development
  – QC
  – Release
  – Support

[Diagram, development cycle: Develop Roadmap (where is the product going?) → Develop Requirements & Architecture → Design (internationalized) → Code (enable, externalize, modularize) → Test (non-English/non-ASCII) → RTM/GA (by market) → Support Issues and Requests (all customers)]

What is “internationalization QA”?

• Does the enabled product work correctly?
  – Non-English configurations
  – Non-ASCII data and encoding support
  – Cross time zone support
  – Market specific features or customizations
• Does localization appear correctly?
  – Is the product localizable?

What makes this different from “regular” QA?

Growing (and Pruning) the Matrix

Include non-English configurations in your test matrix; include non-ASCII data in your tests.

Be prepared to prune the test matrix.

What to Test With

• Test Non-English configurations
  – Non-English locales (lying to your machine)
  – Native configurations (when does it make sense?)
• Test Non-ASCII data
  – Encodings, encodings, everywhere
  – Non-ASCII character values
• Test Across Time Zones
  – Two or more time zones; consider international date line (“it’s tomorrow in Japan”) and DST issues

Planning Testing

Initially

• Get tools that are enabled!
  – Automation allows greater coverage, but only if it works.
• Plan encodings and locales as part of the test matrix.
• Acquire third-party products as necessary.

Increasing Maturity

• Use test driven development practices.
• Get developers to write unit tests that are internationalized.
• Put the ‘i18n’ bugs into the regression suite.

Configuring Machines

Create both native and simulated environments:
  – Native operating systems may have minor but sometimes critical differences (folder names, keywords, localized registry entries)
  – Most features don’t run into native differences (easier to work with English-localized machines)
  – Don’t buy physical keyboards (use software keyboards) unless your application relies on scan codes from keys

Incorporate Localization

Localization is part of the release process too.
  – Changes to the user interface cost the localization team time and money.
  – (Changes to the product cost the documentation and QA folks too)
    • May need to institute change control or a UI freeze

Simultaneous Shipment (Simship)

Ideally, to maximize opportunity, ship the target languages the same day as the source language.

  – It might not make sense for your product.
  – But it might not be as difficult as you think it is. It might even be good for you.

Distribution of Content

• How does the localized text get into the running product?
  – Satellite assemblies, DLLs, shared libraries
  – Message catalogs
  – Special directory
  – Database
  – Etc.

More Distribution

• “Specific Language” (per-language)
• “Language Included” (one or more languages)
• “Language Pack” (product plus something)

[Diagram: each distribution model shown with English, German, and French; the language pack model ships the global binary plus separate language packs]

Completing the Product

• Static content is often under source control and can be localized “normally”

• Dynamic content may include the initial set of data or other items which need to be localized beyond software.
  – Demos and Demo Data
  – Dictionary, Language add-ons
  – Local offers, links to Web store, etc.
  – Packaging
  – Regulatory

Quality Checking and Development Methodologies

• Translation is a human-oriented task.
  – Translation time lines are linear with volume.
• Localized product should be tested for functionality
  – translation can break things
  – usually the first language finds most of the bugs
• Translations should be checked for quality
• Development cycle has to include time for translators and quality assurance to catch up.
  – This does not mean “no agile” or “no changes”
  – Do pilot language(s) or moving-target translation; do better UI design and usability reviews; etc.

Summary

Internationalization

… is a fundamental architectural approach: it is how software is built.
  – Design
  – Enabling
  – Externalization
  – Customization
  – Testing and Support
  – Lifecycle

Q&A

Would you write the code for I18N on the whiteboard before you go?

#define UNICODE
#import I18N.h