Posted on 15-Dec-2015


Free Pascal compiler internationalisation

Rimgaudas Laucius

Institute of Mathematics and Informatics,

Vilnius University

Lithuania

Introduction

• Institute of Mathematics and Informatics, Informatics Methodology Department
– Software localisation
– Teaching of informatics and programming
– E-learning and standards
– Informatics terminology

• Vilnius University
– Localisation course

Localisation in Lithuania

• One of the four priorities emphasised in the strategic project for the development of the information society in Lithuania is:

“to uphold the inheritance of Lithuanian language and culture implementing the information technologies and telecommunications”

Open Source in Lithuania

• A 2004 study, “Open Source in Education”, found that integrating open source software into education has a large positive economic and pedagogical effect

• Education requires high quality and fully localised software

• Open source software is more flexible in terms of localisation

Free Pascal compiler

• Excellent open source compiler

• Works under all widely used operating systems: Windows, Linux and others

• Widely used: it has been used at the International, Baltic and Lithuanian national Olympiads in Informatics for several years

• A replacement for the obsolete Turbo Pascal system in Lithuanian schools


Compilers’ internationalisation

• Internationalisation is part of the software development process, so the internationalisation of development tools is very important

• Most contemporary software development tools are not sufficiently internationalised

• Although this research was done on the Free Pascal compiler, most of the issues presented are common to most compilers

Programming language standards

• Internationalisation is closely related to programming language standards

• Pascal programming language standards
• Standards of other languages

Examples of internationalised compilers

• There are not many such examples
• One of the best-known internationalised programming systems is LOGO
• Vector Pascal

Structure of Free Pascal

• Free Pascal is a system made up of the compiler itself and the run-time library (RTL)

• Compiler and RTL interaction:
– sometimes, to change the compiler, one also needs to change the RTL

Support of multilingual source code

• This is the first stage of compiler internationalisation

• Many scripts need more characters than an 8-bit character set can hold

UTF-8 implementation

• Unicode ~ UTF-8

• Some utilities used by compilers do not support raw 16-bit Unicode: a Unicode character may be treated as a pair of 8-bit characters, e.g. U+0900 becomes 09 00, i.e. a tab and an end-of-string marker

• UTF-8 allows lexical extensions to be implemented step by step
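The byte-level problem can be illustrated with a minimal Free Pascal sketch of a UTF-8 encoder. The function EncodeUtf8 is a hypothetical helper written for this example and is not an RTL routine. The sketch shows that U+0900, which in big-endian UTF-16 is the byte pair 09 00, becomes E0 A4 80 in UTF-8. Because UTF-8 encodes every non-ASCII character with bytes of $80 or above, no such byte can be mistaken for a tab or a string terminator.

```pascal
program Utf8Sketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper, not an FPC RTL routine: encodes one code point
  (assumed to be at most U+10FFFF) as UTF-8 and returns the bytes as
  space-separated hex digits. }
function EncodeUtf8(CodePoint: Cardinal): string;
var
  Bytes: array[0..3] of Byte;
  Count, I: Integer;
begin
  if CodePoint < $80 then
  begin
    Bytes[0] := Byte(CodePoint);                          { 0xxxxxxx }
    Count := 1;
  end
  else if CodePoint < $800 then
  begin
    Bytes[0] := Byte($C0 or (CodePoint shr 6));           { 110xxxxx }
    Bytes[1] := Byte($80 or (CodePoint and $3F));         { 10xxxxxx }
    Count := 2;
  end
  else if CodePoint < $10000 then
  begin
    Bytes[0] := Byte($E0 or (CodePoint shr 12));          { 1110xxxx }
    Bytes[1] := Byte($80 or ((CodePoint shr 6) and $3F));
    Bytes[2] := Byte($80 or (CodePoint and $3F));
    Count := 3;
  end
  else
  begin
    Bytes[0] := Byte($F0 or (CodePoint shr 18));          { 11110xxx }
    Bytes[1] := Byte($80 or ((CodePoint shr 12) and $3F));
    Bytes[2] := Byte($80 or ((CodePoint shr 6) and $3F));
    Bytes[3] := Byte($80 or (CodePoint and $3F));
    Count := 4;
  end;
  Result := '';
  for I := 0 to Count - 1 do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(Bytes[I], 2);
  end;
end;

begin
  { In big-endian UTF-16, U+0900 is the byte pair 09 00 (tab, NUL).
    In UTF-8, every byte of a non-ASCII character is $80 or above. }
  WriteLn('U+0900 -> ', EncodeUtf8($0900));
  WriteLn('U+0161 -> ', EncodeUtf8($0161));
end.
```

The program should print E0 A4 80 for U+0900 and C5 A1 for the Lithuanian letter š (U+0161). Plain ASCII keeps its one-byte form, so existing sources stay valid while new lexical features are added.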

Lexical extensions

• Strings
• Identifiers
• Directives
• Reserved words
• Operators
• Numbers

Strings

• WideString implementation issues
– Compatibility with other systems

– Ambiguity

– Conversions between Unicode and other character sets

Ambiguity example

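program Ambiguity;
{$mode objfpc}{$H+}
{ illustrative bodies: each overload only reports that it was chosen }
procedure Go(const S: WideString); overload; begin WriteLn('WideString version'); end;

procedure Go(const S: String); overload; begin WriteLn('String version'); end;

begin
  Go('Hi');
end.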

Which of the overloaded procedures should be called?

Unicode support layer

• The Unicode support layer wraps OS APIs in an OS-independent way

• Under Win9x, it relies on the Microsoft Layer for Unicode (MSLU)

Identifiers

• Identifiers should clearly reflect the meaning of the object they name and be easy to understand and remember. The best way to achieve this is to allow identifiers written in the programmer's native language

• Unicode Standard Annex #31: Identifier and Pattern Syntax
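UAX #31 defines an identifier as a start character followed by any number of continue characters. Below is a rough Free Pascal sketch of that shape. To keep it short, it assumes a much smaller letter set than the real ID_Start/ID_Continue properties: ASCII letters, '_', and the Latin-1 Supplement and Latin Extended-A letters, which include all Lithuanian letters. The function names are hypothetical.

```pascal
program IdentifierSketch;
{$mode objfpc}{$H+}

{ Hypothetical, simplified start-character test. It accepts ASCII
  letters, '_' and the Latin-1 Supplement / Latin Extended-A letters
  (e.g. U+0105, U+010D, U+0161, U+017E). A real implementation would
  use the full UAX #31 ID_Start property table. }
function IsIdentStart(C: WideChar): Boolean;
begin
  case Ord(C) of
    Ord('A')..Ord('Z'), Ord('a')..Ord('z'), Ord('_'),
    $00C0..$00D6, $00D8..$00F6, $00F8..$017F:
      Result := True;
  else
    Result := False;
  end;
end;

{ Hypothetical continue-character test: start characters plus ASCII digits. }
function IsIdentContinue(C: WideChar): Boolean;
begin
  Result := IsIdentStart(C) or ((C >= '0') and (C <= '9'));
end;

{ Start character followed by continue characters. }
function IsIdentifier(const S: UnicodeString): Boolean;
var
  I: Integer;
begin
  if (Length(S) = 0) or not IsIdentStart(S[1]) then
    Exit(False);
  for I := 2 to Length(S) do
    if not IsIdentContinue(S[I]) then
      Exit(False);
  Result := True;
end;

var
  Sample: UnicodeString;
begin
  { 'šaknis' ("root"), built from code points so the sketch does not
    depend on the encoding of the source file }
  Sample := WideChar($0161) + UnicodeString('aknis');
  WriteLn(IsIdentifier(Sample));   { TRUE }
  WriteLn(IsIdentifier('2x'));     { FALSE: starts with a digit }
end.
```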

Directives

• Names
• Parameters

– Logical (ON, OFF)

– Strings ({$warning Possible malfunctioning})

– File names ({$includepath ..\inc})

Reserved words

• The unification myth
– 13 similar programming languages were compared (Algol, Pascal, Modula, Ada, C, Java,…)
– Only ~3% of reserved words are the same across them
– 56% occur in only one particular language

• Unambiguous translation is possible
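One way to check that a keyword translation is unambiguous is to require that the table be one-to-one. That condition lets a lexer map localised keywords to standard ones and lets programs be translated back. The Free Pascal sketch below tests for it. The Lithuanian spellings are hypothetical, chosen only for illustration, and the table covers just six keywords.

```pascal
program KeywordSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  TKeywordPair = record
    Localised: string;
    Canonical: string;
  end;

const
  { Hypothetical Lithuanian spellings for this sketch only (ASCII-only,
    so the example does not depend on the source encoding). }
  Keywords: array[0..5] of TKeywordPair = (
    (Localised: 'jei';     Canonical: 'if'),
    (Localised: 'tada';    Canonical: 'then'),
    (Localised: 'kitaip';  Canonical: 'else'),
    (Localised: 'kol';     Canonical: 'while'),
    (Localised: 'vykdyk';  Canonical: 'do'),
    (Localised: 'pabaiga'; Canonical: 'end'));

{ Hypothetical lexer helper: maps a localised keyword to the standard
  one, or returns '' so the word is treated as an identifier. Pascal is
  case-insensitive, hence SameText. }
function ToCanonical(const AWord: string): string;
var
  I: Integer;
begin
  for I := Low(Keywords) to High(Keywords) do
    if SameText(Keywords[I].Localised, AWord) then
      Exit(Keywords[I].Canonical);
  Result := '';
end;

{ The table is unambiguous in both directions when no localised word
  and no standard keyword appears twice. }
function IsOneToOne: Boolean;
var
  I, J: Integer;
begin
  for I := Low(Keywords) to High(Keywords) do
    for J := I + 1 to High(Keywords) do
      if SameText(Keywords[I].Localised, Keywords[J].Localised) or
         SameText(Keywords[I].Canonical, Keywords[J].Canonical) then
        Exit(False);
  Result := True;
end;

begin
  WriteLn(IsOneToOne);          { TRUE }
  WriteLn(ToCanonical('KOL'));  { while }
end.
```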

Example of localised reserved words

Operators

• Unicode has all the mathematical symbols needed to express mathematical operations

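A lexer could accept Unicode operator symbols such as ≤, ≥, ≠, × and ÷ by mapping each one to the ASCII token that standard Pascal already uses, so later compiler stages stay unchanged. The sketch below does this. The symbol set, the function name and the choice of '/' for ÷ are assumptions made for this sketch.

```pascal
program OperatorSketch;
{$mode objfpc}{$H+}

{ Hypothetical lexer helper: maps a Unicode mathematical symbol to the
  equivalent standard Pascal token; returns '' for other characters. }
function OperatorToken(C: WideChar): string;
begin
  case Ord(C) of
    $2264: Result := '<=';   { LESS-THAN OR EQUAL TO }
    $2265: Result := '>=';   { GREATER-THAN OR EQUAL TO }
    $2260: Result := '<>';   { NOT EQUAL TO }
    $2254: Result := ':=';   { COLON EQUALS }
    $00D7: Result := '*';    { MULTIPLICATION SIGN }
    $00F7: Result := '/';    { DIVISION SIGN, real division assumed }
    $2212: Result := '-';    { MINUS SIGN }
    $2227: Result := 'and';  { LOGICAL AND }
    $2228: Result := 'or';   { LOGICAL OR }
    $00AC: Result := 'not';  { NOT SIGN }
  else
    Result := '';
  end;
end;

begin
  WriteLn(OperatorToken(WideChar($2264)));  { <= }
  WriteLn(OperatorToken(WideChar($2260)));  { <> }
end.
```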

Numbers

• Various scripts have their own digits for writing decimal numbers

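Each script encodes its digits 0 to 9 as ten consecutive code points, so a lexer can recognise them with simple range checks. The Free Pascal sketch below converts such digit strings to integers. It covers only five scripts, and its function names are hypothetical.

```pascal
program DigitSketch;
{$mode objfpc}{$H+}

{ Hypothetical helper: value of a decimal digit in one of five scripts,
  or -1 if C is not such a digit. }
function DigitValue(C: WideChar): Integer;
begin
  case Ord(C) of
    $0030..$0039: Result := Ord(C) - $0030;  { ASCII }
    $0660..$0669: Result := Ord(C) - $0660;  { Arabic-Indic }
    $06F0..$06F9: Result := Ord(C) - $06F0;  { Extended Arabic-Indic }
    $0966..$096F: Result := Ord(C) - $0966;  { Devanagari }
    $FF10..$FF19: Result := Ord(C) - $FF10;  { Fullwidth }
  else
    Result := -1;
  end;
end;

{ Hypothetical helper: parses an unsigned decimal literal, returning -1
  if any character is not a digit. This sketch neither rejects literals
  that mix scripts nor checks for overflow. }
function ParseNatural(const S: UnicodeString): Int64;
var
  I, D: Integer;
begin
  if Length(S) = 0 then
    Exit(-1);
  Result := 0;
  for I := 1 to Length(S) do
  begin
    D := DigitValue(S[I]);
    if D < 0 then
      Exit(-1);
    Result := Result * 10 + D;
  end;
end;

begin
  { Devanagari '42': U+096A U+0968 }
  WriteLn(ParseNatural(UnicodeString(WideChar($096A)) + WideChar($0968)));  { 42 }
end.
```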

Decimal separator

• USA, UK: ‘.’
• Most European countries: ‘,’

• Localising the decimal separator may cause ambiguity; resolving it requires extending the syntax of numbers:

25,88 – real number

25, 88 – two numbers
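The rule above can be sketched as a small tokenizer. A comma with a digit directly on both sides is read as a decimal separator, and any other comma or space ends the current number. This is a simplified illustration under that assumption, not FPC's actual scanner, and all names in it are hypothetical.

```pascal
program DecimalCommaSketch;
{$mode objfpc}{$H+}

type
  TTokens = array of string;

function IsDigit(C: Char): Boolean;
begin
  Result := C in ['0'..'9'];
end;

{ Hypothetical tokenizer: a comma with a digit on both sides is a
  decimal separator (25,88). Any other comma, or a space, separates
  numbers (25, 88). Decimal commas are normalised to '.', and a second
  comma inside the same number ends that number. }
function SplitNumbers(const S: string): TTokens;
var
  Tokens: TTokens;
  Current: string;
  I: Integer;

  procedure Flush;
  begin
    if Current <> '' then
    begin
      SetLength(Tokens, Length(Tokens) + 1);
      Tokens[High(Tokens)] := Current;
      Current := '';
    end;
  end;

begin
  Tokens := nil;
  Current := '';
  for I := 1 to Length(S) do
    if IsDigit(S[I]) then
      Current := Current + S[I]
    else if (S[I] = ',') and (I > 1) and (I < Length(S)) and
            IsDigit(S[I - 1]) and IsDigit(S[I + 1]) and
            (Pos('.', Current) = 0) then
      Current := Current + '.'
    else
      Flush;
  Flush;
  Result := Tokens;
end;

{ Joins tokens with '|' so the result is easy to print and compare. }
function JoinTokens(const T: TTokens): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(T) do
  begin
    if I > 0 then
      Result := Result + '|';
    Result := Result + T[I];
  end;
end;

begin
  WriteLn(JoinTokens(SplitNumbers('25,88')));   { 25.88 }
  WriteLn(JoinTokens(SplitNumbers('25, 88')));  { 25|88 }
end.
```

Under this rule, `WriteLn(25,88)` would print one real number instead of two integers. Existing programs that omit the space after a comma would therefore change meaning. This is why the number syntax itself has to be extended rather than only localised.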

Punctuation

• Spaces: ordinary U+0020, no-break U+00A0, ideographic U+3000, etc.

• Quotes: “English”, „Lithuanian“

• Etc
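Here is a sketch of the corresponding lexer predicates, assuming the source is held as UTF-16 (WideChar) text. One predicate accepts several kinds of space. The other pairs an opening quote with its closing quote. Both functions are hypothetical and cover only the characters named on the slide plus the Pascal apostrophe.

```pascal
program PunctuationSketch;
{$mode objfpc}{$H+}

{ Hypothetical helper: tab, LF, CR, ordinary space U+0020, no-break
  space U+00A0 and ideographic space U+3000. This is only a subset of
  the characters with the Unicode White_Space property. }
function IsSpaceChar(C: WideChar): Boolean;
begin
  case Ord(C) of
    $0009, $000A, $000D, $0020, $00A0, $3000:
      Result := True;
  else
    Result := False;
  end;
end;

{ Hypothetical helper: the closing quote matching an opening quote, or
  #0 if C does not open a string literal in this sketch. }
function ClosingQuote(C: WideChar): WideChar;
begin
  case Ord(C) of
    $0027: Result := WideChar($0027);  { Pascal apostrophe }
    $201C: Result := WideChar($201D);  { English: U+201C ... U+201D }
    $201E: Result := WideChar($201C);  { Lithuanian: U+201E ... U+201C }
  else
    Result := WideChar(0);
  end;
end;

begin
  WriteLn(IsSpaceChar(WideChar($3000)));        { TRUE }
  WriteLn(Ord(ClosingQuote(WideChar($201E))));  { 8220, i.e. U+201C }
end.
```

The Lithuanian closing quote U+201C is the same character as the English opening quote. A lexer therefore has to remember which quote opened a literal instead of classifying quote characters in isolation.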

Bi-directional text

• Bi-directional text is an issue of text representation, not of the compiler

Unicode file names support

• File handling requires the OS API, so it has to be done via the RTL's Unicode support layer

• Compilers have to use MSLU under Win9x

Input/Output

• File input/output requires additional support for Unicode encodings

• The Windows console does not support Unicode
– It can be replaced, but is that the best solution?
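Recognising a text file's encoding is one part of Unicode I/O support. The minimal Free Pascal sketch below does this by checking for a byte order mark (BOM). It assumes the data has already been read into a byte array. Files without a BOM are left to some default rule, and the type and function names are hypothetical.

```pascal
program BomSketch;
{$mode objfpc}{$H+}

type
  TTextEncoding = (teUnknown, teUtf8, teUtf16LE, teUtf16BE);

{ Hypothetical helper: detects a byte order mark at the start of the
  data. UTF-32 BOMs are not handled, so FF FE 00 00 would be reported
  as UTF-16LE. }
function DetectBom(const Data: array of Byte): TTextEncoding;
begin
  if (Length(Data) >= 3) and (Data[0] = $EF) and (Data[1] = $BB) and
     (Data[2] = $BF) then
    Result := teUtf8
  else if (Length(Data) >= 2) and (Data[0] = $FF) and (Data[1] = $FE) then
    Result := teUtf16LE
  else if (Length(Data) >= 2) and (Data[0] = $FE) and (Data[1] = $FF) then
    Result := teUtf16BE
  else
    Result := teUnknown;
end;

begin
  WriteLn(DetectBom([$EF, $BB, $BF, $41]) = teUtf8);  { TRUE }
  WriteLn(DetectBom([$41, $42]) = teUnknown);         { TRUE }
end.
```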

Localisation framework

• Strings and other resources have to be externalised for easy localisation

• Localisation kits have to be prepared
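Free Pascal already offers a language-level mechanism for externalising strings: the resourcestring section. Below is a minimal example; the names SGreeting and SFileNotFound are illustrative only. As far as we know, the compiler writes such strings to a separate resource-string file (.rst, or .rsj in newer versions). Translation tools can work from that file, and translated values can be installed at run time, e.g. with the gettext unit.

```pascal
program ResourceStringSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Illustrative names. Strings declared here are externalised by the
  compiler, so they can be translated without editing the code below. }
resourcestring
  SGreeting = 'Hello';
  SFileNotFound = 'File "%s" not found';

begin
  WriteLn(SGreeting);                            { Hello }
  WriteLn(Format(SFileNotFound, ['data.txt']));  { File "data.txt" not found }
end.
```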

• Questions?
• Thank you

• Contact E-mail: