Free Pascal compiler internationalisation
Rimgaudas Laucius
Institute of Mathematics and Informatics,
Vilnius University
Lithuania
Introduction
• Institute of Mathematics and Informatics, Informatics Methodology Department
• Software localisation
• Teaching of informatics and programming
• E-learning and standards
• Informatics terminology
• Vilnius University
• Localisation course
Localisation in Lithuania
• One of the four priorities emphasised in the strategic project for the development of the information society in Lithuania is:
“to uphold the inheritance of Lithuanian language and culture implementing the information technologies and telecommunications”
Open Source in Lithuania
• The 2004 study “Open Source in Education” found that integrating open source software into education has a large positive economic and pedagogical effect
• Education requires high quality and fully localised software
• Open source software is more flexible in terms of localisation
Free Pascal compiler
• Excellent open source compiler
• Works under all widely used operating systems: Windows, Linux and others
• Widely used: for several years it has been used in international, Baltic and national Lithuanian Olympiads in informatics
• Replacement for the obsolete Turbo Pascal system in Lithuanian schools
FPS
Compilers’ internationalisation
• Internationalisation is part of the software development process, so the internationalisation of development tools is very important
• Most contemporary software development tools are not internationalised enough
• Although this research was done on the Free Pascal compiler, most of the issues presented are common to most compilers
Programming language standards
• Internationalisation is closely tied to programming language standards
• Pascal programming language standards
• Standards of other languages
Examples of internationalised compilers
• There are not many such examples
• One of the best-known internationalised programming systems is LOGO
• Vector Pascal
Structure of Free Pascal
• Free Pascal is a system made up of the compiler program itself and the run-time library (RTL)
• Compiler and RTL interaction:
• Changing the compiler sometimes requires changing the RTL as well
Support of multilingual source code
• This is the first stage of compiler internationalisation
• Many scripts need more characters than an 8-bit character set can hold
UTF-8 implementation
• Unicode ~ UTF-8
• Some utilities used by compilers do not support pure Unicode: a 16-bit Unicode character may be treated as a pair of 8-bit characters. Example: U+0900 is read as the bytes 09 00, i.e. a tab and an end-of-string marker
• UTF-8 allows lexical extensions to be implemented step by step
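The U+0900 case can be checked directly. The sketch below is in Python rather than Pascal, and the helper name `has_ascii_control_bytes` is an illustrative assumption. It compares the 16-bit and UTF-8 byte sequences of the same character.

```python
# Compare how U+0900 looks to a byte-oriented tool in UTF-16 and in UTF-8.
ch = "\u0900"

utf16_bytes = ch.encode("utf-16-be")  # two bytes: 0x09 (tab) and 0x00 (NUL)
utf8_bytes = ch.encode("utf-8")       # three bytes, all >= 0x80


def has_ascii_control_bytes(data: bytes) -> bool:
    """Return True if any byte would be read as an ASCII control character."""
    return any(b < 0x20 for b in data)


print(utf16_bytes.hex(" "), has_ascii_control_bytes(utf16_bytes))
print(utf8_bytes.hex(" "), has_ascii_control_bytes(utf8_bytes))
```

In UTF-8 every byte of a non-ASCII character has its high bit set, so tools that only understand 8-bit text pass such characters through without mistaking them for tabs, NULs or other ASCII syntax.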
Lexical extensions
• Strings
• Identifiers
• Directives
• Reserved words
• Operators
• Numbers
Strings
• WideString implementation issues
– Compatibility with other systems
– Ambiguity
– Conversions between Unicode and other character sets
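A minimal sketch of the conversion issue, written in Python for brevity. It assumes Windows-1257 (the Baltic code page) as the "other character set"; the function names are illustrative and are not part of Free Pascal.

```python
# Converting between Unicode text and a legacy 8-bit code page.
# Assumption: cp1257 (Windows Baltic) stands in for "other character sets".

def to_legacy(text: str, codepage: str = "cp1257") -> bytes:
    """Encode Unicode text; raises UnicodeEncodeError on unmappable characters."""
    return text.encode(codepage)


def from_legacy(data: bytes, codepage: str = "cp1257") -> str:
    """Decode bytes of a legacy code page back into Unicode text."""
    return data.decode(codepage)


def is_representable(text: str, codepage: str = "cp1257") -> bool:
    """Check whether the conversion would be lossless."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


lithuanian = "ąčęėįšųūž"
encoded = to_legacy(lithuanian)    # one byte per letter in cp1257
round_trip = from_legacy(encoded)  # identical to the original text
```

The conversion is lossless only when every character exists in the target code page. A Cyrillic letter, for example, cannot be represented in cp1257, so a compiler must either report an error or substitute a placeholder.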
Ambiguity example
procedure Go(const s: WideString); begin ... end;
procedure Go(const s: String); begin ... end;

begin
  { The literal 'Hi' is compatible with both parameter types }
  Go('Hi');
end.
Which of the overloaded procedures should be called?
Unicode support layer
• The Unicode support layer wraps OS APIs in an OS-independent way
• Under Win9x it relies on the Microsoft Layer for Unicode (MSLU)
Identifiers
• Identifiers have to convey the meaning of an object clearly and be easy to comprehend and memorise. The best way to achieve this is to allow identifiers written in the programmer's native language
• Unicode Standard Annex #31: Identifier and Pattern Syntax
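As a rough illustration of the default identifier syntax in Annex #31, the sketch below uses Python's `str.isidentifier()`, which is based on the XID_Start and XID_Continue properties (plus a leading underscore). The function names are illustrative assumptions. The case-insensitive comparison reflects Pascal's case-insensitive identifiers, approximated here with Unicode case folding.

```python
import unicodedata


def is_valid_identifier(name: str) -> bool:
    """Approximate the UAX #31 default identifier syntax after NFKC normalisation."""
    normalized = unicodedata.normalize("NFKC", name)
    return normalized.isidentifier()


def same_identifier(a: str, b: str) -> bool:
    """Compare two identifiers case-insensitively, as a Pascal compiler would."""
    def fold(s: str) -> str:
        return unicodedata.normalize("NFKC", s).casefold()
    return fold(a) == fold(b)


print(is_valid_identifier("skaičius"))          # Lithuanian word for "number"
print(is_valid_identifier("2kartai"))           # starts with a digit
print(same_identifier("Skaičius", "SKAIČIUS"))
```

Normalising before comparison matters because the same visible identifier can be typed as precomposed or decomposed characters.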
Directives
• Names
• Parameters
– Logical (ON, OFF)
– Strings ({$warning Possible malfunctioning})
– File names ({$includepath ..\inc})
Reserved words
• Unification myth
– Compared 13 similar programming languages (Algol, Pascal, Modula, Ada, C, Java,…)
– Only ~3% of reserved words are the same across the languages
– 56% occur in only one language
• Unambiguous translation is possible
Example of localised reserved words
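One way a lexer could accept localised reserved words is a lookup table from the localised spelling to the canonical one. The sketch below is in Python, and the Lithuanian spellings are assumptions chosen for illustration, not the table from the presentation.

```python
# Hypothetical Lithuanian spellings mapped to canonical Pascal reserved words.
LOCALISED_KEYWORDS = {
    "programa": "program",
    "pradžia": "begin",
    "pabaiga": "end",
    "jei": "if",
    "tai": "then",
    "kitaip": "else",
}


def canonical_keyword(word: str):
    """Return the canonical reserved word, or None if `word` is not reserved."""
    return LOCALISED_KEYWORDS.get(word.casefold())


def is_unambiguous(table: dict) -> bool:
    """A translation is unambiguous if no two spellings share a target and
    no localised spelling collides with a different canonical word."""
    targets = list(table.values())
    one_to_one = len(targets) == len(set(targets))
    no_collision = all(k not in targets or table[k] == k for k in table)
    return one_to_one and no_collision


print(canonical_keyword("Pradžia"))
print(is_unambiguous(LOCALISED_KEYWORDS))
```

Because the mapping is one-to-one, a localised program can be translated back to the canonical form, and the reverse, without loss.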
Operators
• Unicode has all mathematical symbols needed to express mathematical operations
• Example:
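A minimal sketch of how Unicode operator symbols could be accepted, written in Python. The particular symbols and their mapping to Pascal operators are assumptions for illustration.

```python
# Hypothetical mapping from Unicode mathematical symbols to Pascal operators.
UNICODE_OPERATORS = {
    "\u2264": "<=",  # less-than or equal to
    "\u2265": ">=",  # greater-than or equal to
    "\u2260": "<>",  # not equal to
    "\u00d7": "*",   # multiplication sign
    "\u00f7": "/",   # division sign
}


def normalise_operators(source: str) -> str:
    """Replace each Unicode operator symbol with its ASCII equivalent."""
    return "".join(UNICODE_OPERATORS.get(ch, ch) for ch in source)


print(normalise_operators("if a \u2264 b then c := a \u00d7 b"))
```

This naive version rewrites every occurrence, including those inside string literals. A real lexer would apply the mapping only when scanning operator tokens.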
Numbers
• Decimal digits can be written in various scripts.
• Example:
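As an illustration, the sketch below (in Python) reads the same decimal number written with digits from different scripts, using the Unicode decimal-digit property. The function name `parse_decimal` is an assumption.

```python
import unicodedata


def parse_decimal(text: str) -> int:
    """Parse an unsigned integer written with decimal digits of any script."""
    if not text:
        raise ValueError("empty number")
    value = 0
    for ch in text:
        # unicodedata.decimal raises ValueError for non-digit characters
        value = value * 10 + unicodedata.decimal(ch)
    return value


print(parse_decimal("25"))   # European digits
print(parse_decimal("२५"))   # Devanagari digits
print(parse_decimal("٢٥"))   # Arabic-Indic digits
```

The sketch accepts a mixture of scripts within one number. Whether a compiler should allow that is a separate design decision.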
Decimal separator
• USA, GB: ‘.’
• Most European countries: ‘,’
• Localising the separator may cause ambiguity; a solution requires extending the syntax of numbers.
25,88 – real number
25, 88 – two numbers
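The distinction between the two lines above can be sketched as a tokenising rule, here in Python. The rule itself is an assumption inferred from the example: a comma directly between digits is a decimal separator, while a comma followed by a space separates list items.

```python
import re

# A number is digits, optionally followed by ",digits" with no space in between.
# A lone comma is a list separator.
TOKEN = re.compile(r"\d+(?:,\d+)?|,")


def split_numbers(source: str) -> list:
    """Return the numbers found in `source` under the whitespace rule above."""
    numbers = []
    for token in TOKEN.findall(source):
        if token != ",":
            numbers.append(float(token.replace(",", ".")))
    return numbers


print(split_numbers("25,88"))
print(split_numbers("25, 88"))
```

Under this rule `1,2,3` still reads as the real number 1,2 followed by 3, which shows why localising the separator forces a deliberate change to the number syntax.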
Punctuation
• Spaces: general U+0020, nonbreaking U+00A0, ideographic U+3000, etc
• Quotes: “English”, „Lithuanian“
• Etc
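A small Python sketch of how a lexer might classify these characters. It treats any character in the Unicode "space separator" category as a space and recognises a quotation only when the opening and closing marks belong to the same convention. The function names and the idea of accepting localised quotation marks are illustrative assumptions.

```python
import unicodedata

# Opening mark -> expected closing mark.
QUOTE_PAIRS = {
    "\u201c": "\u201d",  # English: left and right double quotation marks
    "\u201e": "\u201c",  # Lithuanian: low-9 opening mark, left double mark closing
}


def is_space(ch: str) -> bool:
    """True for U+0020, U+00A0, U+3000 and other space separators (category Zs)."""
    return unicodedata.category(ch) == "Zs"


def unquote(text: str):
    """Strip a matching pair of quotation marks, or return None if they do not match."""
    if len(text) >= 2 and QUOTE_PAIRS.get(text[0]) == text[-1]:
        return text[1:-1]
    return None


print([is_space(c) for c in "\u0020\u00a0\u3000"])
print(unquote("\u201eLabas\u201c"))
```

Note that U+201C closes a Lithuanian quotation but opens an English one, so the closing mark can only be interpreted relative to the opening mark.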
Bi-directional text
• Bi-directional text is a matter of how text is displayed, not of the compiler
Unicode file names support
• File handling requires the OS API, so it has to be done via the RTL’s Unicode support layer
• Compilers have to use MSLU under Win9x
Input/Output
• File input/output requires additional support for Unicode encoding
• Windows console does not support Unicode
– It can be replaced, but is that the best solution?
Localisation framework
• Strings and other resources have to be externalised for easy localisation
• Localisation kits have to be prepared
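A minimal sketch of externalised strings, in Python. The message keys, the catalogue layout and the Lithuanian text are all hypothetical; they only illustrate the principle that program code refers to messages by key, while translators work on the catalogue.

```python
# Hypothetical message catalogues. In practice each language would live in
# its own external file that translators can edit without touching the code.
CATALOGUES = {
    "en": {
        "expected": 'Syntax error: "{expected}" expected but "{found}" found',
        "unknown_id": 'Identifier not found "{name}"',
    },
    "lt": {
        "expected": "Sintaksės klaida: laukta „{expected}“, bet rasta „{found}“",
    },
}


def message(lang: str, key: str, **args) -> str:
    """Look up a message by key, falling back to English if it is untranslated."""
    template = CATALOGUES.get(lang, {}).get(key) or CATALOGUES["en"][key]
    return template.format(**args)


print(message("lt", "expected", expected=";", found="end"))
print(message("lt", "unknown_id", name="x"))  # falls back to English
```

Named placeholders let a translator reorder the arguments to suit the grammar of the target language.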
• Questions?
• Thank you
• Contact E-mail: