Contents October 2002

8. Unicode Standardization

8.1 Kannada Code Chart

8.1.1 Kannada Code Chart Details

Code Point  Character  Description

Various signs
0C82  ◌ಂ  KANNADA SIGN ANUSVARA
0C83  ◌ಃ  KANNADA SIGN VISARGA

Independent vowels
0C85  ಅ  KANNADA LETTER A
0C86  ಆ  KANNADA LETTER AA
0C87  ಇ  KANNADA LETTER I
0C88  ಈ  KANNADA LETTER II
0C89  ಉ  KANNADA LETTER U
0C8A  ಊ  KANNADA LETTER UU
0C8B  ಋ  KANNADA LETTER VOCALIC R
0C8C  ಌ  KANNADA LETTER VOCALIC L (• Not in present use)
0C8D  <reserved>
0C8E  ಎ  KANNADA LETTER E
0C8F  ಏ  KANNADA LETTER EE
0C90  ಐ  KANNADA LETTER AI
0C91  <reserved>
0C92  ಒ  KANNADA LETTER O
0C93  ಓ  KANNADA LETTER OO
0C94  ಔ  KANNADA LETTER AU

Consonants
0C95  ಕ  KANNADA LETTER KA
0C96  ಖ  KANNADA LETTER KHA
0C97  ಗ  KANNADA LETTER GA
0C98  ಘ  KANNADA LETTER GHA
0C99  ಙ  KANNADA LETTER NGA
0C9A  ಚ  KANNADA LETTER CA
0C9B  ಛ  KANNADA LETTER CHA
0C9C  ಜ  KANNADA LETTER JA
0C9D  ಝ  KANNADA LETTER JHA
0C9E  ಞ  KANNADA LETTER NYA
0C9F  ಟ  KANNADA LETTER TTA
0CA0  ಠ  KANNADA LETTER TTHA
0CA1  ಡ  KANNADA LETTER DDA
0CA2  ಢ  KANNADA LETTER DDHA
0CA3  ಣ  KANNADA LETTER NNA
0CA4  ತ  KANNADA LETTER TA
0CA5  ಥ  KANNADA LETTER THA
0CA6  ದ  KANNADA LETTER DA
0CA7  ಧ  KANNADA LETTER DHA
0CA8  ನ  KANNADA LETTER NA
0CA9  <reserved>
0CAA  ಪ  KANNADA LETTER PA
0CAB  ಫ  KANNADA LETTER PHA
0CAC  ಬ  KANNADA LETTER BA
0CAD  ಭ  KANNADA LETTER BHA
0CAE  ಮ  KANNADA LETTER MA
0CAF  ಯ  KANNADA LETTER YA
0CB0  ರ  KANNADA LETTER RA
0CB1  ಱ  KANNADA LETTER RRA
0CB2  ಲ  KANNADA LETTER LA
0CB3  ಳ  KANNADA LETTER LLA
0CB4  <reserved>
0CB5  ವ  KANNADA LETTER VA
0CB6  ಶ  KANNADA LETTER SHA
0CB7  ಷ  KANNADA LETTER SSA
0CB8  ಸ  KANNADA LETTER SA
0CB9  ಹ  KANNADA LETTER HA
0CBA  -  KANNADA INVISIBLE LETTER
0CBB  -  KANNADA VOWEL SIGN A
0CBC  ◌಼  KANNADA SIGN NUKTA
0CBD  ಽ  KANNADA SIGN AVAGRAHA


Dependent vowel signs
0CBE  ◌ಾ  KANNADA VOWEL SIGN AA
0CBF  ◌ಿ  KANNADA VOWEL SIGN I
0CC0  ◌ೀ  KANNADA VOWEL SIGN II
0CC1  ◌ು  KANNADA VOWEL SIGN U
0CC2  ◌ೂ  KANNADA VOWEL SIGN UU
0CC3  ◌ೃ  KANNADA VOWEL SIGN VOCALIC R
0CC4  ◌ೄ  KANNADA VOWEL SIGN VOCALIC RR
0CC5  <reserved>
0CC6  ◌ೆ  KANNADA VOWEL SIGN E
0CC7  ◌ೇ  KANNADA VOWEL SIGN EE
0CC8  ◌ೈ  KANNADA VOWEL SIGN AI
0CC9  <reserved>
0CCA  ◌ೊ  KANNADA VOWEL SIGN O
0CCB  ◌ೋ  KANNADA VOWEL SIGN OO
0CCC  ◌ೌ  KANNADA VOWEL SIGN AU

Various signs
0CCD  ◌್  KANNADA SIGN HALANT
0CCE  <reserved>
0CCF  <reserved>
0CD0  <reserved>
0CD1  -  KANNADA DIACRITIC SIGN UDATTA (• Used above any character)
0CD2  -  KANNADA DIACRITIC SIGN ANUDATTA (• Used below any character)
0CD3  -  KANNADA DIACRITIC SIGN GURU-GRAVE (• Used above any character)
0CD4  -  KANNADA DIACRITIC SIGN LAGHU-ACUTE (• Used above any character)
0CD5  ◌ೕ  KANNADA LENGTH MARK
0CD6  ◌ೖ  KANNADA AI LENGTH MARK

Additional consonants
0CDE  ೞ  KANNADA LETTER LLLA

Generic additions
0CE0  ೠ  KANNADA LETTER VOCALIC RR
0CE1  ೡ  KANNADA LETTER VOCALIC LL (• Not in present use)

Digits
0CE6  ೦  KANNADA DIGIT ZERO
0CE7  ೧  KANNADA DIGIT ONE
0CE8  ೨  KANNADA DIGIT TWO
0CE9  ೩  KANNADA DIGIT THREE
0CEA  ೪  KANNADA DIGIT FOUR
0CEB  ೫  KANNADA DIGIT FIVE
0CEC  ೬  KANNADA DIGIT SIX
0CED  ೭  KANNADA DIGIT SEVEN
0CEE  ೮  KANNADA DIGIT EIGHT
0CEF  ೯  KANNADA DIGIT NINE
0CF5  -  KANNADA SIGN REPH
0CF9  -  KANNADA DIACRITIC SIGN DEERGHA SWARITHA (• Used above any character)


8.1.2 Kannada General Information & Description

Introduction

The Kannada script is a South Indian script. It is used to write the Kannada language of Karnataka State in India, and it is also used in many parts of Tamil Nadu, Kerala, Andhra Pradesh and Maharashtra. In addition, the Kannada script is used to write the Tulu, Konkani and Kodava languages. Kannada shares a large number of structural features with the other Indian language scripts. The Kannada block of the Unicode Standard (0C80 to 0CFF) is based on ISCII-1988 (Indian Script Code for Information Interchange). The Unicode Standard (version 3) encodes Kannada characters in the same relative positions as those coded in the ISCII-1988 standard.

The writing system that employs the Kannada script is a cross between syllabic writing systems and phonemic writing systems (alphabets). The effective unit of writing Kannada is the orthographic syllable, consisting of a consonant and vowel (CV) core and, optionally, one or more preceding consonants, with a canonical structure of ((C)C)CV. The orthographic syllable need not correspond exactly with a phonological syllable, especially when a consonant cluster is involved, but the writing system is built on phonological principles and tends to correspond quite closely to pronunciation.

The orthographic syllable is built up of alphabetic pieces, the actual letters of the Kannada script. These consist of distinct character types: consonant letters, independent vowels and the corresponding dependent vowel signs. In a text sequence, these characters are stored in logical phonetic order.
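To make logical phonetic order concrete, the Python sketch below lists the stored code points of the word ಕನ್ನಡ ("Kannada") using the standard `unicodedata` module. The sample word and the helper name `logical_sequence` are illustrative choices, not part of this document. Note that the Unicode character name for the halant is KANNADA SIGN VIRAMA.

```python
import unicodedata

def logical_sequence(text):
    """Return (code point, Unicode name) for each stored character, in storage order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

word = "\u0C95\u0CA8\u0CCD\u0CA8\u0CA1"  # ಕನ್ನಡ ("Kannada")
for code, name in logical_sequence(word):
    print(code, name)
```

The halant (U+0CCD) is stored between the two NA letters, exactly where it occurs in pronunciation; drawing the conjunct is left entirely to the font.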

Rendering Kannada Characters

Kannada characters can combine or change shape depending on their context. A character’s appearance is affected by its ordering with respect to other characters and by the application or system environment. This variation can cause the appearance of Kannada characters to differ from their nominal glyphs.

Vowels (Swaras)

Independent vowel letters

The independent vowels (Swaras) in Kannada are letters that stand on their own. The writing system treats independent vowels as orthographic CV syllables in which the consonant is null. The independent vowel letters are used to write syllables that start with a vowel. The Unicode character encoding for Kannada uses a distinct set of naming conventions for some mid vowels of the fourteen vowels in Kannada. Of these fourteen vowels, twelve have been divided into six sets, each consisting of a short vowel (Hrasva Swara) followed by the corresponding long vowel (Deergha Swara). The two types of Swaras differ in the time taken to pronounce them.

A Hrasva Swara is a freely existing independent vowel that can be pronounced in a single matra time (matra kala), whereas a Deergha Swara is a vowel that is pronounced in two matra time. The six sets of swaras are:

ಅ, ಆ (0C85, 0C86)
ಇ, ಈ (0C87, 0C88)
ಉ, ಊ (0C89, 0C8A)
ಋ, ೠ (0C8B, 0CE0)
ಎ, ಏ (0C8E, 0C8F)
ಒ, ಓ (0C92, 0C93)

Of these, the vowel ೠ (0CE0) is not in present use.

The two Deergha swaras ಐ (0C90) and ಔ (0C94) have no Hrasva swara counterparts.

Further, the so-called swaras with code values 0C8C and 0CE1 are not used in Kannada and are not required for Kannada.
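The short/long pairing above is mirrored in the Unicode character names. This small Python sketch (the constant and function names are illustrative) encodes the six sets and the two unpaired long vowels, and checks that each long vowel's name repeats the last letter of its short partner's name (A/AA, E/EE and so on).

```python
import unicodedata

# (Hrasva, Deergha) pairs from the list above, by code point.
HRASVA_DEERGHA = [
    (0x0C85, 0x0C86),  # A, AA
    (0x0C87, 0x0C88),  # I, II
    (0x0C89, 0x0C8A),  # U, UU
    (0x0C8B, 0x0CE0),  # VOCALIC R, VOCALIC RR (long form not in present use)
    (0x0C8E, 0x0C8F),  # E, EE
    (0x0C92, 0x0C93),  # O, OO
]
DEERGHA_ONLY = [0x0C90, 0x0C94]  # AI, AU: no Hrasva counterpart

def doubled_name(short_cp, long_cp):
    """True if the long vowel's name is the short vowel's name plus its last letter."""
    short = unicodedata.name(chr(short_cp))
    return unicodedata.name(chr(long_cp)) == short + short[-1]

for s, l in HRASVA_DEERGHA:
    print(chr(s), chr(l), doubled_name(s, l))
```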

Dependent vowel signs (Matras)

The dependent vowel signs serve as the common manner of writing non-inherent vowels and are generally referred to as Swara Chinhas in Kannada or Matras in Samskrit. The dependent vowel signs do not appear stand-alone; rather, they are visibly depicted in combination with a base letterform (generally a consonant). A single consonant or a consonant cluster may have a dependent vowel sign applied to it to indicate the vowel quality of the syllable when it is different from the inherent vowel. The explicit appearance of a dependent vowel sign in a syllable overrides the inherent vowel ಅ (0C85) of a single consonant letter.

There are several variations in the way the dependent vowels are applied to the base letterforms. Most of them appear as non-spacing dependent vowel signs when applied to base letterforms: above or to the right side of a consonant letter or a consonant cluster. The following are the exceptions and variations to this rule:


A. The two dependent vowel signs ◌ೃ (0CC3) and ◌ೄ (0CC4) appear one level below and to the right of the consonant or the consonant cluster, separated by a small white space.

B. Each of the five dependent vowel signs ◌ೀ (0CC0), ◌ೇ (0CC7), ◌ೈ (0CC8), ◌ೊ (0CCA) and ◌ೋ (0CCB) is depicted by two or three glyph components (two-part or three-part vowel signs), with one component appearing with a space to the right of the consonant or the consonant cluster.

i) In the case of three of the above-mentioned two/three-part dependent vowels, ◌ೀ (0CC0), ◌ೇ (0CC7) and ◌ೋ (0CCB), the non-spacing component(s) of each is (are) the same as the vowel sign(s) of the corresponding preceding short vowel. The spacing component for each of these dependent vowels is the same length mark ◌ೕ (0CD5) given in Unicode version 3. The logic for this is that these dependent vowels are nothing but the long forms (independent and phonetically distinct) of the preceding short vowels.

ii) The first component of the dependent vowel ◌ೈ (0CC8) mentioned above is the same as the dependent vowel ◌ೆ (0CC6), and the second component is the same as ◌ೖ (0CD6). These are defined independently in Unicode version 3. The second part appears slightly below and to the right of the consonant or the consonant cluster.

C. In view of this, it is important to note that the two glyphs (the length mark ◌ೕ and the second component of ◌ೈ, i.e. ◌ೖ) represented with the codes 0CD5 and 0CD6 in Unicode version 3 have no independent existence and do not play any part as independent codes in the collation algorithm.

D. Unlike Devanagari, the Kannada script does not have any character with a left-side dependent vowel sign.

E. A one-to-one correspondence exists between independent vowels and dependent vowel signs.
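Unicode records the multi-part vowel signs described in B as canonical decompositions, so their component structure can be inspected with Python's `unicodedata.normalize`. The sketch below reflects the Unicode data shipped with Python rather than anything defined in this document; the names `MULTI_PART_SIGNS` and `parts` are illustrative.

```python
import unicodedata

# The five two/three-part dependent vowel signs listed in B.
MULTI_PART_SIGNS = ["\u0CC0", "\u0CC7", "\u0CC8", "\u0CCA", "\u0CCB"]

def parts(sign):
    """Code points of a vowel sign after full canonical decomposition (NFD)."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", sign)]

for sign in MULTI_PART_SIGNS:
    print(unicodedata.name(sign), parts(sign))
# KANNADA VOWEL SIGN II ['0CBF', '0CD5']
# KANNADA VOWEL SIGN EE ['0CC6', '0CD5']
# KANNADA VOWEL SIGN AI ['0CC6', '0CD6']
# KANNADA VOWEL SIGN O ['0CC6', '0CC2']
# KANNADA VOWEL SIGN OO ['0CC6', '0CC2', '0CD5']
```

Normalizing back to NFC recombines the parts; for example, `unicodedata.normalize("NFC", "\u0CC6\u0CD5")` yields `"\u0CC7"`. So when a length mark follows a sign it combines with, normalized text stores the precomposed sign instead, which is in the spirit of point C.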

Consonant letters (Vyanjanas)

Each of the 36 consonant letters in Kannada (enumerated with codes 0C95 to 0CB9 and 0CDE) represents a single consonantal sound, but also has the peculiarity of having an inherent vowel, generally the short vowel ಅ (/a/, 0C85).

Thus the Kannada letter at 0C95 represents not just ಕ್ (K) but ಕ (KA), with the inherent vowel ಅ (0C85). In the presence of a dependent vowel, however, the inherent vowel associated with a consonant letter is overridden by the dependent vowel. The consonants ಱ (0CB1) and ೞ (0CDE) sound similar to ರ (0CB0) and ಳ (0CB3) respectively. These two appear in ancient Kannada texts but are not in present use. Without ಱ and ೞ, modern Kannada has 34 consonants. These are classified as Vargeeya Vyanjanas (0C95 to 0CAE) and Avargeeya Vyanjanas (0CAF, 0CB0 and 0CB2 to 0CB9).

Vargeeya Vyanjanas: The five sets of Vargeeya Vyanjanas are:

ಕ ಖ ಗ ಘ ಙ (0C95 0C96 0C97 0C98 0C99)
ಚ ಛ ಜ ಝ ಞ (0C9A 0C9B 0C9C 0C9D 0C9E)
ಟ ಠ ಡ ಢ ಣ (0C9F 0CA0 0CA1 0CA2 0CA3)
ತ ಥ ದ ಧ ನ (0CA4 0CA5 0CA6 0CA7 0CA8)
ಪ ಫ ಬ ಭ ಮ (0CAA 0CAB 0CAC 0CAD 0CAE)

Avargeeya Vyanjanas: The nine Avargeeya Vyanjanas (enumerated in the acceptable sorting order) are:

ಯ ರ ಲ ವ ಶ ಷ ಸ ಹ ಳ (0CAF 0CB0 0CB2 0CB5 0CB6 0CB7 0CB8 0CB9 0CB3)
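The counts given above (36 consonant letters, 34 in modern use, 25 Vargeeya and 9 Avargeeya) can be re-derived from the stated code point ranges. This minimal Python sketch uses only the ranges given in this section; the list names are illustrative.

```python
# Code point sets taken from this section; 0CA9 and 0CB4 are reserved.
ALL_CONSONANTS = [cp for cp in range(0x0C95, 0x0CBA) if cp not in (0x0CA9, 0x0CB4)] + [0x0CDE]
NOT_IN_PRESENT_USE = {0x0CB1, 0x0CDE}  # RRA and LLLA
MODERN = [cp for cp in ALL_CONSONANTS if cp not in NOT_IN_PRESENT_USE]

VARGEEYA = [cp for cp in range(0x0C95, 0x0CAF) if cp != 0x0CA9]
VARGAS = [VARGEEYA[i:i + 5] for i in range(0, len(VARGEEYA), 5)]  # five sets of five
# Avargeeya Vyanjanas in the acceptable sorting order given above (LLA last).
AVARGEEYA = [0x0CAF, 0x0CB0, 0x0CB2, 0x0CB5, 0x0CB6, 0x0CB7, 0x0CB8, 0x0CB9, 0x0CB3]

print(len(ALL_CONSONANTS), len(MODERN), len(VARGEEYA), len(AVARGEEYA))  # 36 34 25 9
for varga in VARGAS:
    print(" ".join(chr(cp) for cp in varga))
```

Because 0CA9 is reserved, slicing the Vargeeya list into groups of five still lines up with the five sets listed above.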

Halant

Like Devanagari, the Kannada script also employs a sign known as halant, or vowel omission sign. A halant sign (◌್, 0CCD) nominally serves to cancel (or kill) the inherent vowel of the consonant to which it is applied.

The halant functions as a combining character. When a consonant has lost its inherent vowel through the application of halant, it is known as a dead consonant. The dead consonants are the presentation forms used to depict consonants without an inherent vowel. Their rendered forms in Kannada resemble the full consonant with the vertical stem replaced by the halant sign, which marks a character core. The stem glyph (at 0CBB) is graphically and historically related to the sign denoting the inherent /a/ vowel (ಅ, 0C85). In contrast, a live consonant is a consonant that retains its inherent vowel or is written with an explicit dependent vowel sign. The dead consonant is defined as a sequence consisting of a consonant letter followed by a halant. The default rendering for a dead consonant is to position the halant as a combining mark bound to the consonant letter form.
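Since a dead consonant is just a consonant letter followed by a halant, it can be recognized directly in stored text. The sketch below is a hypothetical Python helper (not from any library) that labels each consonant as live or dead. It assumes that a nukta, when present, sits between the consonant and the halant, as described later under the Nukta rules.

```python
HALANT, NUKTA = "\u0CCD", "\u0CBC"

def is_consonant(ch):
    """Kannada consonant letters per this section: 0C95-0CB9 (minus reserved) and 0CDE."""
    cp = ord(ch)
    return (0x0C95 <= cp <= 0x0CB9 and cp not in (0x0CA9, 0x0CB4)) or cp == 0x0CDE

def classify_consonants(text):
    """Label each consonant 'dead' (consonant [+ nukta] + halant) or 'live'."""
    result = []
    for i, ch in enumerate(text):
        if not is_consonant(ch):
            continue
        j = i + 1
        if j < len(text) and text[j] == NUKTA:  # skip a nukta attached to this consonant
            j += 1
        dead = j < len(text) and text[j] == HALANT
        result.append((ch, "dead" if dead else "live"))
    return result

print(classify_consonants("\u0C95\u0CA8\u0CCD\u0CA8\u0CA1"))  # ಕನ್ನಡ
```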

Avagraha (ಽ)

A spacing mark ಽ, called the avagraha sign, is used while rendering Samskrit texts. It is located at 0CBD.

Encoding order

The traditional Kannada alphabetic encoding order for consonants follows articulatory phonetic principles, starting with velar consonants and moving forward to bilabial consonants, followed by liquids and then fricatives. ISCII (Indian Script Code for Information Interchange) and the Unicode Standard both observe this traditional order.

Consonant conjuncts (Samyuktaksharas)

Like any other Indian script, Kannada is also noted for a large number of consonant conjunct forms that serve as orthographic abbreviations (ligatures) of two or more adjacent forms. This abbreviation takes place only in the context of a consonant cluster. An orthographic consonant cluster is defined as a sequence of characters that represents one or more dead consonants (denoted by Cd) followed by a normal live consonant (denoted by Cl).

Corresponding to each Kannada consonant, there exists a separate and unique glyph that is specially used to represent that consonant in a consonant cluster. Most of these conjunct consonant glyphs resemble their original consonant forms (many without the implicit vowel sign, wherever applicable).

In Kannada, there is only one type of conjunct formation (consonant cluster), and it is depicted as follows:

• The first consonant of the consonant cluster is rendered with the implicit or a different dependent vowel appearing as the terminal element of the consonant cluster.

• The remaining consonants (the consonants in between the first consonant and the terminal vowel element) appear in conjunct consonant glyph forms in phonetic order. They are generally depicted directly below, or sometimes below but to the right of, the first consonant.

Thus, a systematically designed Kannada script font contains the conjunct glyph components, but they are not encoded as Unicode characters, because they result from the ligation of distinct letters. Kannada script rendering software must be able to map appropriate combinations of characters in context to the appropriate conjunct glyphs in fonts.
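Before choosing conjunct glyphs, rendering software has to find the orthographic clusters: zero or more dead consonants, then a live consonant with an optional vowel sign. The regular expression below is a simplified Python sketch of that segmentation, written for this explanation rather than taken from any shaping engine. It also treats a dead consonant followed by Zero Width Non-Joiner (see Explicit Halant below) as closing its own unit, and it ignores the diacritic marks.

```python
import re

CONS = "[\u0C95-\u0CB9\u0CDE]"          # consonant letters (reserved gaps are harmless here)
NUKTA = "\u0CBC"
HALANT = "\u0CCD"
ZWNJ = "\u200C"
VSIGN = "[\u0CBE-\u0CCC\u0CD5\u0CD6]"   # dependent vowel signs and length marks
MOD = "[\u0C82\u0C83]"                  # anusvara, visarga
IVOWEL = "[\u0C85-\u0C94\u0CE0\u0CE1]"  # independent vowels

SYLLABLE = re.compile(
    # (C[nukta] halant)* C[nukta] [vowel sign | halant [ZWNJ]] [anusvara/visarga]
    f"(?:{CONS}{NUKTA}?{HALANT})*{CONS}{NUKTA}?(?:{VSIGN}|{HALANT}{ZWNJ}?)?{MOD}?"
    f"|{IVOWEL}{MOD}?"
    "|.",  # anything else is its own unit
    re.DOTALL,
)

def syllables(text):
    return SYLLABLE.findall(text)

print(syllables("\u0C95\u0CA8\u0CCD\u0CA8\u0CA1"))  # ಕನ್ನಡ -> ಕ | ನ್ನ | ಡ
```

A cluster such as ಸ್ತ್ರೀ (SA, halant, TA, halant, RA, vowel sign II) comes out as a single unit, which is the span a font would replace with a base glyph plus conjunct glyphs.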

Invisible consonant INV

There is a need for a consonant that provides an invisible base for the display of dependent vowels without any consonant base. This can be the Unicode Standard Zero Width Non-Joiner at 200C, which can also be used to provide proper collation of words containing dead consonants.

Explicit Halant

Normally, a halant character serves to create dead consonants, which, in turn, combine with subsequent consonants to form conjuncts. This behavior usually results in the halant sign not being depicted visually. Occasionally, however, this default behavior is not desired: when a dead consonant should be excluded from conjunct formation, the halant sign is visibly rendered.

To accomplish this, the Unicode Standard character 200C (Zero Width Non-Joiner) is introduced immediately after the encoded dead consonant that is to be excluded from conjunct formation.

For example, placing Zero Width Non-Joiner after a dead consonant prevents the default formation of the conjunct form; the dead consonant is instead shown with a visible halant, followed by the next consonant. The Kannada script adopts the convention of depicting the character (in this case the halant sign) as appropriate for the consonant to which it is attached.

In summary, each Kannada consonant may be encoded such that it denotes a live consonant, a dead consonant or a conjunct consonant glyph.
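The explicit-halant convention is purely a matter of storage order: dead consonant, then U+200C, then the next consonant. A short Python sketch follows. The pair ಕ್ + ಷ is an illustrative choice, and `with_explicit_halant` is a hypothetical helper name. How the result is actually drawn still depends on the font and the shaping engine.

```python
HALANT, ZWNJ = "\u0CCD", "\u200C"

def with_explicit_halant(dead_consonant, following):
    """Insert ZWNJ after a dead consonant so it is excluded from conjunct formation."""
    if not dead_consonant.endswith(HALANT):
        raise ValueError("expected a dead consonant (consonant + halant)")
    return dead_consonant + ZWNJ + following

conjunct = "\u0C95" + HALANT + "\u0CB7"                      # default: conjunct form
explicit = with_explicit_halant("\u0C95" + HALANT, "\u0CB7")  # visible halant, no conjunct
print([f"U+{ord(c):04X}" for c in explicit])
```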


Memory Representations and Rendering Order

Notation

In the next set of rules, the following notation applies:

Cn  Nominal glyph form of a consonant C as it appears in the code charts.
Cl  A live consonant, depicted identically to Cn.
Cd  Glyph depicting the dead consonant form of a consonant C.
Ch  Glyph depicting the half-consonant form of a consonant C.
Ln  Nominal glyph form of a conjunct ligature consisting of two or more component consonants. A conjunct ligature composed of two consonants X and Y is also denoted by X.Yn.
RAsub  A non-spacing combining mark glyph form positioned below the base glyph form.
Vvs  Glyph depicting the dependent vowel sign form of a vowel V.
HALANTn  The nominal glyph form of the non-spacing combining mark depicting 0CCD KANNADA SIGN HALANT.

A halant character is not always depicted; when it is depicted, it adopts this non-spacing mark form.

Memory Representations and Rendering Order

The order for storage of plain text in Kannada generally follows the phonetic order; that is, a CV syllable with a dependent vowel is always encoded as a consonant letter C followed by a vowel sign V in the memory representation. This order is employed by the ISCII standard and corresponds with the phonetic and keying order of textual data. Unlike Devanagari and some other Indian scripts, all the dependent vowels in Kannada are depicted to the right of their consonant letters. Hence there is no need to reorder the elements when mapping from the logical (character) store to the presentation (glyph) rendering, and vice versa.

Rule R1: Whenever a consonant is followed by a vowel, the corresponding vowel sign attaches to the consonant suitably.

Character order     Glyph order
KAn + UU      →     KAn + UUvs
ಕ + ಊ         →     ಕೂ
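Rule R1 can be sketched as a lookup from each independent vowel to its dependent sign, followed by plain concatenation. Because no sign is reordered, the stored order is also the glyph order. In this Python sketch, the table is built from the code chart above, and `apply_r1` is a hypothetical name.

```python
import unicodedata

# Independent vowel -> dependent vowel sign, as listed in the code chart.
# ಅ (0C85) has no sign: it is the inherent vowel of every consonant.
SIGN_FOR = {
    "\u0C86": "\u0CBE", "\u0C87": "\u0CBF", "\u0C88": "\u0CC0",
    "\u0C89": "\u0CC1", "\u0C8A": "\u0CC2", "\u0C8B": "\u0CC3",
    "\u0CE0": "\u0CC4", "\u0C8E": "\u0CC6", "\u0C8F": "\u0CC7",
    "\u0C90": "\u0CC8", "\u0C92": "\u0CCA", "\u0C93": "\u0CCB",
    "\u0C94": "\u0CCC",
}

def apply_r1(consonant, vowel):
    """Store the consonant, then the vowel's sign (nothing for the inherent A).
    No Kannada sign is drawn to the left, so glyph order equals this storage order."""
    if vowel == "\u0C85":
        return consonant
    return consonant + SIGN_FOR[vowel]

print(apply_r1("\u0C95", "\u0C8A"))  # ಕೂ
```

As a cross-check on the table, every sign's Unicode name is its vowel's name with LETTER replaced by VOWEL SIGN (for example KANNADA LETTER UU and KANNADA VOWEL SIGN UU).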

Further, the Kannada script does not allow half-consonants, ligatures and half-ligature forms. The following provides more formal and complete rules for minimal rendering of Kannada as part of a plain text sequence. It describes the mapping between Unicode characters and the glyphs in a Kannada font. It also describes the combining and ordering of those glyphs.

The rules provide minimal requirements for legibly rendering Kannada text. As with any script, a more complex procedure can add rendering characteristics, depending on the font and application.

Dead Consonant Rule

The following rule logically precedes the application of any other rule to form a dead consonant. Once formed, a dead consonant may be subject to the other rules described next.

Rule R2: When a consonant Cn precedes a HALANTn, it is considered to be a dead consonant Cd. A consonant Cn that does not precede HALANTn is considered to be a live consonant Cl.

KAn + HALANTn → KAd
ಕ + ◌್ → ಕ್

Consonant cluster (conjunct) rendering

As already explained under Consonant conjuncts (Samyuktaksharas), the conjunct formation (consonant cluster) with two or more consonants and a terminal vowel is as follows:

A. The first consonant of the consonant cluster is rendered with the terminal vowel.

B. The remaining consonants (in between the first consonant and the terminal vowel) are rendered in conjunct consonant glyph forms in phonetic order.

Rule R3:

Example 1:
KAd + KAn → KAn + KAh
ಕ್ + ಕ → ಕ್ಕ (ಕ with the conjunct consonant glyph of ಕ)

Example 2:
SAd + TAd + RAl + IIvs → SAn + IIvs + TAh + RAh
ಸ್ + ತ್ + ರ + ◌ೀ → ಸ್ತ್ರೀ (ಸೀ with the conjunct consonant glyphs of ತ and ರ)
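Rule R3 splits a cluster into a base consonant that carries the terminal vowel and the remaining consonants, which are drawn as conjunct glyphs in phonetic order. This minimal Python sketch of that split assumes a well-formed cluster without nukta. It returns a description of the parts rather than glyphs, because the conjunct glyphs themselves are font-specific. The function name `r3_plan` is hypothetical.

```python
HALANT = "\u0CCD"

def r3_plan(cluster):
    """Split one cluster (dead consonants, live consonant, optional vowel sign)
    into the parts named in Rule R3."""
    consonants, i = [], 0
    while i < len(cluster):
        consonants.append(cluster[i])
        if i + 1 < len(cluster) and cluster[i + 1] == HALANT:
            i += 2  # dead consonant: keep collecting
        else:
            i += 1  # live consonant ends the cluster
            break
    return {
        "base": consonants[0],      # nominal form, carries the terminal vowel
        "vowel_sign": cluster[i:],  # "" means the inherent vowel /a/
        "conjunct": consonants[1:], # conjunct consonant glyphs, in phonetic order
    }

print(r3_plan("\u0CB8\u0CCD\u0CA4\u0CCD\u0CB0\u0CC0"))  # ಸ್ತ್ರೀ
```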


Consonant clusters with two different display forms: Consonant RA Rules

Whenever a consonant cluster of two or more consonants is formed with the Kannada consonant letter RA (ರ, 0CB0) as the first component of the consonant cluster, this letter RA is depicted with two different presentation forms: one as the initial and the other as the final display element of the consonant cluster, as detailed below.

Consonant clusters with RA as the first consonant: general method of rendering

Rule R4: As explained before, the character ರ is rendered with the terminal vowel (implicit or dependent), and the in-between consonants are rendered below and/or to the right of ರ in conjunct consonant glyph forms.

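Example 1:
RAd + KAl → RAl + KAh
ರ್ + ಕ → ರ with the conjunct consonant glyph of ಕ

Example 2:
RAd + MAl + Uvs → RAn + MAh + Uvs
ರ್ + ಮ + ◌ು → ರು with the conjunct consonant glyph of ಮ

Example 3:
RAd + TAd + YAn → RAn + TAh + YAh
ರ್ + ತ್ + ಯ → ರ with the conjunct consonant glyphs of ತ and ಯ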

Consonant clusters with RA as the first consonant: alternate method of rendering

Rule R5: In the alternate representation method, the above procedure is followed as if ರ were absent (that is, the conjunct formation starts from the second consonant) to obtain the consonant cluster (conjunct). This is followed by another distinct glyph for ರ್, the arkavottu, which is depicted to the extreme right of the conjunct formed above. As per this representation, the conjuncts rendered in examples 1, 2 and 3 above are rendered as ಕ, ಮು and ತ್ಯ, each followed by the arkavottu. The corresponding rule is as follows:

Example 1:
RAd + KAl → KAl + Arkavottu
ರ್ + ಕ → ಕ followed by the arkavottu

Example 2:
RAd + MAl + Uvs → MAn + Uvs + Arkavottu
ರ್ + ಮ + ◌ು → ಮು followed by the arkavottu

Example 3:
RAd + TAd + YAn → TAn + YAh + Arkavottu
ರ್ + ತ್ + ಯ → ತ್ಯ followed by the arkavottu

Exception for the alternate method

Rule R6: The exception to Rule R5 is that whenever a conjunct is formed with both the first and second consonants as ರ (RA), i.e. a consonant conjunct of ರ with ರ itself, Rule R5 does not hold. Instead, the general method of consonant conjunct formation (Rule R4) is used. This means that the conjunct consonant glyph of ರ is rendered.

RAd + RAl + OOvs → RAn + RAh + OOvs
ರ್ + ರ + ◌ೋ → ರೋ with the conjunct consonant glyph of ರ
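Rules R4 to R6 choose between two displays of the same stored sequence. The Python sketch below is a hypothetical helper that only decides which rule applies and which parts go where. The `alternate` flag stands in for whatever font or user preference selects the alternate method; this document does not specify that mechanism.

```python
RA, HALANT = "\u0CB0", "\u0CCD"

def is_vowel_sign(ch):
    return "\u0CBE" <= ch <= "\u0CCC" or ch in "\u0CD5\u0CD6"

def ra_cluster_plan(cluster, alternate=True):
    """Sketch of Rules R4-R6 for a cluster that begins with RA + halant."""
    if not cluster.startswith(RA + HALANT):
        raise ValueError("cluster must begin with RA + halant")
    rest = cluster[2:]
    cut = len(rest)
    while cut > 0 and is_vowel_sign(rest[cut - 1]):
        cut -= 1
    consonants, vowel = rest[:cut], rest[cut:]
    below = [ch for ch in consonants if ch != HALANT]
    if not alternate or rest.startswith(RA):
        # R4 (general method), also forced by R6 when the second consonant is RA:
        # RA carries the terminal vowel; the other consonants become conjunct glyphs.
        return {"rule": "R6" if rest.startswith(RA) else "R4",
                "base": RA + vowel, "conjunct": below}
    # R5 (alternate method): cluster formed without RA, arkavottu at the far right.
    return {"rule": "R5", "cluster": rest, "trailing": "arkavottu"}

print(ra_cluster_plan("\u0CB0\u0CCD\u0CA4\u0CCD\u0CAF"))  # ರ್ + ತ್ + ಯ
```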

Nukta: Modifier Mark Rules

In addition to the vowel signs, one more type of combining mark may be applied to a component of an orthographic syllable or to the syllable as a whole. The NUKTA sign, which modifies a consonant form, is placed immediately after the consonant (after the terminating vowel in the case of a dependent vowel appearing after the consonant) in the memory representation, and is attached to that consonant in rendering. If the consonant represents a dead consonant, the nukta should precede the halant in the memory representation. The nukta is represented by a double-dot mark placed at the location 0CBC. Two such modified consonants are used in Kannada, pronounced ZA and FA.
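The requirement that the nukta precede the halant agrees with Unicode canonical ordering. NUKTA has canonical combining class 7 and the halant (VIRAMA) has class 9, so normalization moves a misplaced nukta in front of the halant. Here is a small Python check with `unicodedata`, using ಜ + nukta as an illustrative letter.

```python
import unicodedata

JA, NUKTA, HALANT = "\u0C9C", "\u0CBC", "\u0CCD"

# Canonical combining classes: marks with lower classes are ordered first.
print(unicodedata.combining(NUKTA), unicodedata.combining(HALANT))  # 7 9

preferred = JA + NUKTA + HALANT  # dead nukta consonant, nukta before halant
mistyped = JA + HALANT + NUKTA   # the same marks entered in the other order
print(unicodedata.normalize("NFC", mistyped) == preferred)  # True
```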

Diacritics

Diacritics are the principal class of non-spacing combining characters used with the Indian scripts. Diacritic is defined very broadly to include accents as well as other non-spacing marks. Kannada has a number of combining marks that could be considered diacritics: a set of five combining marks, Udattha (| above the character), Anudattha (_ below the character), Guru (a horizontal bar above the character), Laghu (a mark above the character) and Deergha Swaritha (|| above the character), located at 0CD1, 0CD2, 0CD3, 0CD4 and 0CF9 respectively. These are used in the transcription of Sanskrit texts (wherever needed) and for Kannada grammatical notations.

Digits

As in many Indian languages, Kannada has its own distinct set of digits. They are widely used in ordinary texts, by the Government and in public places. They are encoded at code points 0CE6 to 0CEF.
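Because the Kannada digits occupy ten consecutive code points (0CE6 to 0CEF), converting to and from ASCII digits is a fixed offset. Here is a minimal Python sketch; the function names are illustrative.

```python
KANNADA_ZERO = 0x0CE6

def to_kannada_digits(text):
    """Replace ASCII digits 0-9 with Kannada digits U+0CE6..U+0CEF."""
    return text.translate({ord("0") + d: KANNADA_ZERO + d for d in range(10)})

def from_kannada_digits(text):
    """Replace Kannada digits with ASCII digits 0-9."""
    return text.translate({KANNADA_ZERO + d: ord("0") + d for d in range(10)})

print(to_kannada_digits("2002"))  # ೨೦೦೨
```

Python's built-in `int()` also accepts these characters directly, because they are Unicode decimal digits; for example, `int("೨೦೦೨")` returns 2002.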


Part-2

Sorting issues in Kannada

The sorting sequence for Kannada in Unicode is as per the collation chart enclosed with this document. However, the following are some important issues that have to be addressed separately for proper sorting of data in Kannada.

ISCII-91 provides direct sorting through its codes. It is the natural sorting method, based solely on code values. There are no special algorithms for language-specific sorting issues, and this results in non-conventional sorting in some specific cases. Kannada scholars have specified sorting standards for Kannada, and these standards are followed in all dictionaries and other documents in Kannada. With this in view, the following four special cases have been identified.

Sorting of Nukta characters

The modifying mark, or Nukta, located at 0CBC and included in the collation table is sufficient to take care of the sorting issues of the nukta-modified characters (pronounced ZA and FA). It also takes care of any other consonant that may be modified using the Nukta.

Sorting the data records containing anuswara and visarga

When a data set containing words terminating with anuswara or visarga is sorted together with other words, the words without terminating dependent vowels are placed in wrong positions.

• The sorting sequence as per Unicode conforms to the specified standards if the anuswara and visarga appear within a word.

Sorting of words with dead consonants

• Sorting of words terminating with dead consonants

Sorting in this case also violates the sorting rules of Kannada. Unicode sorting places the word terminating with the dead consonant at the end of the list. The following list compares the sorting of sample data using the Unicode table with the acceptable sorting for this case.

[Table: Sorted data as per Unicode | Acceptable sorting]

• Dead consonants within words: proper sorting of data with such words can be achieved by using the invisible zero-width consonant just after the dead consonant.

To circumvent the unacceptable situations mentioned above (words terminating with anuswara or visarga, and words with dead consonants), the Unicode Standard character 200C (Zero Width Non-Joiner) can be used appropriately in the pre-processor and collation algorithms.
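One possible way to realize the acceptable order for word-final dead consonants is to give each pure consonant an explicit inherent-vowel weight, in the spirit of the two-part coding (0C95, 0CBB) mentioned in the Conclusion, and to give a word-final halant an even lower weight. The Python sketch below is a hypothetical key function, not the collation algorithm referred to in this document. Its weights are arbitrary, nukta and ZWNJ handling are omitted, and everything other than the word-final case falls back to code point order.

```python
HALANT = "\u0CCD"
VOWEL_SIGNS = {chr(cp) for cp in range(0x0CBE, 0x0CCD)} | {"\u0CD5", "\u0CD6"}

# Hypothetical sort weights (not from any standard):
INHERENT_A = 0x0CBB            # implicit /a/ after a consonant with no sign or halant
FINAL_HALANT = INHERENT_A - 1  # a word-final halant sorts just below the inherent /a/

def is_consonant(ch):
    cp = ord(ch)
    return (0x0C95 <= cp <= 0x0CB9 and cp not in (0x0CA9, 0x0CB4)) or cp == 0x0CDE

def sort_key(word):
    key = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        key.append(FINAL_HALANT if ch == HALANT and nxt == "" else ord(ch))
        if is_consonant(ch) and nxt != HALANT and nxt not in VOWEL_SIGNS:
            key.append(INHERENT_A)  # pure consonant: make the inherent /a/ explicit
    return key

words = ["\u0C95\u0CBE", "\u0C95", "\u0C95\u0CCD"]  # ಕಾ, ಕ, ಕ್
print(sorted(words))                # code point order: ಕ, ಕಾ, ಕ್ (dead consonant last)
print(sorted(words, key=sort_key))  # ಕ್, ಕ, ಕಾ (dead consonant first)
```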

Sorting of conjuncts having two different display forms

Two such conjuncts are rendered in Kannada at present.

• Conjuncts with ರ (0CB0) as the first consonant

This has been explained in an earlier section as the Consonant RA rules.

Words containing both display forms of the same consonant cluster with ರ (0CB0) as the first consonant of the cluster have to be sorted as follows. Even though the display renderings are different, both forms are identical in all respects, so it is natural that they should appear at consecutive positions. Even though a separate glyph and a corresponding glyph code are present in the display/storage codes, such an arrangement in Unicode will not give proper sorting.

The only alternative is to represent both display forms by the same set of codes, with a distinguishing code (0CF5) within the string for the second display form. In Unicode form, the distinguishing code value within the string of the consonant cluster for the second display form is to be considered ignorable for the purpose of sorting (Ref. Implementation Guidelines, Section 5.17 of the Unicode Standard Version 3 document). This can be achieved through preprocessing software, with specific functions to generate proper glyph codes, storage codes and the Unicode at different levels. Such a situation-specific code representation guarantees proper sorting of data containing consonant clusters with two different display forms by ignoring the code 0CF5. This condition has to be incorporated at the appropriate place in the sorting algorithm.

• The second case of rendering the same character in two different display forms is the dead consonant ನ್, which is also written in a second form. The sorting issue in this case is dealt with in the same way as in the previous case.

The Zero Width Non-Joiner at 200C cannot be used instead of 0CF5, because the same sequence of characters appears both with Zero Width Non-Joiner and with 0CF5, the two sequences representing two different syllables (conjuncts).
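Treating the distinguishing code 0CF5 as ignorable amounts to removing it before comparison. This is a Python sketch under two assumptions: 0CF5 is used exactly as proposed in this section, and it is placed right after the cluster it marks. The sample strings are made-up letter sequences chosen only to show the effect.

```python
DISTINGUISHING_CODE = "\u0CF5"  # code proposed in this section for the second display form

def sort_form(word):
    """Remove the distinguishing code so both display forms compare as equal."""
    return word.replace(DISTINGUISHING_CODE, "")

# Made-up strings: RA + halant + KA in the second display form, then KA;
# and RA + halant + KA in the first display form, then BA.
w_second = "\u0CB0\u0CCD\u0C95" + DISTINGUISHING_CODE + "\u0C95"
w_first = "\u0CB0\u0CCD\u0C95" + "\u0CAC"
words = [w_second, w_first]
print(sorted(words) == [w_first, w_second])                 # True: raw 0CF5 sorts after BA
print(sorted(words, key=sort_form) == [w_second, w_first])  # True: KA before BA once 0CF5 is ignored
```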

Sorting of Diacritic characters

Diacritic characters formed using the symbols located at 0CD1, 0CD2, 0CD3, 0CD4 and 0CF9 to render accents on consonants are considered equivalent to the corresponding consonants for sorting purposes, and hence the above procedure can be adopted in such cases also.

Conclusion

The sorting issues mentioned above may have multiple solutions. Similar issues might have been solved by different methods for other Indian languages. Hence, it is desirable to evolve uniform procedures for issues common to all the Indian languages. However, the solutions for the sorting problems mentioned here with respect to Kannada have been obtained by considering all the consonants from 0C95 to 0CB9 and the consonant 0CDE, when they appear independently in a data field, as pure consonants (i.e. coded in two parts; for example, 0C95 is coded as 0C95, 0CBB). The sorting of a data field is achieved by the indexing method. All of these can be elaborated to give the actual algorithms and flow charts, if need be.

Acknowledgements

Acknowledgements are due from the Directorate of Information Technology, Govt. of Karnataka, to the following persons, who took the responsibility of arriving at the Unicode standard and prepared this document.

• Mr. C V Srinatha Sastry, Assistant Director, National Aerospace Laboratories, Bangalore 560 017; General Secretary, Kannada Ganaka Parishath, Bangalore 560 019; and Member, Technical Advisory Committee on Standardisation and Usage of Kannada on Computers, Government of Karnataka.

• Dr. U B Pavanaja, CEO, Visva Kannada Softec, Bangalore; Member, Technical Advisory Committee on Standardisation and Usage of Kannada on Computers, Government of Karnataka; and Member, Kannada Ganaka Parishath, Bangalore.

• Mr. G N Narasimha Murthy, Secretary, Kannada Ganaka Parishath, Bangalore, and Manager, State Bank of India.

• Prof. G Venkatasubbiah, Former President, Kannada Sahithya Parishath, and Former Professor, Vijaya College, Bangalore.

• Prof. M H Krishnaiah, Former Professor, Bangalore University.

• Prof. Narahalli Balasubrahmanya, Professor, Bangalore University.


8.1.3 Typical Colloquial Sentences in Kannada

GREETING

• Hello
  Namaskara

• Good Morning
  Shubha munjaane

• Good Afternoon
  Shubha Madhyanha

• Good Night
  Shubha Raathri

• Good Bye
  Shubha Haaraikeya maathu - Good bye

• Thanks
  Dhanyavaadhagalu

• How are you
  Neevu Hegiddeeri / Hegidhdheeya

• I am fine thank you
  Naanu chennagiddene. Dhanyavaadhagalu

• Sorry
  Pashchyaathaapapadu / Kshamisi

• It is cold
  Chali idhe

Column 1: 0C82, 0C83, 0C85, 0C86, 0C87, 0C88, 0C89, 0C8A, 0C8B, 0CE0, 0C8E, 0C8F, 0C90, 0C92, 0C93, 0C94, 0C95
Column 2: 0CCD, 0CBB, 0CBE, 0CBF, 0CC0, 0CC1, 0CC2, 0CC3, 0CC4, 0CC6, 0CC7, 0CC8, 0CCA, 0CCB, 0CCC
Column 3: 0C96, 0C97, 0C98, 0C99, 0C9A, 0C9B, 0C9C, 0C9D, 0C9E, 0C9F, 0CA0, 0CA1, 0CA2, 0CA3, 0CA4, 0CA5
Column 4: 0CA6, 0CA7, 0CA8, 0CAA, 0CAB, 0CAC, 0CAD, 0CAE, 0CAF, 0CB0, 0CB1, 0CB2, 0CB5, 0CB6, 0CB7, 0CB8
Column 5: 0CB9, 0CB3, 0CB4, 0CBC

Collating sequence of Kannada Unicode Characters.

(Courtesy: Shri C. V. Srinath Sastry, Directorate of Information Technology, Government of Karnataka, Bangalore (Karnataka). Phone: 080-5279611 (O), 6645865 (R))


• I like Bengali sweets
  Naanu Bangaali Thindiyannu ishtapaduthene

• I love birds
  Naanu Pakshigalannu Ishtapaduthene

• Where is Railway station?
  Railve Niladhaana Ellidhe? (Ugibandi Niladaana ellidhe?)

• How far is the Bus Terminal from here?
  Bassina koneya niladhaana illimdha Eshtu dhooravidhe?

• How long will it take to reach the Airport?
  Vimaana Niladhaanavannu thalupalu Eshtu samaya Bekaaguthadhe?

• Is Mr. Raghunath there?
  Alli Shree Raghunaath idhaareye?

• Please tell him to call back as soon as he is free
  Avanige/Avarige biduvaadha thakshana karemaadalu Helu/Heli

• How much will it cost?
  Idhara bele Eshtu?

• Excuse me
  Nannannu Kshamisi

• It is cool outside
  Horagade Thampaagidhe

• It is hot
  Bisilu/Besige/Thampu

• It is raining
  Male Baruthidhe

GENERAL

• What is your name?
  Ninna/Nimma Hesarenu?

• My name is Ranjan
  Nanna Hesaru Ranjan

• Where do you live?
  Neenu/Neevu elli Vaasavaagiddeeya/Ri?

• I live near Ghantaghar
  Naanu Gantagarna Bali Vaasavaagiddene

• How old are you?
  Nimma Vayassu Eshtu?

• That building is tall
  Aa Kattada etharavaagidhe

• She is beautiful
  Avalu Sundaravaadiddaale


• From which Platform can I get the train for Chandigarh?
  Yava platformnimdha nanage Changarge hogalu railu siguthadheye?

• Does this train stop at Aligarh?
  Ee railu Aligharadhalli nilluthadheye?

• How many kids do you have?
  Nimage/Ninage Eshtu Mandhi makkalu iddare?

• This gift is wonderful
  E udugore adhbuthavaagidhe.

• It is really pretty
  Adhu Nijavaagalu Sundharavaagidhe.

• Food is delicious
  Uta/Thindi Ruchikaravaagidhe

• Congratulations
  Abhinandhanegalu

• You look lovely
  Neenu Mudhaagi Kaanisuthiddeeya/Kaanisuthiddeera

• Wish you happy new year
  Nimage/Ninage Hosa varshadha Shubhaashayagalu

• I wish you all the happiness
  Naanu Nimage Poornasanthoshavannu bayasuthene.

• Congratulations on your marriage
  Nimma/Ninna Madhuvege Abhinandhanegalu

• Keep your eyes wide open before marriage and half-shut afterwards
  Nimma Kannugalannu Madhuvege Munnapoorthiyaagiyoo Ananthara Ardhavootheredhittukolli

(Courtesy: Shri C. V. Srinath Sastry, Directorate of Information Technology, Government of Karnataka, Bangalore (Karnataka). Phone: 080-5279611 (O), 6645865 (R))
