8.1.1 Kannada Code Chart Details
Code Point  Character  Description
Various signs
0C82  ◌ಂ  KANNADA SIGN ANUSVARA
0C83  ◌ಃ  KANNADA SIGN VISARGA
Independent vowels
0C85  ಅ  KANNADA LETTER A
0C86  ಆ  KANNADA LETTER AA
0C87  ಇ  KANNADA LETTER I
0C88  ಈ  KANNADA LETTER II
0C89  ಉ  KANNADA LETTER U
0C8A  ಊ  KANNADA LETTER UU
0C8B  ಋ  KANNADA LETTER VOCALIC R
0C8C  ಌ  KANNADA LETTER VOCALIC L
      • Not in present use
0C8D  <reserved>
0C8E  ಎ  KANNADA LETTER E
0C8F  ಏ  KANNADA LETTER EE
0C90  ಐ  KANNADA LETTER AI
0C91  <reserved>
0C92  ಒ  KANNADA LETTER O
0C93  ಓ  KANNADA LETTER OO
0C94  ಔ  KANNADA LETTER AU
Consonants
0C95  ಕ  KANNADA LETTER KA
0C96  ಖ  KANNADA LETTER KHA
0C97  ಗ  KANNADA LETTER GA
0C98  ಘ  KANNADA LETTER GHA
0C99  ಙ  KANNADA LETTER NGA
0C9A  ಚ  KANNADA LETTER CA
0C9B  ಛ  KANNADA LETTER CHA
0C9C  ಜ  KANNADA LETTER JA
0C9D  ಝ  KANNADA LETTER JHA
0C9E  ಞ  KANNADA LETTER NYA
0C9F  ಟ  KANNADA LETTER TTA
0CA0  ಠ  KANNADA LETTER TTHA
0CA1  ಡ  KANNADA LETTER DDA
0CA2  ಢ  KANNADA LETTER DDHA
0CA3  ಣ  KANNADA LETTER NNA
0CA4  ತ  KANNADA LETTER TA
0CA5  ಥ  KANNADA LETTER THA
0CA6  ದ  KANNADA LETTER DA
0CA7  ಧ  KANNADA LETTER DHA
0CA8  ನ  KANNADA LETTER NA
0CA9  <reserved>
0CAA  ಪ  KANNADA LETTER PA
0CAB  ಫ  KANNADA LETTER PHA
0CAC  ಬ  KANNADA LETTER BA
0CAD  ಭ  KANNADA LETTER BHA
0CAE  ಮ  KANNADA LETTER MA
0CAF  ಯ  KANNADA LETTER YA
0CB0  ರ  KANNADA LETTER RA
0CB1  ಱ  KANNADA LETTER RRA
0CB2  ಲ  KANNADA LETTER LA
0CB3  ಳ  KANNADA LETTER LLA
0CB4  <reserved>
0CB5  ವ  KANNADA LETTER VA
0CB6  ಶ  KANNADA LETTER SHA
0CB7  ಷ  KANNADA LETTER SSA
0CB8  ಸ  KANNADA LETTER SA
0CB9  ಹ  KANNADA LETTER HA
0CBA     KANNADA INVISIBLE LETTER
0CBB     KANNADA VOWEL SIGN A
0CBC  ◌಼  KANNADA SIGN NUKTA
0CBD  ಽ  KANNADA SIGN AVAGRAHA
October 2002 Contents 29
Dependent vowel signs
0CBE  ◌ಾ  KANNADA VOWEL SIGN AA
0CBF  ◌ಿ  KANNADA VOWEL SIGN I
0CC0  ◌ೀ  KANNADA VOWEL SIGN II
0CC1  ◌ು  KANNADA VOWEL SIGN U
0CC2  ◌ೂ  KANNADA VOWEL SIGN UU
0CC3  ◌ೃ  KANNADA VOWEL SIGN VOCALIC R
0CC4  ◌ೄ  KANNADA VOWEL SIGN VOCALIC RR
0CC5  <reserved>
0CC6  ◌ೆ  KANNADA VOWEL SIGN E
0CC7  ◌ೇ  KANNADA VOWEL SIGN EE
0CC8  ◌ೈ  KANNADA VOWEL SIGN AI
0CC9  <reserved>
0CCA  ◌ೊ  KANNADA VOWEL SIGN O
0CCB  ◌ೋ  KANNADA VOWEL SIGN OO
0CCC  ◌ೌ  KANNADA VOWEL SIGN AU
Various signs
0CCD  ◌್  KANNADA SIGN HALANT
0CCE  <reserved>
0CCF  <reserved>
0CD0  <reserved>
0CD1     KANNADA DIACRITIC SIGN UDATTA
      • Used above any character
0CD2     KANNADA DIACRITIC SIGN ANUDATTA
      • Used below any character
0CD3     KANNADA DIACRITIC SIGN GURU-GRAVE
      • Used above any character
0CD4     KANNADA DIACRITIC SIGN LAGHU-ACUTE
      • Used above any character
0CD5  ◌ೕ  KANNADA LENGTH MARK
0CD6  ◌ೖ  KANNADA AI LENGTH MARK
Additional consonants
0CDE  ೞ  KANNADA LETTER LLLA
Generic additions
0CE0  ೠ  KANNADA LETTER VOCALIC RR
0CE1  ೡ  KANNADA LETTER VOCALIC LL
      • Not in present use
Digits
0CE6  ೦  KANNADA DIGIT ZERO
0CE7  ೧  KANNADA DIGIT ONE
0CE8  ೨  KANNADA DIGIT TWO
0CE9  ೩  KANNADA DIGIT THREE
0CEA  ೪  KANNADA DIGIT FOUR
0CEB  ೫  KANNADA DIGIT FIVE
0CEC  ೬  KANNADA DIGIT SIX
0CED  ೭  KANNADA DIGIT SEVEN
0CEE  ೮  KANNADA DIGIT EIGHT
0CEF  ೯  KANNADA DIGIT NINE
0CF5     KANNADA SIGN REPH
0CF9     KANNADA DIACRITIC SIGN DEERGHA SWARITHA
      • Used above any character
8.1.2 Kannada General Information & Description
Introduction
The Kannada script is a South Indian script. It is used to write the Kannada language of Karnataka State in India, and it is also used in many parts of Tamil Nadu, Kerala, Andhra Pradesh and Maharashtra. In addition, the Kannada script is used to write the Tulu, Konkani and Kodava languages. Kannada shares a large number of structural features with other Indian language scripts. The Kannada block of the Unicode Standard (0C80 to 0CFF) is based on ISCII-1988 (Indian Script Code for Information Interchange). The Unicode Standard (version 3) encodes Kannada characters in the same relative positions as those coded in the ISCII-1988 standard.
The writing system that employs the Kannada script constitutes a cross between syllabic writing systems and phonemic writing systems (alphabets). The effective unit of writing Kannada is the orthographic syllable, consisting of a consonant and vowel (CV) core and, optionally, one or more preceding consonants, with a canonical structure of ((C)C)CV. The orthographic syllable need not correspond exactly with a phonological syllable, especially when a consonant cluster is involved, but the writing system is built on phonological principles and tends to correspond quite closely to pronunciation.
The orthographic syllable is built up of alphabetic pieces, the actual letters of the Kannada script. These consist of distinct character types: consonant letters, independent vowels and the corresponding dependent vowel signs. In a text sequence, these characters are stored in logical phonetic order.
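The ((C)C)CV structure described above can be illustrated with a small segmenter. This is only a sketch under simplifying assumptions: the function names are hypothetical, the character classes are taken from the code chart ranges, and only code points assigned in the Unicode Kannada block are considered.

```python
HALANT = "\u0CCD"  # KANNADA SIGN VIRAMA, called halant in this document


def is_consonant(ch):
    # Consonant range 0C95..0CB9 plus 0CDE, as listed in the code chart.
    return "\u0C95" <= ch <= "\u0CB9" or ch == "\u0CDE"


def is_vowel_sign(ch):
    # Dependent vowel signs 0CBE..0CCC and the two length marks.
    return "\u0CBE" <= ch <= "\u0CCC" or ch in "\u0CD5\u0CD6"


def orthographic_syllables(text):
    """Split text into ((C)C)CV units; other characters become single units."""
    units, i, n = [], 0, len(text)
    while i < n:
        start = i
        i += 1
        if is_consonant(text[start]):
            # Absorb each further "halant + consonant" pair of the cluster.
            while i + 1 < n and text[i] == HALANT and is_consonant(text[i + 1]):
                i += 2
            if i < n and text[i] == HALANT:
                i += 1  # trailing halant: the unit ends in a dead consonant
            else:
                while i < n and is_vowel_sign(text[i]):
                    i += 1
        # Anusvara and visarga stay with the preceding unit.
        while i < n and text[i] in "\u0C82\u0C83":
            i += 1
        units.append(text[start:i])
    return units
```

For the stored sequence KA, NA, HALANT, NA, DDA the sketch yields three units: KA, the cluster NA + HALANT + NA, and DDA.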
Rendering Kannada Characters
Kannada characters can combine or change shape depending on their context. A character's appearance is affected by its ordering with respect to other characters and by the application or system environment. This variation can cause the appearance of Kannada characters to differ from their nominal glyphs.
Vowels (Swaras)
Independent vowel letters
The independent vowels (Swaras) in Kannada are letters that stand on their own. The writing system treats independent vowels as orthographic CV syllables in which the consonant is null. The independent vowel letters are used to write syllables that start with a vowel. The Unicode character encoding for Kannada uses a distinct set of naming conventions for some mid vowels of the fourteen vowels in Kannada. Of these fourteen vowels, twelve are divided into six sets, each consisting of a short vowel (Hrasva Swara) followed by the corresponding long vowel (Deergha Swara). The two types of Swaras are distinguished by the time taken to pronounce them.
A Hrasva Swara is a freely existing independent vowel that can be pronounced in a single matra time (matra kala), whereas a Deergha Swara is a vowel that is pronounced in two matra times. The six sets of swaras are:
ಅ, ಆ (0C85, 0C86)
ಇ, ಈ (0C87, 0C88)
ಉ, ಊ (0C89, 0C8A)
ಋ, ೠ (0C8B, 0CE0)
ಎ, ಏ (0C8E, 0C8F)
ಒ, ಓ (0C92, 0C93)
Of these, the vowel ೠ (0CE0) is not in present use.
The two Deergha swaras ಐ (0C90) and ಔ (0C94) have no Hrasva swara counterparts.
Further, the so-called swaras with code values 0C8C and 0CE1 are not used in Kannada and are not required for Kannada.
Dependent vowel signs (Matras)
The dependent vowel signs serve as the common manner of writing non-inherent vowels and are generally referred to as Swara Chinhas in Kannada or Matras in Samskrit. The dependent vowel signs do not appear stand-alone; rather, they are visibly depicted in combination with a base letter form (generally a consonant). A single consonant or a consonant cluster may have a dependent vowel sign applied to it to indicate the vowel quality of the syllable when it is different from the inherent vowel. Explicit appearance of a dependent vowel sign in a syllable overrides the inherent vowel ಅ (0C85) of a single consonant letter.
There are several variations in the way dependent vowels are applied to base letter forms. Most of them appear as non-spacing dependent vowel signs when applied to base letter forms, above or to the right side of a consonant letter or a consonant cluster. The exceptions and variations to this rule are as follows:
A. The two dependent vowel signs ◌ೃ (0CC3) and ◌ೄ (0CC4) appear one level below and to the right of the consonant or the consonant cluster, separated by a small white space.
B. Each of the five dependent vowel signs ◌ೀ (0CC0), ◌ೇ (0CC7), ◌ೈ (0CC8), ◌ೊ (0CCA) and ◌ೋ (0CCB) is depicted by two or three glyph components (two-part or three-part vowel signs), with one component appearing with a space to the right of the consonant or the consonant cluster.
i) In the case of three of the above-mentioned two/three-part dependent vowels, ◌ೀ (0CC0), ◌ೇ (0CC7) and ◌ೋ (0CCB), the non-spacing component(s) of each are the same as the vowel sign(s) of the corresponding preceding short vowels. The spacing component for each of these dependent vowels is the length mark ◌ೕ (0CD5) given in Unicode version 3. The logic for this is that these dependent vowels are nothing but the long forms (independent and phonetically distinct) of the preceding short vowels.
ii) The first component of the dependent vowel ◌ೈ (0CC8) mentioned above is the same as the dependent vowel ◌ೆ (0CC6), and the second component is the same as ◌ೖ (0CD6). These are defined independently in Unicode version 3. The second part appears slightly below and to the right of the consonant or the consonant cluster.
C. In view of this, it is important to note that the two glyphs (the length mark ◌ೕ and the second component of ◌ೈ, i.e. ◌ೖ) represented by the codes 0CD5 and 0CD6 in Unicode version 3 have no independent existence and do not play any part as independent codes in the collation algorithm.
D. Unlike Devanagari, the Kannada script does not have any character with a left-side dependent vowel sign.
E. A one-to-one correspondence exists between independent vowels and dependent vowel signs.
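Two of the points above can be checked against the Unicode Character Database with Python's standard `unicodedata` module. The sketch below is illustrative only and its function names are hypothetical. It relies on two facts about the Unicode Kannada block: the canonical decompositions of the multi-part vowel signs mirror the components described in item B, and the independent vowels 0C86 to 0C94 sit at a fixed distance from their dependent signs 0CBE to 0CCC.

```python
import unicodedata

SIGN_OFFSET = 0x0CBE - 0x0C86  # distance from LETTER AA to VOWEL SIGN AA


def vowel_sign_for(independent_vowel):
    """Return the dependent vowel sign for an independent vowel 0C86..0C94."""
    cp = ord(independent_vowel)
    assigned = unicodedata.name(independent_vowel, None) is not None
    if not 0x0C86 <= cp <= 0x0C94 or not assigned:
        raise ValueError(f"no dependent vowel sign for U+{cp:04X}")
    return chr(cp + SIGN_OFFSET)


def sign_components(sign):
    """Return the canonical decomposition of a vowel sign as hex code points."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", sign)]
```

For example, `sign_components("\u0CCB")` lists 0CC6, 0CC2 and 0CD5, which matches the three-part description of the OO sign with the length mark 0CD5 as its last component.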
Consonant letters (Vyanjanas)
Each of the 36 consonant letters in Kannada (enumerated with codes 0C95 to 0CB9 and 0CDE) represents a single consonantal sound but also has the peculiarity of having an inherent vowel, generally the short vowel ಅ (/a/, 0C85).
Thus the Kannada letter at 0C95 represents not just ಕ್ (K) but ಕ (KA), with the inherent vowel ಅ (0C85). In the presence of a dependent vowel, however, the inherent vowel associated with a consonant letter is overridden by the dependent vowel. The consonants ಱ (0CB1) and ೞ (0CDE) sound similar to ರ (0CB0) and ಳ (0CB3) respectively. These two appear in ancient Kannada texts but are not in present use. Modern Kannada therefore has 34 consonants (excluding ಱ and ೞ). These are classified as Vargeeya Vyanjanas (0C95 to 0CAE) and Avargeeya Vyanjanas (0CAF, 0CB0 and 0CB2 to 0CB9).
Vargeeya Vyanjanas: The five sets of Vargeeya Vyanjanas are:
ಕ 0C95   ಖ 0C96   ಗ 0C97   ಘ 0C98   ಙ 0C99
ಚ 0C9A   ಛ 0C9B   ಜ 0C9C   ಝ 0C9D   ಞ 0C9E
ಟ 0C9F   ಠ 0CA0   ಡ 0CA1   ಢ 0CA2   ಣ 0CA3
ತ 0CA4   ಥ 0CA5   ದ 0CA6   ಧ 0CA7   ನ 0CA8
ಪ 0CAA   ಫ 0CAB   ಬ 0CAC   ಭ 0CAD   ಮ 0CAE
Avargeeya Vyanjanas: The nine Avargeeya Vyanjanas (enumerated in the acceptable sorting order) are:
ಯ 0CAF   ರ 0CB0   ಲ 0CB2   ವ 0CB5   ಶ 0CB6   ಷ 0CB7   ಸ 0CB8   ಹ 0CB9   ಳ 0CB3
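The classification above lends itself to a simple lookup by code point. The following is a minimal sketch; the function name and the returned labels are hypothetical, and the set numbers simply follow the order in which the five Vargeeya sets are listed.

```python
# The five Vargeeya sets, in the order listed above.
VARGEEYA_SETS = [
    range(0x0C95, 0x0C9A),
    range(0x0C9A, 0x0C9F),
    range(0x0C9F, 0x0CA4),
    range(0x0CA4, 0x0CA9),
    range(0x0CAA, 0x0CAF),
]
# The nine Avargeeya consonants in the acceptable sorting order (LLA last).
AVARGEEYA_ORDER = [0x0CAF, 0x0CB0, 0x0CB2, 0x0CB5, 0x0CB6,
                   0x0CB7, 0x0CB8, 0x0CB9, 0x0CB3]
# Consonants found in ancient texts but not in present use.
ARCHAIC = {0x0CB1, 0x0CDE}


def classify_consonant(ch):
    """Return (class, position) for a Kannada consonant letter."""
    cp = ord(ch)
    for number, codes in enumerate(VARGEEYA_SETS, start=1):
        if cp in codes:
            return ("vargeeya", number)
    if cp in AVARGEEYA_ORDER:
        return ("avargeeya", AVARGEEYA_ORDER.index(cp) + 1)
    if cp in ARCHAIC:
        return ("archaic", None)
    raise ValueError(f"U+{cp:04X} is not a Kannada consonant letter")
```

The 25 Vargeeya and 9 Avargeeya entries together give the 34 consonants of modern Kannada mentioned above.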
Halant
Like Devanagari, the Kannada script employs a sign known as halant, or vowel omission sign. A halant sign (◌್, 0CCD) nominally serves to cancel (or kill) the inherent vowel of the consonant to which it is applied.
The halant functions as a combining character. When a consonant has lost its inherent vowel by the application of halant, it is known as a dead consonant. The dead consonants are the presentation forms used to depict the consonants without an inherent vowel. Their rendered forms in Kannada resemble the full consonant with the vertical stem replaced by the halant sign, which marks a character core. The stem glyph (at 0CBB) is graphically and historically related to the sign denoting the inherent /a/ vowel ಅ (0C85). In contrast, a live consonant is a consonant that retains its inherent vowel or is written with an explicit dependent vowel sign. The dead consonant is defined as a sequence consisting of a consonant letter followed by a halant. The default rendering for a dead consonant is to position the halant as a combining mark bound to the consonant letter form.
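The distinction between dead and live consonants follows directly from the stored sequence: a consonant is dead exactly when the next character is the halant. A minimal sketch (the function name is hypothetical):

```python
HALANT = "\u0CCD"  # KANNADA SIGN VIRAMA, called halant in this document


def is_consonant(ch):
    # Consonant range 0C95..0CB9 plus 0CDE, as listed in the code chart.
    return "\u0C95" <= ch <= "\u0CB9" or ch == "\u0CDE"


def consonant_states(text):
    """Label every consonant in text as 'dead' or 'live'."""
    states = []
    for i, ch in enumerate(text):
        if is_consonant(ch):
            # Slicing avoids an index error on the last character.
            dead = text[i + 1:i + 2] == HALANT
            states.append((ch, "dead" if dead else "live"))
    return states
```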
Avagraha (ಽ)
A spacing mark ಽ, called the avagraha sign, is used while rendering Samskrit texts. It is located at 0CBD.
Encoding order
The traditional Kannada alphabetic encoding order for consonants follows articulatory phonetic principles, starting with velar consonants and moving forward to bilabial consonants, followed by liquids and then fricatives. ISCII (Indian Script Code for Information Interchange) and the Unicode Standard both observe this traditional order.
Consonant conjuncts (Samyuktaksharas)
Like other Indian scripts, Kannada is noted for a large number of consonant conjunct forms that serve as orthographic abbreviations (ligatures) of two or more adjacent forms. This abbreviation takes place only in the context of a consonant cluster. An orthographic consonant cluster is defined as a sequence of characters that represents one or more dead consonants (denoted by Cd) followed by a normal live consonant (denoted by Cl).
Corresponding to each Kannada consonant, there exists a separate and unique glyph that is specially used to represent that consonant in a consonant cluster. Most of these conjunct consonant glyphs resemble their original consonant forms (many without the implicit vowel sign, wherever applicable).
In Kannada, there is only one type of conjunct formation (consonant cluster), and it is depicted as follows:
• The first consonant of the consonant cluster is rendered with the implicit vowel or a different dependent vowel appearing as the terminal element of the consonant cluster.
• The remaining consonants (the consonants between the first consonant and the terminal vowel element) appear in conjunct consonant glyph forms in phonetic order. They are generally depicted directly below, or sometimes below but to the right of, the first consonant.
Thus, a systematically designed Kannada script font contains the conjunct glyph components, but they are not encoded as Unicode characters, because they result from the ligation of distinct letters. Kannada script rendering software must be able to map appropriate combinations of characters in context to the appropriate conjunct glyphs in fonts.
Invisible consonant INV
There is a need for a consonant that provides an invisible base for the display of dependent vowels without any consonant base. This can be the Unicode Standard Zero Width Non-Joiner at 200C. It can also be used to provide proper collation of words containing dead consonants.
Explicit Halant
Normally, a halant character serves to create dead consonants, which, in turn, combine with subsequent consonants in order to form conjuncts. This behavior usually results in a halant sign not being depicted visually. Occasionally, however, this default behavior is not desired when a dead consonant should be excluded from conjunct formation, in which case the halant sign is visibly rendered.
In order to accomplish this, the Unicode Standard character 200C (Zero Width Non-Joiner) is introduced immediately after the encoded dead consonant that is to be excluded from conjunct formation.
For example, the use of Zero Width Non-Joiner prevents the default formation of the conjunct form �®Ì, resulting in �¬S®. The Kannada script adopts the convention of depicting the character (in this case the halant sign) as appropriate for the consonant to which it is attached.
In summary, each Kannada consonant may be encoded such that it denotes a live consonant, a dead consonant or a conjunct consonant glyph.
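The two encodings described above differ only in the Zero Width Non-Joiner placed after the dead consonant. The sketch below builds both sequences; the function names and the choice of KA and SSA as sample consonants are assumptions made for illustration.

```python
HALANT = "\u0CCD"  # KANNADA SIGN VIRAMA, called halant in this document
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def conjunct(first, second):
    """Default encoding: the dead consonant may join into a conjunct."""
    return first + HALANT + second


def explicit_halant(first, second):
    """ZWNJ after the dead consonant keeps the halant sign visible."""
    return first + HALANT + ZWNJ + second


def code_points(text):
    """List the code points of text as four-digit hex strings."""
    return [f"{ord(ch):04X}" for ch in text]
```

With KA (0C95) and SSA (0CB7), `code_points(explicit_halant("\u0C95", "\u0CB7"))` gives 0C95, 0CCD, 200C, 0CB7, while the default conjunct sequence omits 200C.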
Memory Representations and Rendering Order
Notation
In the next set of rules, the following notation applies:
Cn        Nominal glyph form of a consonant C as it appears in the code charts.
Cl        A live consonant, depicted identically to Cn.
Cd        Glyph depicting the dead consonant form of a consonant C.
Ch        Glyph depicting the half-consonant form of a consonant C.
Ln        Nominal glyph form of a conjunct ligature consisting of two or more component consonants. A conjunct ligature composed of two consonants X and Y is also denoted by X.Yn.
RAsub     A non-spacing combining mark glyph form positioned below the base glyph form.
Vvs       Glyph depicting the dependent vowel sign form of a vowel V.
HALANTn   The nominal glyph form of the non-spacing combining mark depicting 0CCD, Kannada sign Halant.
A halant character is not always depicted; when it is depicted, it adopts this non-spacing mark form.
Memory Representations and Rendering Order
The order for storage of plain text in Kannada generally follows the phonetic order; that is, a CV syllable with a dependent vowel is always encoded as a consonant letter C followed by a vowel sign V in the memory representation. This order is employed by the ISCII standard and corresponds to the phonetic and keying order of textual data. Unlike Devanagari and some other Indian scripts, all the dependent vowels in Kannada are depicted to the right of their consonant letters. Hence there is no need to reorder the elements in mapping from the logical (character) store to the presentation (glyph) rendering and vice versa.
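The storage order can be inspected directly by listing the character names of a stored string. A small illustration using Python's standard `unicodedata` module (the helper name is hypothetical; note that the Unicode character database names 0CCD "KANNADA SIGN VIRAMA" rather than halant):

```python
import unicodedata


def storage_order(text):
    """Return the Unicode character names in memory (logical) order."""
    return [unicodedata.name(ch) for ch in text]
```

For the syllable stored as 0C95 0CBF this returns the consonant name first and the vowel sign name second, which is also the left-to-right order of the glyphs.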
Rule R1: Whenever a consonant is followed by a vowel, the corresponding vowel sign attaches to the consonant suitably.

Character order      Glyph order
KAn + U         →    KAn + Uvs
ಕ + ಊ           →    ಕೂ
Further, the Kannada script does not allow half-consonants, ligatures and half-ligature forms. The following provides more formal and complete rules for minimal rendering of Kannada as part of a plain text sequence. It describes the mapping between Unicode characters and the glyphs in a Kannada font, as well as the combining and ordering of those glyphs.
The rules provide minimal requirements for legibly rendering Kannada text. As with any script, a more complex procedure can add rendering characteristics, depending on the font and application.
Dead Consonant Rule
The following rule logically precedes the application of any other rule to form a dead consonant. Once formed, a dead consonant may be subject to other rules described next.
Rule R2: When a consonant Cn precedes a HALANTn, it is considered to be a dead consonant Cd. A consonant Cn that does not precede HALANTn is considered to be a live consonant Cl.

KAn + HALANTn → KAd
ಕ + ◌್ → ಕ್

Consonant cluster (conjunct) rendering
As already explained in section 8, the conjunct formation (consonant cluster) with two or more consonants and a terminal vowel is as follows:
A. The first consonant of the consonant cluster is rendered with the terminal vowel.
B. The remaining consonants (in between the first consonant and the terminal vowel) are rendered in conjunct consonant glyph forms in the phonetic order.
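Steps A and B can be sketched as a function that turns a stored cluster into a glyph-order list, using the suffixes n, vs and h from the notation above. This is an illustrative sketch only: the function name is hypothetical, glyphs are represented by their Unicode character names rather than by entries in a real font, and the h suffix stands for the conjunct consonant glyph form.

```python
import unicodedata

HALANT = "\u0CCD"  # KANNADA SIGN VIRAMA, called halant in this document
LETTER = "KANNADA LETTER "
SIGN = "KANNADA VOWEL SIGN "


def cluster_glyphs(cluster):
    """Map one stored consonant cluster to a glyph-order list of names."""
    consonants, vowel_sign = [], None
    for ch in cluster:
        if ch == HALANT:
            continue  # the halant only marks the dead consonants
        name = unicodedata.name(ch)
        if name.startswith(LETTER):
            consonants.append(name[len(LETTER):])
        elif name.startswith(SIGN):
            vowel_sign = name[len(SIGN):]
        else:
            raise ValueError(f"unexpected character {name}")
    if not consonants:
        raise ValueError("cluster contains no consonant")
    glyphs = [consonants[0] + "n"]              # step A: first consonant
    if vowel_sign is not None:
        glyphs.append(vowel_sign + "vs")        # ... with the terminal vowel
    glyphs += [c + "h" for c in consonants[1:]]  # step B: conjunct glyphs
    return glyphs
```

For the stored sequence SA, HALANT, TA, HALANT, RA, VOWEL SIGN II the sketch returns SAn, IIvs, TAh, RAh.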
Rule R3:

Example 1:
KAd + KAn → KAh
ಕ್ + ಕ → ಕ್ಕ
(the second ಕ is rendered as its conjunct consonant glyph)

Example 2:
SAd + THd + RAd + Ivs → SAIvs + THh + RAh
ಸ್ + ತ್ + ರ್ + ◌ೀ → ಸ್ತ್ರೀ
(ತ and ರ are rendered as their conjunct consonant glyphs)
Consonant Clusters with two different display forms: Consonant RA Rules
Whenever a consonant cluster of two or more consonants is formed with the Kannada consonant letter RA (ರ, 0CB0) as the first component of the cluster, this letter RA can be depicted in two different presentation forms: one as the initial and the other as the final display element of the consonant cluster, as detailed below.
Consonant clusters with RA as the first consonant: general method of rendering
Rule R4: As explained before, the character ರ is rendered with the terminal vowel (implicit or dependent), and the in-between consonants are rendered below and/or to the right of ರ in conjunct consonant glyph forms.

Example 1:
RAd + KAl → RAl + KAh
ರ್ + ಕ → ರ with the conjunct consonant glyph of ಕ

Example 2:
RAd + MAl + Uvs → RAn + MAh + Uvs
ರ್ + ಮ + ◌ು → ರು with the conjunct consonant glyph of ಮ

Example 3:
RAd + TAd + YAn → RAn + TAh + YAh
ರ್ + ತ್ + ಯ → ರ with the conjunct consonant glyphs of ತ and ಯ
Consonant clusters with RA as the first consonant: alternate method of rendering
Rule R5: In the alternate representation method, the above procedure is followed as if ರ were absent (that is, the conjunct formation starts from the second consonant) to obtain the consonant cluster (conjunct). This is followed by another distinct glyph for ರ್ (the Arkavottu), which is depicted to the extreme right of the conjunct formed above. In this representation, the conjuncts rendered in examples 1, 2 and 3 above are rendered as ರ್ಕ, ರ್ಮು and ರ್ತ್ಯ. The corresponding rule is as follows:

Example 1:
RAd + KAl → KAl + Arkavottu
ರ್ + ಕ → ರ್ಕ

Example 2:
RAd + MAl + Uvs → MAn + Uvs + Arkavottu
ರ್ + ಮ + ◌ು → ರ್ಮು

Example 3:
RAd + TAd + YAn → TAn + YAh + Arkavottu
ರ್ + ತ್ + ಯ → ರ್ತ್ಯ
Exception for the alternate method
Rule R6: The exception to rule R5 is that whenever a conjunct is formed with ರ (RA) as both the first and the second consonant (i.e. a consonant conjunct of ರ with ರ itself), rule R5 does not hold. Instead, the general method of consonant conjunct formation (Rule R4) is used. This means that the conjunct consonant glyph of ರ is rendered.

RAd + RAl + O → RAn + RAh + Ovs
ರ್ + ರ + ಓ → ರೋ with the conjunct consonant glyph of ರ
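Rules R4 to R6 amount to a small decision about which display forms are available for a cluster. The sketch below only inspects the stored sequence; the function name and the labels "general" and "arkavottu" are hypothetical, and actual glyph selection would be left to the font and rendering software.

```python
RA = "\u0CB0"      # KANNADA LETTER RA
HALANT = "\u0CCD"  # KANNADA SIGN VIRAMA, called halant in this document


def is_consonant(ch):
    # Consonant range 0C95..0CB9 plus 0CDE, as listed in the code chart.
    return "\u0C95" <= ch <= "\u0CB9" or ch == "\u0CDE"


def allowed_forms(cluster):
    """Return the display forms available for a cluster that starts with RA."""
    starts_with_dead_ra = cluster.startswith(RA + HALANT)
    if not starts_with_dead_ra or not is_consonant(cluster[2:3]):
        return []                      # not a cluster with RA as first consonant
    if cluster[2] == RA:
        return ["general"]             # Rule R6: RA with RA itself
    return ["general", "arkavottu"]    # Rules R4 and R5
```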
Nukta- Modifier Mark Rules
In addition to the vowel signs, one more type of combining mark may be applied to a component of an orthographic syllable or to the syllable as a whole. The NUKTA sign, which modifies a consonant form, is placed immediately after the consonant (after the terminating vowel, where a dependent vowel follows the consonant) in the memory representation and is attached to that consonant in rendering. If the consonant represents a dead consonant, the nukta should precede the halant in the memory representation. The nukta is represented by a double-dot mark located at 0CBC. Two such modified consonants used in Kannada are ಜ಼ (pronounced as ZA) and ಫ಼ (pronounced as FA).
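The ordering requirement for dead consonants (nukta before halant) can be expressed as a short check. This sketch assumes that ZA and FA are stored as JA + nukta and PHA + nukta; the constant and function names are hypothetical.

```python
NUKTA = "\u0CBC"   # KANNADA SIGN NUKTA
HALANT = "\u0CCD"  # KANNADA SIGN VIRAMA, called halant in this document

ZA = "\u0C9C" + NUKTA  # JA + nukta (assumed base letter)
FA = "\u0CAB" + NUKTA  # PHA + nukta (assumed base letter)


def dead_with_nukta(consonant):
    """Encode a dead nukta consonant: consonant, then nukta, then halant."""
    return consonant + NUKTA + HALANT


def nukta_order_ok(text):
    """True if no nukta directly follows a halant in the stored text."""
    return HALANT + NUKTA not in text
```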
Diacritics
Diacritics are the principal class of non-spacing combining characters used with the Indian scripts. Diacritic is defined very broadly to include accents as well as other non-spacing marks. Kannada has a number of combining marks that could be considered diacritics. A set of five combining marks, Udattha (| above the character), Anudattha (_ below the character), Guru (a horizontal bar above the character), Laghu (æ above the character) and Deergha Swaritha (|| above the character), is located at 0CD1, 0CD2, 0CD3, 0CD4 and 0CF9 respectively. These are used in the transcription of Sanskrit texts (wherever needed) and for Kannada grammatical notations.
Digits
As with many Indian languages, Kannada has its own distinct set of digits. These are widely used in ordinary texts, in Government use and in public places. They are encoded at 0CE6 to 0CEF.
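Because the ten digits occupy consecutive code points from 0CE6, conversion to and from ASCII digits is simple arithmetic. A minimal sketch (function names are hypothetical); Python's built-in `int` already accepts Unicode decimal digits, so the reverse direction needs no table.

```python
KANNADA_ZERO = 0x0CE6  # KANNADA DIGIT ZERO


def to_kannada_digits(number):
    """Format a non-negative integer using the digits 0CE6..0CEF."""
    if number < 0:
        raise ValueError("only non-negative integers are handled")
    return "".join(chr(KANNADA_ZERO + int(d)) for d in str(number))


def from_kannada_digits(text):
    """Parse a string of Kannada digits into an integer."""
    return int(text)
```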
Part-2
Sorting issues in Kannada
The sorting sequence for Kannada in Unicode is as per the collation chart enclosed with this document. However, the following are some important issues that have to be addressed separately for proper sorting of data in Kannada.
ISCII-91 provides direct sorting through its codes: the natural sorting method is based simply on code values. There are no special algorithms for language-specific sorting issues, which results in non-conventional sorting in some specific cases. Kannada scholars have specified sorting standards for Kannada, and these standards are followed in all dictionaries and other documents in Kannada. With this in view, the following four special cases have been identified.
Sorting of Nukta characters
The modifying mark, or Nukta, located at 0CBC and included in the collation table is enough to take care of the sorting of the characters ಜ಼ (modified ಜ) and ಫ಼ (modified ಫ). It also takes care of any other consonant that may be modified using Nukta.
Sorting of data records containing anuswara and visarga
When a data set contains words terminating with anuswara or visarga together with other words, the words without terminating dependent vowels are placed in wrong positions.
• The sorting sequence as per Unicode is in accordance with the specified standards if the anuswara and visarga appear within a word.
Sorting of words with dead consonants
• Sorting of words terminating with dead consonants
Sorting in this case also violates the sorting rules of Kannada. Unicode sorting places the word terminating with the dead consonant at the end of the list. The following list compares the sorting of sample data using the Unicode table with the acceptable sorting for this case.
Sorted data as per Unicode    Acceptable sorting
�¯N®                          �¯N¬
�¯N¬                          �¯N®
�¯S®                          �¯S¬
�¯Sµ²°                        �¯S®
�¯S¬                          �¯Sµ²°
• Dead consonants within words
Proper sorting of data with such words can be achieved by using the invisible zero-width consonant just after the dead consonant.
To circumvent the unacceptable situations mentioned in sections 2.2 and 2.3 above, the Unicode Standard character 200C (Zero Width Non-Joiner) can be used appropriately in the pre-processor and collation algorithms.
Sorting of Conjuncts having two different display forms
Two such conjuncts are rendered in Kannada at present.
• Conjuncts with ರ (0CB0) as the first consonant
This has been explained in an earlier section as the Consonant RA rules.
Words containing both display forms of the same consonant cluster with ರ (0CB0) as the first consonant of the cluster have to be sorted as follows. Even though the display renderings are different, both are identical in all other respects. It is therefore natural that they should appear at consecutive positions. Even though a separate glyph and a corresponding glyph code are present in the display/storage codes, such an arrangement in Unicode will not provide proper sorting.
The only alternative is to represent both display forms by the same set of codes, with a distinguishing code (0CF5) within the string for the second display form. In Unicode form, the distinguishing code value within the string of the consonant cluster for the second display form is to be considered ignorable for the purpose of sorting (Ref. Implementation Guidelines, Section 5.17 of the Unicode Standard Version 3 document). This can be achieved through preprocessing software, with specific functions to generate proper glyph codes, storage codes and the Unicode at different levels. Such a situation-specific code representation guarantees proper sorting of data containing consonant clusters with two different display forms by ignoring the code 0CF5. This condition has to be incorporated at the appropriate place in the sorting algorithm.
• The second case of rendering the same character in two different display forms is the dead consonant ನ್. It is also written in a second form as «. The sorting issue in this case is dealt with in the same way as in the previous case.
The Zero Width Non-Joiner at 200C cannot be used instead of 0CF5, because the same sequence of characters can then appear both with Zero Width Non-Joiner and with 0CF5, the two sequences representing two different syllables (conjuncts).
Sorting of Diacritic characters
Diacritic characters formed using the symbols located at 0CD1, 0CD2, 0CD3, 0CD4 and 0CF9 to add accents to consonants are considered equivalent to the corresponding consonants for sorting purposes; hence the above procedure can be adopted in such cases also.
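Treating a code as ignorable for sorting can be sketched as a key function that drops it before comparison. The code values below are the ones proposed in this document (0CF5 and the five diacritic signs), not necessarily assigned Unicode characters; the function name and the position of 0CF5 inside the sample string are assumptions made for illustration.

```python
# Codes treated as ignorable for sorting, using the values proposed above.
IGNORABLE = {0x0CF5, 0x0CD1, 0x0CD2, 0x0CD3, 0x0CD4, 0x0CF9}


def sort_key(text):
    """Drop ignorable codes so both display forms compare as equal."""
    return "".join(ch for ch in text if ord(ch) not in IGNORABLE)


# Hypothetical storage of the two display forms of RA + KA:
general_form = "\u0CB0\u0CCD\u0C95"
alternate_form = "\u0CB0\u0CCD\u0CF5\u0C95"
```

Because `sorted` is stable, `sorted(words, key=sort_key)` keeps the two forms at consecutive positions.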
Conclusion
The sorting issues mentioned above may have multiple solutions. Similar issues might have been solved by different methods in respect of other Indian languages. Hence, it is desirable to evolve uniform procedures for issues common to all the Indian languages. However, the solutions for the sorting problems mentioned here with respect to Kannada have been obtained by considering all the consonants from 0C95 to 0CB9 and the consonant 0CDE, when they appear independently in a data field, as pure consonants (i.e. as two-part coded; for example, ಕ is coded as (0C95, 0CBB)). The sorting of a data field is achieved by the indexing method. All these can be elaborated to give the actual algorithms and flow charts, if need be.
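The two-part coding idea can be sketched as a sort key: every live consonant without an explicit vowel sign gets the proposed code 0CBB appended, and a dead consonant is reduced to the bare consonant. This is a simplified sketch, not the algorithm referred to above; the function name and the sample words (arbitrary sequences chosen only to show the ordering) are hypothetical.

```python
HALANT = "\u0CCD"        # KANNADA SIGN VIRAMA, called halant in this document
VOWEL_SIGN_A = "\u0CBB"  # code proposed in this document for the inherent vowel


def is_consonant(ch):
    return "\u0C95" <= ch <= "\u0CB9" or ch == "\u0CDE"


def is_vowel_sign(ch):
    return "\u0CBE" <= ch <= "\u0CCC" or ch in "\u0CD5\u0CD6"


def two_part_key(word):
    """Sort key: pure consonant for dead, consonant + 0CBB for bare live."""
    out, i = [], 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1:i + 2]
        out.append(ch)
        if is_consonant(ch):
            if nxt == HALANT:
                i += 1                    # dead consonant: drop the halant
            elif not (nxt and is_vowel_sign(nxt)):
                out.append(VOWEL_SIGN_A)  # make the inherent vowel explicit
        i += 1
    return "".join(out)


# Sample words in plain code point order: baaka, baak, baaga, baagoo, baag
words = ["\u0CAC\u0CBE\u0C95", "\u0CAC\u0CBE\u0C95\u0CCD", "\u0CAC\u0CBE\u0C97",
         "\u0CAC\u0CBE\u0C97\u0CCB", "\u0CAC\u0CBE\u0C97\u0CCD"]
```

With this key, each word ending in a dead consonant sorts immediately before the same word with the inherent vowel, which is the pattern shown in the "Acceptable sorting" column earlier in this part.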
Acknowledgements
Acknowledgements are due from the Directorate of Information Technology, Govt. of Karnataka, to the following persons, who took responsibility for arriving at the Unicode standard and prepared this document.
• Mr. C V Srinatha Sastry, Assistant Director, National Aerospace Laboratories, Bangalore 560 017, General Secretary, Kannada Ganaka Parishath, Bangalore 560 019 and Member, Technical Advisory Committee on Standardisation and Usage of Kannada on Computers, Government of Karnataka.
• Dr. U B Pavanaja, CEO, Visva Kannada Softec, Bangalore and Member, Technical Advisory Committee on Standardisation and Usage of Kannada on Computers, Government of Karnataka and Member, Kannada Ganaka Parishath, Bangalore.
• Mr. G N Narasimha Murthy, Secretary, Kannada Ganaka Parishath, Bangalore and Manager, State Bank of India.
• Prof. G Venkatasubbiah, Former President, Kannada Sahithya Parishath and Former Professor, Vijaya College, Bangalore.
• Prof. M H Krishnaiah, Former Professor, Bangalore University.
• Prof. Narahalli Balasubrahmanya, Professor, Bangalore University.
8.1.3 Typical Colloquial Sentences in Kannada
GREETING
w Hellow®î®±�¯Ê�®Namaskara¦d«d±I¶dT
w Good Morningý®±�® î®±±ºb¯wµ, ý®±�µ²°u®�®±Shubha munjaane¯dgªd «dga¡dd¦dy
w Good Afternooný®±�® î®±u¯ã�®ÝShubha Madhyanha¯dgªd «dØd¦Uµ
w Good Nightý®±�® �¯räShubha Raathri¯dgªd TdeÎd
w Good Byeý®±�® �¯�µ¶Nµ�®± ¯q®±�S®±l¬ �µ¶Shubha Haaraikeya maathu-Good bye¯dgªd UµdTzIy¶Sd «dd£dg-�djNµ ©dz
w Thanksu®w®ãî¯u®S®¡®±Dhanyavaadhagalu¥d¦Sd®ddQ�dVgµ
w How are youx°î®¼ �µ°TvÛ°�/�µ°TvÛ°�®±neevu Hegiddeeri/Hegidhdheeya¦df®dg Uîµe�dÔfeT/Uîµe�dÔfSd
w I am fine thank youw¯w®± Xµw¯ÝTuµÛ°wµ. u®w®ãî¯u®S®¡®±Naanuchennagiddene.Dhanyavaadhagalu¦dd¦dg �dyêdde�dÔî¦dy. ¥d¦Sd®ddQ�dVgµ
w Sorryy®ý¯Ïq¯Ùy®y®l®±/£®ï±�pashchyaathaapapadu/kshamisi
R¶ÜSdd£dd§d §dNgµGREETING
w It is coldX®¢ Cuµchali idhe
�deVµ BQy
Column 1: 0C82 ◌ಂ, 0C83 ◌ಃ, 0C85 ಅ, 0C86 ಆ, 0C87 ಇ, 0C88 ಈ, 0C89 ಉ, 0C8A ಊ, 0C8B ಋ, 0CE0 ೠ, 0C8E ಎ, 0C8F ಏ, 0C90 ಐ, 0C92 ಒ, 0C93 ಓ, 0C94 ಔ, 0C95 ಕ
Column 2: 0CCD ◌್, 0CBB, 0CBE ◌ಾ, 0CBF ◌ಿ, 0CC0 ◌ೀ, 0CC1 ◌ು, 0CC2 ◌ೂ, 0CC3 ◌ೃ, 0CC4 ◌ೄ, 0CC6 ◌ೆ, 0CC7 ◌ೇ, 0CC8 ◌ೈ, 0CCA ◌ೊ, 0CCB ◌ೋ, 0CCC ◌ೌ
Column 3: 0C96 ಖ, 0C97 ಗ, 0C98 ಘ, 0C99 ಙ, 0C9A ಚ, 0C9B ಛ, 0C9C ಜ, 0C9D ಝ, 0C9E ಞ, 0C9F ಟ, 0CA0 ಠ, 0CA1 ಡ, 0CA2 ಢ, 0CA3 ಣ, 0CA4 ತ, 0CA5 ಥ
Column 4: 0CA6 ದ, 0CA7 ಧ, 0CA8 ನ, 0CAA ಪ, 0CAB ಫ, 0CAC ಬ, 0CAD ಭ, 0CAE ಮ, 0CAF ಯ, 0CB0 ರ, 0CB1 ಱ, 0CB2 ಲ, 0CB5 ವ, 0CB6 ಶ, 0CB7 ಷ, 0CB8 ಸ
Column 5: 0CB9 ಹ, 0CB3 ಳ, 0CB4, 0CBC ◌಼
Collating sequence of Kannada Unicode Characters.
(Courtesy: Shri C. V. Srinath Sastry
Directorate of Information Technology
Government of Karnataka, Bangalore (Karnataka)
Phone: 080-5279611 (O), 6645865 (R)
E-mail: [email protected])
w I like Bengali sweets
w¯w®± �ºS¯ª ��rºm�®±w®±Ý CÇ®Ôy®l®±qµÙ°wµ.Naanu Bangaali Thindiyannu ishtapaduthene¦dd¦dg ©da�dde¬d e±deUµ e£daeNµSdêdg BݧdNgµÏdî¦dy
w I love birdsw¯w®± y®¤S®¡®w®±Ý CÇ®Ôy®l®±qµÙ°wµNaanu Pakshigalannu Ishtapaduthene¦dd¦dg §de´d�dVµêdg BݧdNgµÏdî¦dy
w Where is Railway station?�µ¶�µæ x�¯Ûo Hªåuµ?(ET�ºm x�¯Ûo Hªåuµ?)Railve NiladhaanaEllidhe?(ugibandi Niladaanaellidhe?)Tz¬®dy e¦d¬QdPd He¬¬dQy? (De�d©daeNµ e¦d¬dQdPd He¬¬dQy?)
w How far is the Bus Terminal from here?��éw® Nµ²wµ�®± x�¯Ûo Cªåºu® HÇ®±Ô u®²�®ïuµ?Bassina koneya niladhaana illimdha Eshtudhooravidhe?©de±±d¦d I¶dy¦dySd e¦d¬QdPd Be¬¬daQ HÝi QjTe®dQy?
w How long will it take to reach the Airport?ï¯w® x�¯Ûoî®w®±Ý q®©±y®©± HÇ®±Ô �®î®±�®± �µ°N¯S®±q®Ùuµ?Vimaana Niladhaanavannu thalupalu Eshtusamaya Bekaaguthadhe?e®d«dd¦d e¦d¬QdPd®dêdg £d¬dg§d¬dg HÝi ±d«dSd ©dîI¶d�dgÏdQy?
w Is Mr. Raghunath there?Aªå §° �®U®±w¯s¬ Cu¯Û�µ�µ±°?Alli Shree Raghunaath idhaareye?Ae¬¬d Údf T�dg¦dd¤d BÔdTySdy
w Please tell him to call back as soon as he is freeAî®xSµ/Aî®�Sµ �l®±î¯u® q®£®o N®�µ ¯l®©± u®�®±ïh±Ô�µ°¡®±/�µ°¡®±Avanige/Avarige biduvaadha thakshana karemaadalu Hlu/HeliA®de¦d�dy/A®deT�dy e©dNgµ®ddQ £d´dPd I¶Ty «ddNµ¬dg Qsde®dÅhµ UîµVgµ/UîµeVµ
w How much will it cost?Cu®�® �µ�µ HÇ®±Ô?Idhara bele Eshtu?BQT ©dy¬dy HÝi?
w Excuse mew®w®Ýw®±Ý £®ï±�Nannannu Kshamisi
¦dêdêdg ´de«de±d
w It is cool outside
�µ²�®S®lµ q®ºy¯Tuµ.Horagade ThampaagidheUµdyT�dNyµ £da§dde�dQy
w It is hot��©±/�µ°�®Sµ/u®Sµ CuµBisilu/Besige/Thampue©de±d¬dg/ ©dîe±d�dy/¥d�dy
w It is rainingî®±¡µ ��®±rÙuµ.Male Baruthidhe«dVyµ©dèeÏdQy
GENERALw What is your name?
xw®Ý/xî®±â �µ�®�µ°w®±?Ninna/Nimma Hesarenu?e¦dêd/e¦d««d Uyµ±dTî¦dg?
w My name is Ranjanw®w®Ý �µ�®�®± �®ºcw¬Nanna Hesaru Ranjan¦dêd Uyµ±dè Ta¡d¦d
w Where do you live?x°w®±/ x°î®¼ Hªå î¯�®î¯TvÛ°�®±/î¯�®î¯TvÛ°�?Neenu/Neevu elli Vaasavaagiddeeya/Ri?¦df¦dg/¦df®dg He¬¬d ®dd±d®dde�dÔfSd/eT
w I live near Ghantagharw¯w®± U®ºhU®�¬w® �¢ î¯�®î¯TuµÛ°wµNaanu Gantagarna Bali Vaasa Vaagiddene¦dd¦dg OµdaLµ�d¦d� ©deVµ ®dd±d®dde�dÔî¦dy
w How old are you?xî®±â î®�®±�®±é HÇ®±Ô?Nimma Vayassu Eshtu?e¦d««d ®dSd±±dg Hi?
w That building is tallB N®hÔl® Hq®Ù�®î¯TuµAa Kattada etharavaagidheAd I¶ÅµNµ HÏLµT®dde�dQy.
w She is beautifulA®± �®±ºu®�®î¯Tu¯Û¡µAvalu Sundaravaadiddaale
A®dVgµ ±dgaQT®dde�dÔdVyµ
w From which Platform can I get the train for
Chandigarh?
�®¾¯î® y¯åg¬ y¯ß�®ºxºu® w®w®Sµ X®ºmU®�¬Sµ �µ²°S®©±�µ¶©± �S®±q®Ùuµ?Yava plat formnimdha nanage Changarge
Hogalu railu siguthadheye?
Sdd®d §¬ddLµ R¶d«d� e¦daQ ¦d¦d�dy �daeNµ�dT �dy Uµdî�d¬dg Tz¬dg e±d�dgÏdQy?w Does this train stop at Aligarh?
D �µ¶©± AªU®�®u®ªå x©±åq®Ùuµ�µ±°?Ee railu Aligharadhalli nilluthadheye?
C Tz¬dg Ae¬d�dTQe¬¬d e¦d¬¬dgÏdQySdy?w How many kids do you have?
xw®Sµ/xî®±Sµ HÇ®±Ô î®±ºv î®±N®Ê¡®± Cu¯Û�µ?Nimage/Ninage Eshtu Mandhi makkalu
iddare?
e¦d¦d�dy/e¦d«d�dy HÝi «daeQ «d�I¶Vgµ BÔdTy?w This gift is wonderful
D El®±Sµ²�µ Au®±áq®î¯Tuµ.E udugore adhbuthavaagidhe.
C DNgµ�ddyTy AÖ£dµ®dde�dQyw It is really pretty
Au®± xcî¯S®©² �®±ºu®�®î¯TuµAdhu Nijavaagalu Sundharavaagidhe.
AQg e¦d¡d®dd�d¬dg ±dgaQT®dde�dQy.w Food is delicious
Fh/rºm �®±YN®�®î¯TuµUta/Thindi Ruchikaravaagidhe
ELµ e£daeNµ èe�dI¶T ®dde�dQyw Congratulations
A�Ãw®ºu®wµS®¡®±Abhinandhanegalu
Aeªd¦daQ¦dy�dVgµw You look lovely
x°w®± î®±±u¯ÛT N¯p�®±rÙvÛ°�®±/ x°î®¼ î®±±u¯ÛTN¯p�®±rÙvÛ°�®Neenu Mudhaagi Kaanisuthiddeeya/
Kaanisuthiddeera
¦df¦dg «dgÔde�d I¶dePd±dgeÏdÔfSd/¦df®dg «dgÔde�d I¶dePd±dgeÏdÔfT
w Wish you happy new year
xî®±Sµ/xw®Sµ �µ²�®î®Ç®Áu® ý®±�¯ý®�®±S®¡®±Nimage/Ninage Hosa varshadha
Shubhaashayagalu
e¦d«d�dy/e¦d¦d�dy Uµdy±d ®d¯d�Q ¯dgªdd°dSd�dVgµw I wish you all the happiness
w¯w®± xî®±Sµ y®½oÁ�®ºqµ²°Ç®î®w®±Ý ��®±�®±qµÙ°wµNaanu Nimage Poornasanthoshavannubayasuthene.¦dd¦dg e¦d«d�dy §djPd� ±da£ddî°d®dêdg ©dSd±dgÏdî¦dy
w Congratulations on your marriagexî®±â/xw®Ý î®±u®±îµSµ A�Ãw®ºu®wµS®¡®±Nimma/Ninna Madhuvege Abhinandhanegalu
e¦d««d/e¦dêd «d¥dg®dy�dy Aeªd¦daQ¦dy�dVgµw Keep your eyes wide open before marriage and
half- shut afterwardsxw®Ý N®o±ØS®¡®w®±Ý î®±u®±îµSµ î®±±w®Ý y®½rÁ�®¾¯T�®±² Aw®ºq®�®Au®Á qµ�µvh±ÔNµ² xî®±â N®o±ØS®¡®w®±Ý î®±u®±îµSµ î®±±w®Ýy®½rÁ�®¾¯T�®±² Aw®ºq®�® Au®Á qµ�µvh±ÔNµ²¢ëNimma Kannugalannu Madhuvege Munnapoorthiyaagiyoo Ananthara Ardhavootheredhittukollie¦dêd I¶PPdg�dVµêdg «d¥dg®dy�dy «dgêd §dje£d�Sdde�dSdj A¦da£dT A¥d�®dj£dy�dyeQÅhµI¶dy e¦d««d I¶PPdg�dVµêdg «d¥dg®dy�dy «dgêd §dje£d�Sdde�dSdj A¦da£dTA¥d�®dj £dy�dyeQÅhµI¶dye³Vµ
(Courtesy: Shri C. V. Srinath Sastry
Directorate of Information Technology
Government of Karnataka, Bangalore (Karnataka)
Phone: 080-5279611 (O), 6645865 (R)
E-mail: [email protected])