
Proposal for a Devanagari Script Root Zone Label Generation Rule-Set (LGR)

LGR Version: 3.0

Date: 2018-07-27

Document version: 6.1

Authors: Neo-Brahmi Generation Panel [NBGP]

1 General Information/Overview/Abstract

This document lays down the Label Generation Rule-Set for the Devanagari script. The three main components of the Devanagari Script LGR, i.e. the code point repertoire, variants and Whole Label Evaluation rules, are described in detail here.

All these components have been incorporated in a machine-readable format in the accompanying XML file named "Proposal-lgr-devanagari-20180727.xml".

In addition, a document named "Devanagari-test-labels-20180727.txt" has been provided. It contains a list of valid and invalid labels as per the Whole Label Evaluation rules laid down in Section 7 of this document. The labels have been tagged as valid or invalid under the specific rules¹. The file also lists the set of labels which can produce variants, as laid down in Section 6 of this document.

2 Script for which the LGR is proposed

ISO 15924 Code: Deva

ISO 15924 Key N°: 315

ISO 15924 English Name: Devanagari (Nagari)

Latin transliteration of native script name: dévanâgarî

¹ The categorization of invalid labels under specific rules reflects the NBGP's general understanding of the LGR tool. When testing with any LGR tool, whether a particular label gets flagged under the same rule or a different one depends entirely on the tool's internal implementation. In case of any such discrepancy, only the fact that the label is invalid should be considered.

Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel


Native name of the script: देवनागरी

Maximal Starting Repertoire [MSR] version: 3

3 Background on Script and Principal Languages Using It

The script called Nagari or Devanagari is written from left to right. Historically it derives from the Brahmi alphabet of the Ashokan inscriptions. Devanagari is currently used for 11 out of the 22 scheduled languages of India (Boro/Bodo, Dogri, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Sanskrit, Santali and Sindhi) and around 45 other languages, especially the related Indo-Aryan languages: Bagheli, Bhili, Bhojpuri, Himachali dialects, Magahi, Newar, and Rajasthani and its dialects (Marwari, Mewati, Shekhawati, Bagri, Dhundhari, Harauti and Wagdi). Closely associated with Sanskrit and Prakrit, it is an alternative script for Kashmiri (by Hindu speakers), Sindhi and Santali. It is growing in popularity among speakers of tribal languages of Arunachal Pradesh, Bihar, Chhattisgarh, Jharkhand, Madhya Pradesh and the Andaman & Nicobar Islands. The script is also used in Fiji to represent Fiji Hindi. Hindi is also a language of communication in Mauritius, Malaysia, England, Canada, South Africa and Indonesia, as well as among emigrant communities around the world. The script is also used in Nepal for writing the Nepali language. Nepali is the official language of Nepal as well as one of the languages of the state of Sikkim in India. It is spoken by over 30 million people.

Devanagari is used by over 120 languages in India, Bangladesh, Nepal and Southeast Asia.

3.1 The Evolution of the Script

It is well known that Devanagari evolved from the parent script Brahmi, whose earliest historical form, known as Aśokan Brahmi, is traced to the 4th century BC. Brahmi was deciphered by Sir James Prinsep in 1837. The study of Brahmi and its development has shown that it gave rise to most of the scripts in India as well as in other countries, viz. Sri Lanka, Myanmar, Cambodia, Thailand, Laos and the region of Tibet, to name a few.



The evolution of Brahmi into present-day Devanagari involved intermediate forms common to other scripts, such as Gupta and its two derivatives, Siddaṃ and Śāradā, in the north, and Grantha and Kadamba in the south. Devanagari can be said to have developed from the Kutila script, a descendant of the Gupta script, which in turn descended from Brahmi. The word "kutila", meaning 'crooked', was used as a descriptive term to characterize the curving shapes of the script, compared to the straight lines of Brahmi. This inheritance is the reason why some of the characters across the scripts considered under the Neo-Brahmi GP look similar to each other despite belonging to totally different code blocks of the Unicode Standard.

A look at the development of Devanagari from Brahmi gives an insight into how the Indic scripts came to be diversified: the handiwork of engravers and writers who used different types of strokes led to different regional styles. The development of the script is outlined in Table 1, and Figure 1: Pictorial depiction of evolution of Devanagari illustrates the stages in the evolution of the script².

Period | Description
300 BCE | Mauryan: Early Brahmi form in the Asokan edicts. Some scholars believe that Brahmi itself evolved from "Kharoshthi", a script written right to left.
200 CE | Kushan/Satavahana dynasties
400 CE | Gupta dynasty
600 CE | Yasodharman
800 CE | Origins of the present-day Nagari script. Vardhana dynasty in the north and Pallava period in the south.
900 CE | The period of the Chalukyas and Rashtrakutas
1100 CE | Continuation of the Chalukya rule
1300 CE | Yadavas in the north and Kakatiyas in the south
1500 CE | The Vijayanagar empire

Table 1: Evolution of Devanagari

² http://www.acharya.gen.in:8080/sanskrit/script_dev.php



Figure 1: Pictorial depiction of evolution of Devanagari

3.2 Languages considered

Devanagari is used by over 120 languages, which makes it one of the most used scripts in the world. Languages using Devanagari as their primary script belong to varying geo-political scenarios, as given below:

- designated as official (scheduled) languages of some countries

- used by communities living in urban areas

- used by communities living in rural yet accessible areas

- used by communities living in far-flung areas which are not easily connected either by roads or by communication mechanisms.

Information about official (scheduled) languages of countries is easily available. Information about languages used by communities living in urban areas is also easily obtainable. Some effort was needed to cover the languages spoken by communities living in rural yet accessible areas. However, it was quite difficult to cover the rest of the languages spoken by communities living in remote tribal areas, which are generally not connected by road or by communication means. Defining the scope of language coverage was hence essential to limit the scope of the work to be undertaken for the analysis of the Devanagari LGR.



NBGP decided to employ the "Expanded Graded Intergenerational Disruption Scale" [EGIDS], which is designed to measure the status of the languages of the world in terms of endangerment or development. The EGIDS consists of 13 levels, with each higher number on the scale representing a greater level of disruption to the intergenerational transmission of the language. NBGP decided to accommodate all the languages belonging to EGIDS Scale 1 to 4 for its analysis, as these represent languages that are still in use in one form or the other. The descriptions³ of those scales are as follows.

Scale | Label | Description
1 | National | The language is used in education, work, mass media, and government at the national level.
2 | Provincial | The language is used in education, work, mass media, and government within major administrative subdivisions of a nation.
3 | Wider Communication | The language is used in work and mass media without official status to transcend language differences across a region.
4 | Educational | The language is in vigorous use, with standardization and literature being sustained through a widespread system of institutionally supported education.

Languages belonging to Level 5 and higher are not in widespread usage.

Below is the tabular representation of the languages that have been considered for the Devanagari LGR.

EGIDS Scale 1    EGIDS Scale 2    EGIDS Scale 3    EGIDS Scale 4

Hindi

Nepali

Konkani

Maithili

Marathi

Sindhi

Bhatri

Halbi

Kinnauri

Kukna

Bhojpuri

Chhattisgarhi

Dogri

Kashmiri

³ https://www.ethnologue.com/about/language-status



Panchpargania

Sadri

Wagdi

Limbu

Magahi

Sanskrit

Santali

Tamang, Eastern

Avadhi

Newar

Saraiki⁴

Table 2: Languages considered under Devanagari LGR

Despite being classified under EGIDS Scale 5, the Boro language is also considered under the Devanagari LGR, as it is one of the scheduled languages of India and is widely spoken.

Apart from the above-mentioned languages, Braj, Dhundari, Mundari and Kharia have also been considered for the analysis, as the communities using them were accessible and provided their inputs.

3.2.1 Case of Sanskrit

Sanskrit is generally perceived as an archaic language used only in ancient religious texts. However, it is worth noting that there is a quite vibrant and active user community of Sanskrit in India which practices Sanskrit on a day-to-day basis. Sanskrit is still taught in schools under various State and Central educational boards, and there is increasing use of Sanskrit on social media as well. The same is reflected in the EGIDS scale, where Sanskrit is categorized in Scale 4, indicating the status of the language as "Educational".

3.3 The structure of written Devanagari

Devanagari is an alphasyllabary, and the heart of the writing system is the akshar. It is this unit which is instinctively recognized by users of the script. To understand the notion of akshar, a brief overview of the writing system is provided in this section; the akshar itself will be treated in depth in Section 5.4.

⁴ Though listed in EGIDS scale 4, Saraiki is not covered by the NBGP. As per Ethnologue, the Devanagari script is "no longer in use" by the Saraiki community. Ref: https://www.ethnologue.com/language/skr



The writing system of Devanagari could be summed up as composed of the following:

3.3.1 The Consonants

Devanagari consonants have an implicit schwa⁵ /ə/ vowel included in them. As per traditional classification, they are categorized according to their phonetic properties (especially in terms of place plus manner of articulation). There are 5 Varga groups (classes) and one non-Varga group. Each Varga, which corresponds to the stops, contains five consonants classified as per their properties: the first four consonants are distinguished on the basis of voicing and aspiration, and the last is the corresponding nasal.

Varga | Unvoiced, -Asp | Unvoiced, +Asp | Voiced, -Asp | Voiced, +Asp | Nasal
Velar | क U+0915 | ख U+0916 | ग U+0917 | घ U+0918 | ङ U+0919
Palatal | च U+091A | छ U+091B | ज U+091C | झ U+091D | ञ U+091E
Retroflex | ट U+091F | ठ U+0920 | ड U+0921 | ढ U+0922 | ण U+0923
Dental | त U+0924 | थ U+0925 | द U+0926 | ध U+0927 | न U+0928
Bi-labial | प U+092A | फ U+092B | ब U+092C | भ U+092D | म U+092E

Table 3: Varga classification of consonants

Non-Varga | य U+092F | र U+0930 | ल U+0932 | ळ U+0933 | व U+0935 | श U+0936 | ष U+0937 | स U+0938 | ह U+0939

Table 4: Non-Varga consonants
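The Varga grid above follows the Unicode code chart closely: each Varga occupies five consecutive code points, with U+0929 (not part of Table 3) separating the Dental and Bi-labial rows. A minimal Python sketch, assuming exactly the ranges shown in Table 3, can recover a consonant's row and column from its code point. The names `VARGA_STARTS`, `COLUMNS` and `classify` are hypothetical and not part of the LGR.

```python
# Illustrative sketch: recover the Varga row and column of Table 3 from code points.
# Assumption: each Varga is five consecutive code points, as listed in Table 3.

VARGA_STARTS = {
    "Velar": 0x0915,      # क ख ग घ ङ
    "Palatal": 0x091A,    # च छ ज झ ञ
    "Retroflex": 0x091F,  # ट ठ ड ढ ण
    "Dental": 0x0924,     # त थ द ध न
    "Bi-labial": 0x092A,  # प फ ब भ म (U+0929 is skipped)
}
COLUMNS = (
    "unvoiced, unaspirated",
    "unvoiced, aspirated",
    "voiced, unaspirated",
    "voiced, aspirated",
    "nasal",
)


def classify(ch: str):
    """Return (varga, column) for a Varga consonant, or None for anything else."""
    cp = ord(ch)
    for varga, start in VARGA_STARTS.items():
        if start <= cp < start + 5:
            return varga, COLUMNS[cp - start]
    return None


if __name__ == "__main__":
    print(classify("\u092D"))  # भ -> ('Bi-labial', 'voiced, aspirated')
    print(classify("\u092F"))  # य -> None (non-Varga, Table 4)
```

Non-Varga consonants from Table 4 fall outside all five ranges, so `classify` returns `None` for them.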

⁵ Although representing the implicit vowel as /a/ is more correct orthographically, the schwa /ə/ has been used even though it is not part of the orthographic system, since /a/ would be misunderstood and read as अ/आ/◌ा.



3.3.2 The Implicit Vowel Killer: Halant⁶

All consonants contain an implicit vowel (schwa). A special sign is needed to denote that this implicit vowel is stripped off. This is known as the Halant "◌्" (U+094D). The Halant thus joins consonants and creates conjuncts, which generally combine 2 to 4 consonants. In rare cases it can join up to 5 consonants. However, the notion of a maximum number of consonants joining to form one akshar is empirical: it is just an observation drawn from the words observed to date. Given the confluence of languages happening in the Internet age, the possibility that one may want a generic Top Level Domain [gTLD] which has more than the observed maximum cannot be ruled out. Hence, in the LGR work, this limit will not be enforced⁷.
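Because the number of consonants in a conjunct is not capped, it can still be useful to measure it, for example when reviewing test labels. The sketch below is a hypothetical helper, not part of the LGR: it treats the basic consonant range U+0915..U+0939 as "consonants" (an assumption) and reports the longest chain of the form Consonant (Halant Consonant)*.

```python
import re

HALANT = "\u094D"
# Assumption: the basic consonant range U+0915..U+0939 stands in for "consonant".
CONSONANT = "[\u0915-\u0939]"
# A consonant followed by zero or more (Halant, consonant) pairs.
CONJUNCT = re.compile(CONSONANT + "(?:" + HALANT + CONSONANT + ")*")


def longest_conjunct(label: str) -> int:
    """Number of consonants in the longest Halant-joined chain in `label`."""
    # A match of length 2k + 1 contains k + 1 consonants.
    return max(((len(m) + 1) // 2 for m in CONJUNCT.findall(label)), default=0)


print(longest_conjunct("\u0915\u094D\u0937"))              # क्ष: 2
print(longest_conjunct("\u0938\u094D\u0924\u094D\u0930"))  # स्त्र: 3
print(longest_conjunct("\u0926\u0947\u0936\u094D"))        # देश्: 1 (trailing Halant)
```

Consistent with the text above, such a count would serve only as a diagnostic; nothing in this section suggests rejecting a label because the number is large.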

3.3.3 Vowels

Separate symbols exist for all vowels, which are pronounced independently either at the beginning of a word or after a vowel sound. To indicate a vowel sound other than the implicit one, a vowel sign (Matra) is attached to the consonant. Since the consonant has a built-in schwa, there are equivalent Matras for all vowels except अ.

The correlation is shown as follows:

Vowel | Corresponding vowel sign (Matra)
अ U+0905 | (none)
आ U+0906 | ◌ा U+093E
इ U+0907 | ◌ि U+093F
ई U+0908 | ◌ी U+0940
उ U+0909 | ◌ु U+0941
ऊ U+090A | ◌ू U+0942
ऋ U+090B | ◌ृ U+0943
ए U+090F | ◌े U+0947
ऐ U+0910 | ◌ै U+0948
ओ U+0913 | ◌ो U+094B
औ U+0914 | ◌ौ U+094C
ॳ U+0973 | ◌ऺ U+093A
ॴ U+0974 | ◌ऻ U+093B
ऍ/ॲ U+090D/U+0972 | ◌ॅ U+0945
ॠ U+0960 | ◌ॄ U+0944
ऑ U+0911 | ◌ॉ U+0949
ॵ U+0975 | ◌ॏ U+094F
ॶ U+0976 | ◌ॖ U+0956
ॷ U+0977 | ◌ॗ U+0957

Table 5: Vowels with corresponding Matras

⁶ Unicode (cf. Unicode 3.0 and above) prefers the term Virama. In this report both terms have been used to denote the character that suppresses the inherent vowel.

⁷ This can be the case when a foreign-language word, which admits a large number of consonants, is transliterated into Devanāgarī.

Marathi uses ॲ (U+0972) instead of ऍ (U+090D).
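Table 5 can be read as a lookup from each independent vowel to its Matra. The sketch below encodes that table as a Python dictionary (the names `VOWEL_TO_MATRA` and `syllable` are illustrative only) and shows that, in stored text, a Matra always follows its consonant, even ◌ि (U+093F), which is displayed to the left of the consonant.

```python
# Table 5 as a lookup: independent vowel -> Matra (None for the implicit अ).
VOWEL_TO_MATRA = {
    "\u0905": None,      # अ  (implicit vowel, no Matra)
    "\u0906": "\u093E",  # आ  -> ◌ा
    "\u0907": "\u093F",  # इ  -> ◌ि
    "\u0908": "\u0940",  # ई  -> ◌ी
    "\u0909": "\u0941",  # उ  -> ◌ु
    "\u090A": "\u0942",  # ऊ  -> ◌ू
    "\u090B": "\u0943",  # ऋ  -> ◌ृ
    "\u090F": "\u0947",  # ए  -> ◌े
    "\u0910": "\u0948",  # ऐ  -> ◌ै
    "\u0913": "\u094B",  # ओ  -> ◌ो
    "\u0914": "\u094C",  # औ  -> ◌ौ
    "\u0973": "\u093A",  # ॳ  -> ◌ऺ
    "\u0974": "\u093B",  # ॴ  -> ◌ऻ
    "\u090D": "\u0945",  # ऍ  -> ◌ॅ
    "\u0972": "\u0945",  # ॲ  -> ◌ॅ (Marathi form of candra E)
    "\u0960": "\u0944",  # ॠ  -> ◌ॄ
    "\u0911": "\u0949",  # ऑ  -> ◌ॉ
    "\u0975": "\u094F",  # ॵ  -> ◌ॏ
    "\u0976": "\u0956",  # ॶ  -> ◌ॖ
    "\u0977": "\u0957",  # ॷ  -> ◌ॗ
}


def syllable(consonant: str, vowel: str) -> str:
    """Spell consonant + vowel as stored: the Matra (if any) comes after the consonant."""
    matra = VOWEL_TO_MATRA[vowel]
    return consonant if matra is None else consonant + matra


ki = syllable("\u0915", "\u0907")       # कि
print([f"U+{ord(c):04X}" for c in ki])  # ['U+0915', 'U+093F']
```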

3.3.4 The Anusvara (◌ं - U+0902)

The Anusvara represents a homorganic nasal. It replaces a conjunct group of a nasal consonant + Halant + consonant belonging to that particular Varga. Before a non-Varga consonant, the Anusvara represents a nasal sound. Modern Hindi, Marathi and Konkani prefer the Anusvara to the corresponding half-nasal⁸:

सन्त vs. संत | /sənt/ | saint | U+0938 U+0928 U+094D U+0924 vs. U+0938 U+0902 U+0924
चम्पा vs. चंपा | /tʃəmpa/ | a flower belonging to the genus Plumeria | U+091A U+092E U+094D U+092A U+093E vs. U+091A U+0902 U+092A U+093E
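The Anusvara preference can be expressed as a rewrite over code points. The following sketch is illustrative only: `prefer_anusvara` is a hypothetical helper, it assumes the Varga rows of Table 3, and it leaves nasals before non-Varga consonants untouched. It is not an LGR rule; the two spellings remain distinct code point sequences.

```python
HALANT, ANUSVARA = "\u094D", "\u0902"
# First code point of each Varga row in Table 3; the nasal is start + 4.
VARGA_STARTS = (0x0915, 0x091A, 0x091F, 0x0924, 0x092A)


def varga_of(ch: str):
    """Return the first code point of ch's Varga row, or None if ch is not a Varga consonant."""
    cp = ord(ch)
    for start in VARGA_STARTS:
        if start <= cp < start + 5:
            return start
    return None


def prefer_anusvara(word: str) -> str:
    """Rewrite 'Varga nasal + Halant + consonant of the same Varga' as Anusvara + consonant."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        start = varga_of(ch)
        if (start is not None and ord(ch) == start + 4      # ch is the Varga nasal
                and i + 2 < len(word)
                and word[i + 1] == HALANT
                and varga_of(word[i + 2]) == start):        # same Varga follows
            out.append(ANUSVARA)
            i += 2  # drop nasal + Halant, keep the following consonant
            continue
        out.append(ch)
        i += 1
    return "".join(out)


print(prefer_anusvara("\u0938\u0928\u094D\u0924"))        # सन्त -> संत
print(prefer_anusvara("\u091A\u092E\u094D\u092A\u093E"))  # चम्पा -> चंपा
```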

3.3.5 Nasalization: Candrabindu (◌ँ - U+0901)

Candrabindu denotes nasalization of the preceding vowel, as in आँख /ãkh/ eye (U+0906 U+0901 U+0916). Present-day Hindi users tend to replace the Candrabindu with the Anusvara.

⁸ A half-nasal is used in epigraphy to indicate a nasal consonant conjoined through a Halant to a consonant of its corresponding "Varga".



3.3.6 Nukta (◌़ - U+093C)⁹

The Nukta sign is placed below certain consonants to represent sounds found only in words borrowed from Perso-Arabic. It is predominantly used in this manner in Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi and Tamang. It can be adjoined to "क" (U+0915), "ख" (U+0916), "ग" (U+0917), "ज" (U+091C) and "फ" (U+092B) to show that words having these consonants with a Nukta are to be pronounced in the Perso-Arabic style, e.g.:

फ़िरोज़ /firoz/ (U+092B U+093C U+093F U+0930 U+094B U+091C U+093C)

It is also placed under "ड" (U+0921) and "ढ" (U+0922) to indicate flapped sounds, e.g.:

बढ़ /bədh/ (U+092C U+0922 U+093C)

The web publication "DEVANĀGARĪ ALPHABET AND ITS ROMANIZATION" [109] by the Central Hindi Directorate, Ministry of HRD, Government of India, clearly states such a use of the Nukta in Hindi.

In Bodo the Nukta is adjoined to "ड" (U+0921) [110]. In Maithili it is adjoined to "क" (U+0915), "ज" (U+091C), "ड" (U+0921) and "ढ" (U+0922) [111]. In Sindhi, it is adjoined to "ख" (U+0916), "ग" (U+0917), "ज" (U+091C), "फ" (U+092B), "ड" (U+0921) and "ढ" (U+0922) [104].

In Kashmiri, it can also be adjoined to "च" (U+091A), "छ" (U+091B) and "ज" (U+091C) [108] to indicate the laterally released affricates.

च़ाय /čāy/ tea (U+091A U+093C U+093E U+092F)

छ़ल /čhal/ wash, imperative (U+091B U+093C U+0932)

पॊज़ /póz/ fact (U+092A U+094A U+091C U+093C)

⁹ The possible sets of consonants/vowels have been derived from various sources, viz. prior research carried out by the Centre for Development of Advanced Computing's [C-DAC] Graphics Intelligence based Script Technologies [GIST] Research Labs (https://cdac.in/index.aspx?id=mlc_gist_about), Omniglot, and inputs provided by various experts on board the NBGP for specific languages. Only Omniglot references have been provided, as they are available online.



Normally a Nukta is appended to a consonant. However, the Santali language uses the Nukta in a unique way: the Nukta is adjoined to the following vowels and vowel signs:

a. आ (U+0906)

b. ओ (U+0913)

c. ◌ा (U+093E)

d. ◌ो (U+094B)
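Collecting the bases named in this section, a simple positional check can flag a Nukta that follows anything else. This is a sketch under the assumption that the permitted set is the union of the consonants listed above for all languages plus the Santali vowels and vowel signs; the actual constraints of the LGR are those in the accompanying XML file, and `nukta_bases_ok` is a hypothetical name.

```python
NUKTA = "\u093C"
# Consonants named in this section for any language: क ख ग ज फ ड ढ च छ
CONSONANT_BASES = set("\u0915\u0916\u0917\u091C\u092B\u0921\u0922\u091A\u091B")
# Santali: आ ओ ◌ा ◌ो
SANTALI_BASES = set("\u0906\u0913\u093E\u094B")
ALLOWED_BASES = CONSONANT_BASES | SANTALI_BASES


def nukta_bases_ok(label: str, allowed=ALLOWED_BASES) -> bool:
    """True if every Nukta in `label` directly follows a permitted base."""
    for i, ch in enumerate(label):
        if ch == NUKTA and (i == 0 or label[i - 1] not in allowed):
            return False
    return True


print(nukta_bases_ok("\u092B\u093C\u093F\u0930\u094B\u091C\u093C"))  # फ़िरोज़: True
print(nukta_bases_ok("\u092E\u093C"))                               # Nukta on म: False
```

Passing a smaller `allowed` set, for example only the Sindhi consonants, would approximate a single language; per-language tailoring is outside this sketch.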

3.3.7 Visarga (◌ः - U+0903) and Avagraha (ऽ - U+093D)

The Visarga is frequently used in Sanskrit and represents a sound very close to /h/, for example: दुःख /duhkh/ sorrow, unhappiness (U+0926 U+0941 U+0903 U+0916).

The Avagraha "ऽ" (U+093D) creates an extra stress on the preceding vowel and is used in Sanskrit texts. It is rarely used in other languages using Devanagari. In the case of the LGR, the Avagraha is not part of the repertoire, as it is barred in the Maximal Starting Repertoire.

3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)

The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after Halant) where default conjunct formation is to be explicitly prevented and the Halant joining the two consonants needs to be shown explicitly. For example, the conjunct क्ष /ksha/, which is formed by क /ka/ + ◌् (halant) + ष /sha/, is rendered with a visible Halant, as क्‌ष, when formed by क /ka/ + ◌् (halant) + Zero Width Non-joiner + ष /sha/. In certain cases, for certain communities, this visual rendition indicates a difference in the manner in which those combinations are pronounced.

The Zero Width Joiner (ZWJ) is another invisible character, which is used in certain cases (mostly after Halant) where the default rendering of a particular conjunct does not make the constituent consonant shapes directly visible. For example, the conjunct क्ष /ksha/, which is formed by क /ka/ + ◌् (halant) + ष /sha/, does not show the half form of ka joining with sha. However, using ZWJ, the constituent consonants' shapes are preserved in the visual depiction: क्‍ष, formed by क /ka/ + ◌् (halant) + Zero Width Joiner + ष /sha/.
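The three renderings of /ksha/ discussed above are three different code point sequences. The sketch below (standard-library Python only; the names are illustrative) prints them and confirms that NFC normalization neither removes nor merges the joiners, so they remain distinct strings.

```python
import unicodedata

KA, HALANT, SSA = "\u0915", "\u094D", "\u0937"
ZWNJ, ZWJ = "\u200C", "\u200D"

forms = {
    "default conjunct": KA + HALANT + SSA,               # क + Halant + ष
    "explicit Halant (ZWNJ)": KA + HALANT + ZWNJ + SSA,  # क + Halant + ZWNJ + ष
    "half form (ZWJ)": KA + HALANT + ZWJ + SSA,          # क + Halant + ZWJ + ष
}

for name, text in forms.items():
    print(f"{name:24}", " ".join(f"U+{ord(c):04X}" for c in text))

# NFC leaves the joiners in place, so the three spellings stay distinct strings.
normalized = {unicodedata.normalize("NFC", text) for text in forms.values()}
assert len(normalized) == 3
```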

Earlier, the ZWJ was recommended by the Unicode Consortium to generate certain special conjuncts like Eyelash Ra (more details in Section 5.2). However, with the new recommendations in place, this usage of ZWJ is no longer encouraged.

4 Overall Development Process and Methodology

Under the Neo-Brahmi Generation Panel, there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP ensured that the fundamental philosophy behind building those LGRs is in sync across all Brahmi-derived scripts. This is the Devanagari LGR, which caters to multiple languages written using Devanagari, mostly belonging to EGIDS scale 1 to 4.

4.1 Guiding Principles

The NBGP adopts the following broad principles for the selection of code points in the code point repertoire, across the board for all the scripts within its ambit.

4.1.1 Inclusion principles

4.1.1.1 Modern usage

Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription or archival purposes only will not be considered for inclusion in the code point repertoire.

4.1.1.2 Unambiguous use

Every character proposed should have an unambiguous understanding among the linguistic community about its usage in the language.



4.1.2 Exclusion principles

The main exclusion principle is that of External Limits on Scope. These comprise protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations but have been spelt out separately for the sake of clarity.

4.1.2.1 External Limits on Scope

The code point repertoire for the root zone is a very special case: since it sits at the top of the protocol hierarchy, the characters available for selection as part of the Root Zone code point repertoire are already constrained by the various protocol layers beneath it. The following three main protocols/standards act as successive filters:

i. The Unicode Standard

Of all the characters needed by the given script, a character that is not encoded in Unicode cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

ii. IDNA Protocol

Unicode, being the character-encoding standard that aims to provide the maximum possible representation of a given script/language, has encoded, as far as possible, all the characters needed by the script. However, domain names being a specialized case, they are governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.

For example, Devanagari Letter Qa "क़" (U+0958) is not allowed to be part of a domain name. Its decomposed form, i.e. Devanagari Letter Ka followed by Devanagari Sign Nukta, "क" (U+0915) + "◌़" (U+093C), can be used instead.
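Python's standard `unicodedata` module can confirm the relationship between the two forms: U+0958 has a canonical decomposition to Ka + Nukta and is excluded from recomposition, so normalizing it to NFC, the form IDNA2008 requires for labels, yields the two-code-point sequence. The constant names below are illustrative.

```python
import unicodedata

QA = "\u0958"              # DEVANAGARI LETTER QA (precomposed)
KA_NUKTA = "\u0915\u093C"  # DEVANAGARI LETTER KA + DEVANAGARI SIGN NUKTA

print(unicodedata.decomposition(QA))                       # 0915 093C
print(unicodedata.normalize("NFC", QA) == KA_NUKTA)        # True: NFC does not recompose Qa
print(unicodedata.normalize("NFC", KA_NUKTA) == KA_NUKTA)  # True
```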



IDNA also imposes restrictions on the invisible characters Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D) in the form of CONTEXTJ rules. These characters are required in certain cases where an atypical visual shape of an akshar is desired.
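The CONTEXTJ rules are defined in RFC 5892 (Appendix A). A simplified sketch of their Virama condition follows: a ZWNJ or ZWJ is accepted only directly after a character whose canonical combining class is Virama (9), which covers the Devanagari Halant U+094D. The joining-type alternative that RFC 5892 also allows for ZWNJ is deliberately omitted, because Python's `unicodedata` does not expose Joining_Type; `contextj_ok` is a hypothetical name.

```python
import unicodedata

ZWNJ, ZWJ = "\u200C", "\u200D"
VIRAMA_CCC = 9  # Canonical_Combining_Class value "Virama"


def contextj_ok(label: str) -> bool:
    """Simplified CONTEXTJ test: each ZWNJ/ZWJ must directly follow a Virama.

    Omits the Joining_Type alternative that RFC 5892 allows for ZWNJ.
    """
    for i, ch in enumerate(label):
        if ch in (ZWNJ, ZWJ):
            if i == 0 or unicodedata.combining(label[i - 1]) != VIRAMA_CCC:
                return False
    return True


print(contextj_ok("\u0915\u094D\u200C\u0937"))  # क + Halant + ZWNJ + ष: True
print(contextj_ok("\u0915\u200D\u0937"))        # ZWJ without Halant: False
```

Even a label that passes this check cannot use these characters in the root zone, since neither is part of the MSR.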

In domain names, due to the absence of space " " or hyphen "-", there will be cases where the inability to use ZWNJ poses issues: when two words need to be joined together, the first word ends in an explicit Halant and the second word begins with a consonant. In that case, a conjunct will be formed between the last consonant of the first word and the first consonant of the second word, and this visual display may not be desired. For example, if the two words देश् (/deš/ nation) and विदेश (/videš/ foreign land) are juxtaposed, the resultant word, i.e. "देश्विदेश"¹⁰, is not the appropriate way of rendering it. The appropriate rendering would be "देश्‌विदेश", which can be achieved by adding a ZWNJ between the two words.

As the ZWNJ is not part of the MSR, it is not permissible to make such combinations. If and when the ZWNJ is permitted by the MSR, the NBGP may then consider adding it to the Devanagari repertoire if necessary.

However, the exclusion of the ZWJ from the MSR may not have much impact, as better alternatives are already available for depicting the cases for which ZWJ was earlier used. Some specific shapes¹¹ may not be possible to produce; however, there will not be any impact at the phonetic level.

iii. Maximal Starting Repertoire

The root zone LGR is a repertoire of the characters which are going to be used for the creation of root zone TLDs, which in turn are an even more specialized case of domain names. Hence the Root Zone LGR Procedure introduces additional exclusions on the IDNA-allowed set of characters. For example, the Devanagari Sign Avagraha "ऽ" (U+093D), even though allowed by the IDNA protocol, is not permitted in the root zone repertoire as per the [MSR].

¹⁰ In this particular case, it is possible to get the required display by dropping the explicit Halant at the end of the word; however, one can then argue that the pronunciation of the two words, i.e. देश् and देश, is different, and hence it changes the fundamental word.

¹¹ Case of क्ष and क्‍ष: the first is composed with क+◌्+ष while the latter is composed with क+◌्+ZWJ+ष. The pronunciation of both conjuncts is the same.

To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. This is further narrowed down by the IDNA protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
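The three successive filters can be pictured as set operations on code points. In this sketch the block range is the Unicode Devanagari block (U+0900..U+097F), while the "disallowed" and "excluded" sets are tiny samples taken from this section and Section 4.1.2.2 (Qa and Danda for IDNA2008, Avagraha for the MSR), not the real IDNA2008 or MSR data. The function and constant names are hypothetical.

```python
# Sketch of the successive filters as set operations on code points.
# The two exclusion sets are small samples from this section, NOT real data.
UNICODE_DEVANAGARI_BLOCK = set(range(0x0900, 0x0980))  # U+0900..U+097F
IDNA_DISALLOWED_SAMPLE = {0x0958, 0x0964}  # QA (precomposed), DANDA
MSR_EXCLUDED_SAMPLE = {0x093D}             # AVAGRAHA: allowed by IDNA, not in MSR


def successive_filters(candidates, block, idna_disallowed, msr_excluded):
    """Apply the Unicode-block, IDNA and MSR filters in order; return each stage."""
    after_unicode = {cp for cp in candidates if cp in block}
    after_idna = after_unicode - idna_disallowed
    after_msr = after_idna - msr_excluded
    return after_unicode, after_idna, after_msr


stages = successive_filters(
    {0x0915, 0x0958, 0x093D, 0x0964, 0x0041},  # क, क़, ऽ, ।, and Latin "A"
    UNICODE_DEVANAGARI_BLOCK, IDNA_DISALLOWED_SAMPLE, MSR_EXCLUDED_SAMPLE,
)
for name, result in zip(("Unicode block", "IDNA2008", "MSR"), stages):
    print(name, sorted(f"U+{cp:04X}" for cp in result))
```

Each stage can only shrink the set, which mirrors the "successive filter" description: nothing rejected by a lower layer can reappear higher up.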

4.1.2.2 No Punctuation Marks

The TLDs being identifiers, punctuation marks present in Brahmi-based languages, such as Danda "।" (U+0964) and double Danda "॥" (U+0965), will not be included.

4.1.2.3 No Symbols and Abbreviations

Abbreviations, weights and measures, and other such iconic characters like Isshar "৺" (U+09FA), Abbreviation sign "॰" (U+0970), etc. will not be included.

4.1.2.4 No Rare and Obsolete Characters

There are characters which have been added to Unicode to accommodate rare forms, such as DEVANAGARI LETTER VOCALIC RR "ॠ" (U+0960) and DEVANAGARI LETTER VOCALIC LL "ॡ" (U+0961), as well as their Matra forms "◌ॄ" (U+0944) and "◌ॣ" (U+0963). None of these characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.

4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic

Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA "◌॑" (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA "◌॒" (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR Procedure.



5 Repertoire

Section 5.1 provides the section of the [MSR] applicable to the Devanagari script, on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to include in the Devanagari LGR.

5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 3

Figure 2: Devanagari Code Page from [MSR]

Color convention¹²:

All characters that are included in the [MSR] - Yellow background

PVALID in IDNA2008 but excluded from the MSR for various reasons - Pinkish background

Not PVALID in IDNA2008 - White background

¹² This document needs to be printed in color for this to be read correctly.



5.2 Code Point Repertoire

For each of the code points, language references have been given in the last column, titled "Reference". For the entire coverage of Devanagari code points, references for Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, together they cover all the code points required for all the languages that the NBGP has considered, as given in Section 3.2.

Sr. No. | Unicode Code Point | Glyph | Character Name | Indic Syllabic Category | Example languages using the code point (not exhaustive list) | Language with lowest EGIDS scale using the code point | Reference
1. | 0901 | ◌ँ | DEVANAGARI SIGN CANDRABINDU | Candrabindu | Bodo, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Santali and Sanskrit | 1 (Hindi, Nepali) | [0], [101], [102], [103], [105], [108], [110], [111], [112], [113]
2. | 0902 | ◌ं | DEVANAGARI SIGN ANUSVARA | Anusvara (Bindu) | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [113]
3. | 0903 | ◌ः | DEVANAGARI SIGN VISARGA | Visarga | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [113]
4. | 0905 | अ | DEVANAGARI LETTER A | Vowel | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
5. | 0906 | आ | DEVANAGARI LETTER AA | Vowel | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
6. | 0907 | इ | DEVANAGARI LETTER I | Vowel | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
7. | 0908 | ई | DEVANAGARI LETTER II | Vowel | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
8. | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
9. | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
10. | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1 (Hindi) | [0], [101], [102], [103]
11. | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1 (Hindi) | [0], [101]
12. | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4 (Kashmiri) | [0], [105], [108]
13. | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
14. | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
15. | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1 (Hindi) | [0], [100], [101], [102], [108], [112]
16. | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4 (Kashmiri) | [0], [105], [108]
17. | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
18. | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
19. | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]

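20. | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
21. | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
22. | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
23. | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [113]
24. | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
25. | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
26. | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
27. | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
28. | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [113]
29. | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
30. | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
31. | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
32. | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
33. | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
34. | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
35. | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
36. | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
37. | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
38. | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
39. | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]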

40. | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
41. | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
42. | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
43. | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
44. | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
45. | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
46. | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
47. | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 (Nepali) | [0], [102], [103], [110], [112], [113]
48. | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
49. | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
50. | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [113]
51. | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
52. | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [104], [105], [108], [113]
53. | 093A | ◌ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 (Kashmiri) | [11], [105], [108]
54. | 093B | ◌ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 (Kashmiri) | [11], [105], [108]
55. | 093C | ◌़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 (Hindi) | [0], [101], [105], [108], [110], [109], [111]
56. | 093E | ◌ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [113]
57. | 093F | ◌ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [113]
58. | 0940 | ◌ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [113]
59. | 0941 | ◌ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [113]
60. | 0942 | ◌ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in section 3.2 | 1 (Hindi, Nepali) | [0], [101], [102], [103], [113]
61. | 0943 | ◌ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 (Hindi) | [0], [101], [102], [103]

62. | 0945 | ◌ॅ | DEVANAGARI VOWEL SIGN CANDRA E (= candra) | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 | Hindi | [0],[100],[101],[108]

63. | 0946 | ◌ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 | Kashmiri | [0],[105],[108]

64. | 0947 | ◌े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in section 3.2 | 1 | Hindi, Nepali | [0],[101],[102],[103],[105],[108],[113]

65. | 0948 | ◌ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in section 3.2 | 1 | Hindi, Nepali | [0],[101],[102],[103],[113]

66. | 0949 | ◌ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 | Hindi | [0],[100],[108]

67. | 094A | ◌ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 | Kashmiri | [0],[105],[108]

68. | 094B | ◌ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in section 3.2 | 1 | Hindi, Nepali | [0],[101],[102],[103],[105],[108],[113]

69. | 094C | ◌ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in section 3.2 | 1 | Hindi, Nepali | [0],[101],[102],[103],[105],[108],[113]

70. | 094D | ◌् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in section 3.2 | 1 | Hindi, Nepali | [0],[101],[102],[103],[105],[108],[113]

71. | 094F | ◌ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 | Kashmiri | [0],[105],[108]

72. | 0956 | ◌ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 | Kashmiri | [11],[105],[108]

73. | 0957 | ◌ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 | Kashmiri | [11],[105],[108]

74. | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 | Konkani, Marathi | [9],[100],[102],[108],[112]

75. | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 | Kashmiri | [11],[105],[108]

76. | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 | Kashmiri | [11],[105],[108]

77. | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 | Kashmiri | [11],[105],[108]

78. | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 | Kashmiri | [11],[105],[108]

79. | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 | Kashmiri | [11],[105],[108]

80. | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 | Sindhi | [8],[104]

81. | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 | Sindhi | [8],[104]

82. | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 | Sindhi | [8],[104]

83. | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 | Sindhi | [8],[104]

Table 6: Code point repertoire

Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of "DEVANAGARI LETTER RRA" in the repertoire, so that the "Eyelash Reph"13 construct can be included.

Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code-point (Not exhaustive list) | Reference

1. | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106],[107]

2. | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106],[107]

Table 7: Sequences

13 Unicode uses the term "Eyelash Ra" instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by Normal Ra U+0930), the term "Reph" is used here.

5.3 Code points not included

The following code points have not been included in the repertoire.

Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion

1. | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.

2. | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.

3. | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.

4. | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.

5. | U+0944 | ◌ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.

6. | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.

7. | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.

5.4 Structural Formation of Devanagari:

All the languages written in Brahmi-derived scripts follow a particular way of forming their words, based on units known as akshars. The next section gives detailed akshar formation rules as applicable to the representation of the Hindi language when written in the Devanagari script. These rules need slight additions for the different languages written in Devanagari, in terms of:

- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not for Marathi)

- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi).

It is worth noting that the rules required to accommodate additional languages in the Devanagari rule set, beyond those required for Hindi, never conflict with one another.

Section 7 gives the Whole Label Evaluation (WLE) rules, which cover all the languages under the purview of the NBGP for the Devanagari script.

5.5 Akshar formation rules for Hindi

This section details the akshar formation rules as applicable to Hindi. The first subsection lists the categories of characters in the form of variables; the rules use these variable names instead of the descriptive names. The second subsection lists four operators, along with their functions, which are assumed while specifying the rules. The following two subsections describe the two major categories of akshar formation, the first of which begins with a vowel and the second with a consonant. These rules are based on an Indian Standard (IS 13194:1991), popularly known as "Indian Script Code for Information Interchange" [ISCII].

5.5.1 Variables involved

Dash → Hyphen

Digit → Indo-Arabic digits [0-9]

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant/Virama

N → Nukta

5.5.2 Operators used:

Symbol Function

| Alternative

[] Optional

* Variable Repetition

() Sequence Group

Table 8: Symbol functions

In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari, when used to write Hindi, are given.

5.5.3 The Vowel Sequence

A vowel sequence begins with a vowel. It may be optionally followed by an Anusvara (B), a Candrabindu (D) or a Visarga (X). The number of B, D or X that can follow a V in Devanagari is restricted to one.

The possibility of a Visarga following a Candrabindu or an Anusvara is ruled out, since that combination is used only in Vedic and in the Bengali script.

The vowel sequence in Hindi is therefore V[B|D|X].

Examples:

Sequence Description | Sequence | Example | Constituting characters

Vowel | V | अ /a/ | अ U+0905

Vowel + Anusvara | V[B] | अं /aṁ/ | अ ◌ं U+0905 U+0902

Vowel + Candrabindu | V[D] | अँ /aṃ/ | अ ◌ँ U+0905 U+0901

Vowel + Visarga | V[X] | अः /aḥ/ | अ ◌ः U+0905 U+0903

Table 9
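The rule V[B|D|X] is small enough to express directly as a regular expression. The sketch below is illustrative only: the constant names are hypothetical, and VOWELS is an assumed sample of Hindi vowels rather than the full repertoire of Table 6.

```python
import re

# Assumed sample of Hindi independent vowels (not the full repertoire).
VOWELS = "\u0905\u0906\u0907\u0908\u0909\u090A\u090B\u090F\u0910\u0913\u0914"
ANUSVARA = "\u0902"     # B
CANDRABINDU = "\u0901"  # D
VISARGA = "\u0903"      # X

# V[B|D|X]: exactly one vowel, then at most one of B, D or X.
VOWEL_SEQUENCE = re.compile(f"[{VOWELS}][{ANUSVARA}{CANDRABINDU}{VISARGA}]?")


def is_vowel_sequence(text: str) -> bool:
    """True if the whole string is a single Hindi vowel sequence."""
    return VOWEL_SEQUENCE.fullmatch(text) is not None


print(is_vowel_sequence("\u0905\u0902"))        # अं -> True
print(is_vowel_sequence("\u0905\u0902\u0903"))  # Visarga after Anusvara -> False
```

Because the trailing class is a single optional character, a second B, D or X can never match, which is exactly the "restricted to one" condition above.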

5.5.4 Consonant Sequence

A consonant sequence begins with a consonant. It may be optionally followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. The Consonant sequence can be extended further after the N, M and H. Each of these is discussed in the following sections:

1. A single consonant (C)

(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign wherever such a case is pertinent.)

Examples:

Sequence Description | Sequence | Example | Constituting characters

Consonant | C | क /ka/ | क U+0915

Consonant + Nukta | C[N] | क़ /ḳa/ | क ◌़ U+0915 U+093C

Table 10

2. A consonant optionally followed by a dependent vowel sign/Matra [M], Anusvara [B], Candrabindu [D], Visarga [X] or Halant [H]

C[M|B|D|X|H]

Examples:


Consonant + Matra | C[M] | कि /ki/ | क ि◌ U+0915 U+093F

Consonant + Anusvara | C[B] | कं /kaṁ/ | क ◌ं U+0915 U+0902

Consonant + Candrabindu | C[D] | कँ /kaṃ/ | क ◌ँ U+0915 U+0901

Consonant + Visarga | C[X] | कः /kaḥ/ | क ◌ः U+0915 U+0903

Consonant + Halant | C[H] | क् /k/ (Pure Consonant) | क ◌् U+0915 U+094D

Table 11

2.A. A CM sequence can be optionally followed by D, B or X

(CM)[D|B|X]

Example:

Consonant + Matra + Anusvara | CM[B] | कीं /kīṁ/ | क ◌ी ◌ं U+0915 U+0940 U+0902

Consonant + Matra + Candrabindu | CM[D] | काँ /kāṃ/ | क ◌ा ◌ँ U+0915 U+093E U+0901

Consonant + Matra + Visarga | CM[X] | कीः /kīḥ/ | क ◌ी ◌ः U+0915 U+0940 U+0903

Table 12
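Rules 1, 2 and 2.A together describe every akshar built on a single consonant. The following is a minimal Python sketch, assuming simplified code point ranges for C and M: the range U+0915..U+0939 also covers letters excluded in Section 5.3, and U+093E..U+094C also covers some non-Hindi Matras. All names are hypothetical.

```python
import re

# Simplified, assumed classes built from code point ranges.
C = "[\u0915-\u0939]"         # consonants KA..HA
N = "\u093C"                  # Nukta
M = "[\u093E-\u094C]"         # Matras (dependent vowel signs)
BDX = "[\u0902\u0901\u0903]"  # Anusvara, Candrabindu, Visarga
H = "\u094D"                  # Halant/Virama

# C[N], then at most one of M, B, D, X or H; an M may itself be
# followed by one B, D or X (rule 2 combined with rule 2.A).
SINGLE_CONSONANT_AKSHAR = re.compile(f"{C}{N}?(?:{M}{BDX}?|{BDX}|{H})?")


def is_single_consonant_akshar(text: str) -> bool:
    """True if the whole string is one akshar built on a single consonant."""
    return SINGLE_CONSONANT_AKSHAR.fullmatch(text) is not None


for sample in ["\u0915\u093F", "\u0915\u0940\u0902", "\u0915\u0902\u0903"]:
    print(sample, is_single_consonant_akshar(sample))
```

The nested optional group mirrors the text: a Matra opens one more slot for B, D or X, whereas B, D, X and H close the akshar.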

3. A sequence of consonants (up to 4) joined by Halant14: *3(CH)C

Example:

Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य /nkrya/ | न्क्र्य U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F

Table 13

14 In case of Sanskrit, it can join up to 5 consonants.
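The *3(CH)C notation allows at most three (CH) groups before the final consonant. The sketch below makes the limit a parameter of a hypothetical function, conjunct_pattern, using the same simplified consonant range as above. The same code can therefore illustrate the Hindi limit of four consonants, the Sanskrit limit of five from footnote 14, and a rule set that sets no upper limit at all.

```python
import re

C = "[\u0915-\u0939]"  # consonant range KA..HA (simplified assumption)
H = "\u094D"           # Halant/Virama


def conjunct_pattern(max_consonants=4):
    """Build *n(CH)C with n = max_consonants - 1 (CH) groups at most.

    max_consonants=None means no upper limit on the number of consonants.
    """
    if max_consonants is None:
        return re.compile(f"(?:{C}{H})*{C}")
    return re.compile(f"(?:{C}{H}){{0,{max_consonants - 1}}}{C}")


hindi = conjunct_pattern(4)
print(bool(hindi.fullmatch("\u0928\u094D\u0915\u094D\u0930\u094D\u092F")))  # न्क्र्य
```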

However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.

Subsets:

3.A. The combination may be followed by M, B, D or X

Example:

Consonant + Halant + Consonant + Matra | CHC[M] | क्की /kkī/ | क्की U+0915 U+094D U+0915 U+0940

Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं /kkaṁ/ | क्कं U+0915 U+094D U+0915 U+0902

Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ /kkaṃ/ | क्कँ U+0915 U+094D U+0915 U+0901

Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः /kkaḥ/ | क्कः U+0915 U+094D U+0915 U+0903

Table 14

3.B. *3(CH)CM may be followed by a B, D or X

Example:

Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं /kkīṁ/ | क्कीं U+0915 U+094D U+0915 U+0940 U+0902

Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ /kkīṃ/ | क्कीँ U+0915 U+094D U+0915 U+0940 U+0901

Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः /kkīḥ/ | क्कीः U+0915 U+094D U+0915 U+0940 U+0903

Table 15

These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are also some finer aspects to these rules once digits, punctuation and special standalone characters like Avagraha are taken into account. Those aspects are not discussed here, as the [MSR], on which the LGRs are supposed to be based, excludes those characters.
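Taken together, the rules of Sections 5.5.3 and 5.5.4 form a small grammar. The sketch below folds them into one Python regular expression and uses it to split a word into akshars, returning None when some position cannot start an akshar. It assumes simplified code point ranges (vowels U+0905..U+0914, consonants U+0915..U+0939), the names are hypothetical, and it illustrates only the Hindi rules of this section, not the WLE rules of Section 7. It also accepts a Halant after a conjunct, a case that rule 3.A does not list explicitly.

```python
import re

V = "[\u0905-\u0914]"         # independent vowels (simplified range)
C = "[\u0915-\u0939]"         # consonants KA..HA (simplified range)
N = "\u093C"                  # Nukta
M = "[\u093E-\u094C]"         # Matras
BDX = "[\u0902\u0901\u0903]"  # Anusvara, Candrabindu, Visarga
H = "\u094D"                  # Halant/Virama

# Either a vowel sequence V[B|D|X], or a consonant sequence: up to three
# (C[N]H) groups, a final C[N], then an optional M[B|D|X], B|D|X or H.
AKSHAR = re.compile(
    f"{V}{BDX}?"
    f"|(?:{C}{N}?{H}){{0,3}}{C}{N}?(?:{M}{BDX}?|{BDX}|{H})?"
)


def segment(word: str):
    """Split a word into akshars; return None if some part is ill-formed."""
    pos, aksharas = 0, []
    while pos < len(word):
        match = AKSHAR.match(word, pos)
        if match is None:
            return None
        aksharas.append(match.group())
        pos = match.end()
    return aksharas


print(segment("\u0939\u093F\u0928\u094D\u0926\u0940"))  # हिन्दी
```

Both branches consume at least one character, so the loop always advances and terminates.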

6 Variants

There are no characters or character sequences in Devanagari that can be created using the characters permitted by the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants into two groups.

Group 1: Confusing due to pure visual similarity

Group 2: Confusing due to deviation from character formations as normally perceived by the larger linguistic community

As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (the String Similarity Assessment Panel) entrusted to deal with such cases. "Table 20: Visually confusables" in "Appendix A: Visually confusable characters/sequences" lists them.

Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formation. They can cause confusion even to a careful observer and hence are being proposed as variants. Following is a brief description of these variants, followed by the variants themselves in Table 16 and Table 17.

6.1 Vowel/Vowel sign followed by Nukta

The Santali language has a unique requirement for the positioning of the Nukta character "◌़" (U+093C), which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations required the Whole Label Evaluation rules (given in Section 7) to be opened up for these specific cases. A regular non-Santali user usually cannot even anticipate the possibility of such a combination and may confuse it with something else.

This makes it possible to create certain labels that can be deceptively similar for a majority of the Devanagari user base. As this is a unique case of homographic similarity, the following variants are being proposed.

Variant 1 | Variant 2

आ U+0906 | आ़ U+0906 U+093C

ओ U+0913 | ओ़ U+0913 U+093C

◌ा U+093E | ◌ा◌़ U+093E U+093C

◌ो U+094B | ◌ो◌़ U+094B U+093C

Table 16: Proposed Variants - Set 1

6.1.1 Variant context rule for Santali Nukta variants:

All of the Nukta variants given in "Table 16: Proposed Variants - Set 1" share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2. For example, in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). In theory, this implies a regenerative tendency: if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), the substitution itself introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating the invalid akshar combination आ़़ (U+0906 U+093C U+093C), where a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.

Rule: As per "Table 16: Proposed Variants - Set 1", the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus, the following variant relations are bound by the above condition:

आ (U+0906) → आ़ (U+0906 U+093C)

ओ (U+0913) → ओ़ (U+0913 U+093C)

◌ा (U+093E) → ◌ा◌़ (U+093E U+093C)

◌ो (U+094B) → ◌ो◌़ (U+094B U+093C)

The variant relationship from Variant 2 to Variant 1 is not constrained by any rule, as it does not give rise to the invalid Nukta combination.
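The context rule can be illustrated by generating the forward (Variant 1 to Variant 2) Nukta variants of a label. This is a minimal sketch with hypothetical names, not the LGR tool's variant algorithm. It adds a Nukta only after a Table 16 Variant 1 code point that is not already followed by one, so no generated label can contain two consecutive Nuktas.

```python
from itertools import product

NUKTA = "\u093C"
# Variant 1 code points of Table 16: आ, ओ, ◌ा, ◌ो
SET1 = {"\u0906", "\u0913", "\u093E", "\u094B"}


def nukta_variants(label):
    """All labels obtained by adding a Nukta after Set 1 code points
    that are not already followed by one (the 6.1.1 context rule)."""
    slots = [i for i, ch in enumerate(label)
             if ch in SET1 and label[i + 1:i + 2] != NUKTA]
    results = set()
    for choice in product((False, True), repeat=len(slots)):
        chars = list(label)
        # Insert from the right so that earlier indices stay valid.
        for i, add in sorted(zip(slots, choice), reverse=True):
            if add:
                chars.insert(i + 1, NUKTA)
        results.add("".join(chars))
    results.discard(label)  # the original label is not its own variant
    return results


print(nukta_variants("\u0906\u092E"))  # आम
```

Applying the function to one of its own outputs yields no further Nukta at an already-filled slot, which is the regenerative loop the rule is meant to stop.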

6.2 UniqueVowelsandVowelSignsrequiredforKashmiri

Kashmiriwhenwritten in Devanagari script requires a unique set of Vowels and Vowel

signswhich only a Kashmiri speaker can understand. Themajority of Devanagari userswhoarenotconversantwithKashmiricaneasilyconfusethemwithsomeoftheVowels/

VowelsignswhichlooksimilartotheKashmiriones.TherearealsocaseswhereaKashmiriVowel/Vowelsignscanbeconfusedwithcertainaksharformations.Hence,theyarebeingproposedasvariants.

Variant 1 | Variant 2

ॳ U+0973 | अं U+0905 U+0902

◌ऺ U+093A | ◌ं U+0902

ॴ U+0974 | आं U+0906 U+0902

◌ऻ U+093B | ◌ा◌ं U+093E U+0902

ऎ U+090E | ऐ U+0910

◌ॆ U+0946 | ◌े U+0947

ॵ U+0975 | औ U+0914

◌ॏ U+094F | ◌ौ U+094C

Table 17: Proposed Variants - Set 2

6.3 Halant in Final Position (Only a discussion, not proposed as variants)

Another case of deceptive similarity for a majority of the Devanagari user base is that of a word ending in Halant "◌्" (U+094D) vis-à-vis the same word without the final Halant. As the Halant functions as a vowel killer, when it comes at the end many users tend to ignore the phonetic effect of its presence or absence. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.

In these cases, the presence or absence of the final Halant is clearly visible, and there is no apparent case for making them variant pairs. In the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.

6.4 Variant Disposition

As the variants in both categories (Table 16 and Table 17) are confusingly similar, albeit of a peculiar nature, it is proposed that they be considered "blocked".

There is no preference among these variants. Whichever label containing either of these variants is chosen first, the other equivalent variant labels should be blocked.
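Because every variant in Table 16 and Table 17 is blocked and none is preferred, a first-come check only needs to know whether two labels fall into the same variant set. The following is a minimal sketch, not the LGR tool's algorithm. It assumes that a usable comparison key can be built by folding each Table 17 Variant 1 code point onto its Variant 2 form and by dropping a Nukta that follows a Table 16 Variant 1 code point. The names variant_key and Registry are hypothetical.

```python
# Table 17: Variant 1 code point -> Variant 2 string (folded towards Variant 2).
TABLE_17 = {
    "\u0973": "\u0905\u0902", "\u093A": "\u0902",
    "\u0974": "\u0906\u0902", "\u093B": "\u093E\u0902",
    "\u090E": "\u0910", "\u0946": "\u0947",
    "\u0975": "\u0914", "\u094F": "\u094C",
}
NUKTA = "\u093C"
SET1 = {"\u0906", "\u0913", "\u093E", "\u094B"}  # Table 16, Variant 1


def variant_key(label):
    """Map every label of one (assumed) variant set to the same string."""
    out = []
    for ch in label:
        if ch == NUKTA and out and out[-1] in SET1:
            continue  # fold Variant 2 (X + Nukta) back onto Variant 1 (X)
        out.append(TABLE_17.get(ch, ch))
    return "".join(out)


class Registry:
    """Hypothetical first-come registry with blocked variants."""

    def __init__(self):
        self._by_key = {}

    def register(self, label):
        key = variant_key(label)
        if key in self._by_key:
            return False  # blocked: a variant label was chosen earlier
        self._by_key[key] = label
        return True
```

For example, once a label containing ◌ॆ (U+0946) is registered, the same label with ◌े (U+0947) is refused, and vice versa, matching the "no preference" disposition.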

6.5 Cross-script Variants

A cross-script variant, also sometimes referred to as a "Whole Label variant", is the case where a label in one script can be composed in such a way that it resembles an entire label in a different script.

Every individual LGR under NBGP is supposed to provide the set of cross-script variants it identifies with all other scripts under NBGP.

NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word "most" is used here because it is not practical to cover all the possible "Consonant + Halant + Consonant + …" cases. However, for Devanagari, all cases of "Consonant + Halant + Consonant" combinations were included in the analysis.)

The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. Table 18 lists the cases that are proposed as cross-script variants between Devanagari and Gurmukhi. Similarly, Table 19 lists the cases proposed as cross-script variants between Devanagari and Bengali.

It is to be noted that none of the combinations listed in Table 18 and Table 19 are considered equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.

NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.

Devanagari | Gurmukhi

◌ं U+0902 | ◌ਂ U+0A02

इ U+0907 | ਙ U+0A19

उ U+0909 | ਤ U+0A24

ग U+0917 | ਗ U+0A17

घ U+0918 | ਬ U+0A2C

ट U+091F | ਟ U+0A1F

ठ U+0920 | ਠ U+0A20

ढ U+0922 | ਫ U+0A2B

प U+092A | ਧ U+0A27

भ U+092D | ਮ U+0A2E

म U+092E | ਸ U+0A38

व U+0935 | ਕ U+0A15

ह U+0939 | ਵ U+0A35

◌ऺ U+093A | ◌ਂ U+0A02

◌़ U+093C | ◌਼ U+0A3C

ि◌ U+093F | ਿ◌ U+0A3F

◌ी U+0940 | ◌ੀ U+0A40

◌ॅ U+0945 | ◌ੱ U+0A71

◌ॆ U+0946 | ◌ੇ U+0A47

◌ॆ U+0946 | ◌ੋ U+0A4B

◌े U+0947 | ◌ੇ U+0A47

◌े U+0947 | ◌ੋ U+0A4B

◌ै U+0948 | ◌ੈ U+0A48

◌ॖ U+0956 | ◌ੁ U+0A41

◌ॗ U+0957 | ◌ੂ U+0A42

प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07

प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08

प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F

त्त U+0924 U+094D U+0924 | ਜ U+0A1C

Table 18: Proposed Cross-script Devanagari-Gurmukhi Variants


Devanagari | Bengali

म U+092E | ম U+09AE

ि◌ U+093F | ি◌ U+09BF

Table 19: Proposed Cross-script Devanagari-Bengali Variants

In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables, which look similar but not similar enough to be recommended as cross-script variants. "Table 21: Devanagari Cross-script confusables" in "Appendix B: Cross-script Confusables" lists them.

7 Whole Label Evaluation Rules (WLE)

This section provides the WLE rules that are required by all the languages mentioned in Section 3.2 when written in the Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.

Below are the symbols used in the WLE rules for each "Indic Syllabic Category", as mentioned in Table 6: Code point repertoire.

C → Consonant

M → Matra

V → Vowel

B → Anusvara (Bindu)

D → Candrabindu

X → Visarga

H → Halant/Virama

N → Nukta

S → Eyelash Reph (C2HC3), where C2 is 0931 (ऱ - DEVANAGARI LETTER RRA), H is 094D (◌् - DEVANAGARI SIGN VIRAMA), and C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)

Below are the specific WLE rules:

1. N: must be preceded only by a member of C1, V1 or M1.

The set C1 consists of these consonants:

a. क (U+0915)

b. ख (U+0916)

c. ग (U+0917)

d. च (U+091A)

e. छ (U+091B)

f. ज (U+091C)

g. ड (U+0921)

h. ढ (U+0922)

i. फ (U+092B)

The set V1 consists of these vowels:

a. आ (U+0906) (Required in Santali language)

b. ओ (U+0913) (Required in Santali language)

The set M1 consists of these matras:

a. ◌ा (U+093E) (Required in Santali language)

b. ◌ो (U+094B) (Required in Santali language)

2. H: must be preceded by C or CN15

3. M: must be preceded by C or CN16

4. X: must be preceded by any of V, C, N or M

5. B: must be preceded by any of V, C, N or M

6. D: must be preceded by any of V, C, N or M

7. V: can NOT be preceded by H (details in "Case of V preceded by H")

15 where CN is a C followed by an N
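Rules 1 to 7 only constrain what may immediately precede a character, so they can be checked in a single left-to-right pass. The following is a minimal sketch, not the XML context rules of the accompanying LGR file. It assumes a hypothetical, range-based CATEGORY table that only approximates Table 6, and it reports a rule number of 0 for characters outside that table.

```python
# Hypothetical, range-based approximation of the Table 6 categories.
CATEGORY = {}
for cp in list(range(0x0905, 0x0915)) + list(range(0x0972, 0x0978)):
    if cp != 0x090C:                    # VOCALIC L is excluded (Section 5.3)
        CATEGORY[chr(cp)] = "V"
for cp in list(range(0x0915, 0x093A)) + [0x097B, 0x097C, 0x097E, 0x097F]:
    if cp not in (0x0929, 0x0934):      # NNNA and LLLA are excluded
        CATEGORY[chr(cp)] = "C"
for cp in [0x093A, 0x093B, 0x094F, 0x0956, 0x0957] + list(range(0x093E, 0x094D)):
    if cp != 0x0944:                    # VOCALIC RR sign is excluded
        CATEGORY[chr(cp)] = "M"
CATEGORY.update({"\u0902": "B", "\u0901": "D", "\u0903": "X",
                 "\u094D": "H", "\u093C": "N"})

C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = {"\u0906", "\u0913"}
M1 = {"\u093E", "\u094B"}


def wle_violations(label):
    """Return (index, rule number) pairs for WLE rules 1-7 that fail."""
    errors = []
    for i, ch in enumerate(label):
        cat = CATEGORY.get(ch)
        if cat is None:
            errors.append((i, 0))  # outside this sketch's repertoire
            continue
        prev = label[i - 1] if i else None
        prev_cat = CATEGORY.get(prev) if prev else None
        # "C or CN": a consonant, or a consonant followed by a Nukta
        after_c_or_cn = prev_cat == "C" or (
            prev_cat == "N" and i >= 2 and CATEGORY.get(label[i - 2]) == "C")
        if cat == "N" and not (prev in C1 or prev in V1 or prev in M1):
            errors.append((i, 1))
        elif cat == "H" and not after_c_or_cn:
            errors.append((i, 2))
        elif cat == "M" and not after_c_or_cn:
            errors.append((i, 3))
        elif cat in ("X", "B", "D") and prev_cat not in ("V", "C", "N", "M"):
            errors.append((i, {"X": 4, "B": 5, "D": 6}[cat]))
        elif cat == "V" and prev_cat == "H":
            errors.append((i, 7))
    return errors


print(wle_violations("\u0906\u092E\u094D\u0905\u091A\u093E\u0930"))  # आम्अचार
```

The example label is the multi-word case discussed under "Case of V preceded by H"; the sketch flags its fourth character (index 3) under rule 7.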

Case of Eyelash Reph

In the WLE rules, there is no specific mention of the Eyelash Reph, for two reasons:

1. As U+0931 is added only as part of the permissible sequences in Table 7: Sequences, it is permitted only within those specific sequences.

2. The last characters of both sequences of which U+0931 is a part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of context rules is required.
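Because both sequences end in an ordinary consonant, the only extra condition is that U+0931 never appears outside a Table 7 sequence. LGR files in the RFC 7940 XML format can list such multi-code-point sequences directly. The following is only a minimal stand-alone sketch of the same constraint, with hypothetical names.

```python
RRA, VIRAMA = "\u0931", "\u094D"
ALLOWED_AFTER_RRA_VIRAMA = {"\u092F", "\u0939"}  # YA, HA (Table 7)


def rra_positions_ok(label):
    """True if every RRA (U+0931) starts one of the two Table 7 sequences."""
    for i, ch in enumerate(label):
        if ch != RRA:
            continue
        if label[i + 1:i + 2] != VIRAMA:
            return False
        if label[i + 2:i + 3] not in ALLOWED_AFTER_RRA_VIRAMA:
            return False
    return True


print(rra_positions_ok("\u0915\u0931\u094D\u092F\u093E"))  # कऱ्या
```

Once this check passes, the final YA or HA of each sequence is evaluated by the ordinary consonant rules, so no further Eyelash Reph handling is needed.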

Case of V preceded by H

As any valid akshar in Devanagari begins with either a Consonant or a Vowel, for multi-word domain names it was necessary to check whether each of these can follow any valid akshar-ending character. Only the case "V preceded by H" needs a special discussion, as given below.

There could be cases involving multi-word domain names where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).

This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the intended sound. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.

This is a unique situation necessitated by the lack of the hyphen, space or Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic community; hence it is explicitly prohibited by the NBGP.

If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.

16 where CN is a C followed by an N

8 Contributors

NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D Kulkarni and Dr. Ajay Data

Following is the full list of NBGP members with their language expertise.

Position | Name | Organization | Country | Language Expertise

Co-Chair | Ajay Data | DataXgen Technologies | India | Hindi, English
Co-Chair | Mahesh D. Kulkarni | C-DAC | India | Marathi, Hindi
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S. Joshi (Editor) | C-DAC | India | Hindi, Marathi
Member | Anivar A. Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | DataXgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs' Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DN.Domains | India | Malayalam
Member | K.C. Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualize thy soul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N. Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India
Member | S. Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S. Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | P.G.D. of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkz.com | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and .np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U.B. Pavanaja | http://vishvakannada.com/ | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ. of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant; https://मेरा.भारत | India | Hindi

In addition, the following members externally gave inputs to NBGP for the respective languages/scripts.

Name | Language/Script Expertise

Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K.P. Lekhwani | Sindhi
Dr. Birendra Kumar Soy | Mundari Language
Dr. Dinesh Kumar Shrivastav | Magahi Language
Dr. Harvinder Kaur | Gurmukhi Script
Dr. Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language

9 References

[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-3 Overview and Rationale", 28 March 2018, https://www.icann.org/en/system/files/files/msr-3-overview-28mar18-en.pdf

[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov. 2017)

[NBGP] Neo-Brahmi Generation Panel

[gTLD] generic Top Level Domain

[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb. 2018)

[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb. 2018)

[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb. 2018)

[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec. 2017)

[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct. 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct. 2017)

[101] Omniglot, "Hindi", https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct. 2017)

[102] Omniglot, "Marathi", https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct. 2017)

[103] Omniglot, "Sanskrit", https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct. 2017)

[104] Omniglot, "Sindhi", https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct. 2017)

[105] Omniglot, "Kashmiri", https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct. 2017)

[106] Unicode 10.0.0, "South and Central Asia-I: Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov. 2017)

[107] Unicode Indic Group, "Devanagari Eyelash Ra", http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov. 2017)

[108] M. K. Raina, "How to read and write Kashmiri in Devanagari?", http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec. 2017)

[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, "Devanāgarī Alphabet and its Romanization", http://hindinideshalaya.nic.in/english/hindi_orgin/devnagarithesysmbols.html (Accessed on 12th Dec. 2017)

[110] Omniglot, "Bodo", https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec. 2017)

[111] Omniglot, "Maithili", https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec. 2017)

[112] Omniglot, "Konkani", https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)

[113] Omniglot, "Nepali", https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)

10 Books, articles and webographies consulted

Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.

10.1 WRITING SYSTEMS

1. Diringer, D., The Alphabet. A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchinson. London. 1968.

10.2 DEVANĀGARĪ

1. Agrawala, V.S. (1966). The Devanāgarī script. In: Indian Systems of Writing. (Pp. 12-16) Delhi: Publications Division.

2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.

3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London, Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966.]

4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden/New York: E.J. Brill.

5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds), The World's Writing Systems. (Pp. 384-390). New York: Oxford University Press.

6. Cardona, George. 1987. Sanskrit. In The World's Major Languages. Bernard Comrie (ed.). London: Croom Helm. 448-469.

7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.

8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.

9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha. (1962 edition).

10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London Ph.D. dissertation (Mimeographed).

11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.

12. Kalyan Kale and Anjali Soman, 1986. Learning Marathi. Shri Vishakha Prakashan, Pune.

13. McGregor, R.S. (1977). Outline of Hindi Grammar. 2nd ed. Delhi: Oxford University Press.

14. McGregor, R.S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.

15. McGregor, R.S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.

16. McGregor, R.S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.

17. Pandey, P.K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10(2): 139-156. 2007.

18. Rai, Amrit. 1984. A House Divided. The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.

19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.

20. Singh, A.K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P.G. Patel, P. Pandey and D. Rajgor (eds), The Indic Scripts: Paleographic and Linguistic Perspectives. (Pp. 85-107). New Delhi: DK Printworld.

21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.

22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.

23. Verma, M.K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.

10.3 INDIC COMPUTING SPECIFIC

1. IS 10401: 8-bit code for information interchange. 1982

2. IS 10315: 7-bit coded character set for information interchange. 1985

3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques. 1987

4. ISO 15919, Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters. 2001

5. ISO 2375: Procedure for registration of escape sequences. 2003

6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13. 1998-2001

7. IDN POLICY http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf

11 Appendix A: Visually confusable characters/sequences

Table 20 below shows characters/character sequences which may appear visually confusing to some users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.

Confusable 1 | Confusable 2

क U+0915 | क़ U+0915 U+093C

ख U+0916 | ख़ U+0916 U+093C

ग U+0917 | ग़ U+0917 U+093C

च U+091A | च़ U+091A U+093C

छ U+091B | छ़ U+091B U+093C

ज U+091C | ज़ U+091C U+093C

ड U+0921 | ड़ U+0921 U+093C

ढ U+0922 | ढ़ U+0922 U+093C

फ U+092B | फ़ U+092B U+093C

Table 20: Visually confusables

12 Appendix B: Cross-script Confusables

The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 21 lists them.

In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.

None of the combinations listed in Table 21 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.

At first they may not look exactly the same; however, in certain contexts, e.g. in a browser bar as part of a domain name, or as a single word with no surrounding text in the same script to aid distinction, they can create visual confusion.

A label can be considered to have a cross-script variant label only if "all" of its constituent characters/akshars have an equivalent confusable in the other script. If even one character/akshar does not have an equivalent visual confusable in the other script, it provides a visual distinction, and the string is therefore not confusable.

Devanagari confusable | Other script confusable | From script

◌ः U+0903 | ◌ઃ U+0A83 | Gujarati

◌ः U+0903 | ◌ః U+0C03 | Telugu

◌ः U+0903 | ◌ಃ U+0C83 | Kannada

◌ः U+0903 | ◌ഃ U+0D03 | Malayalam

◌ः U+0903 | ඃ U+0D83 | Sinhala

उ U+0909 | ও U+0993 | Bengali

घ U+0918 | ঘ U+0998 | Bengali

ठ U+0920 | ਨ U+0A28 | Gurmukhi

ठ U+0920 | ਰ U+0A30 | Gurmukhi

ड U+0921 | ਡ U+0A21 | Gurmukhi

ड U+0921 | ਤ U+0A24 | Gurmukhi

ढ U+0922 | ਢ U+0A22 | Gurmukhi

त U+0924 | ਜ U+0A1C | Gurmukhi

य U+092F | ਧ U+0A27 | Gurmukhi

◌ॅ U+0945 | ◌ঁ U+0981 | Bengali

Table 21: Devanagari Cross-script confusables
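The "all constituents" condition stated at the start of this appendix can be expressed as a simple check. The sketch below is hypothetical. It uses only the single-code-point Devanagari-Gurmukhi pairs from Table 21, compares character by character rather than akshar by akshar, and, when every character has a counterpart, enumerates the possible Gurmukhi look-alike strings.

```python
from itertools import product

# Single-code-point Devanagari -> Gurmukhi pairs taken from Table 21.
CONFUSABLE_GURMUKHI = {
    "\u0920": ("\u0A28", "\u0A30"),  # ठ -> ਨ, ਰ
    "\u0921": ("\u0A21", "\u0A24"),  # ड -> ਡ, ਤ
    "\u0922": ("\u0A22",),           # ढ -> ਢ
    "\u0924": ("\u0A1C",),           # त -> ਜ
    "\u092F": ("\u0A27",),           # य -> ਧ
}


def has_cross_script_counterpart(label, table=CONFUSABLE_GURMUKHI):
    """True only if every character of a non-empty label has a look-alike."""
    return bool(label) and all(ch in table for ch in label)


def lookalike_labels(label, table=CONFUSABLE_GURMUKHI):
    """All other-script strings built from look-alikes, or an empty set."""
    if not has_cross_script_counterpart(label, table):
        return set()
    return {"".join(parts) for parts in product(*(table[ch] for ch in label))}


print(lookalike_labels("\u0924\u092F"))  # तय
```

A single character without a counterpart, such as क (U+0915) in this subset, makes the whole label non-confusable, which is the visual distinction described above.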
