Proposal for a Devanagari Script Root Zone Label Generation Rule-Set (LGR)
LGR Version: 3.0
Date: 2018-07-27
Document version: 6.1
Authors: Neo-Brahmi Generation Panel [NBGP]
1 General Information/Overview/Abstract
This document lays down the Label Generation Rule Set for the Devanagari script. Three main components of the Devanagari Script LGR, i.e. Code point repertoire, Variants and Whole Label Evaluation Rules, have been described in detail here.
All these components have been incorporated in a machine-readable format in the accompanying XML file named "Proposal-lgr-devanagari-20180727.xml".
In addition, a document named “Devanagari-test-labels-20180727.txt” has been provided. It contains a list of valid and invalid labels as per the Whole Label Evaluation laid down in Section 7 of this document. The labels have been tagged as valid and invalid under the specific rules¹. In addition, the file also lists the set of labels which can produce variants as laid down in Section 6 of this document.
2 Script for which the LGR is proposed
ISO 15924 Code: Deva
ISO 15924 Key N°: 315
ISO 15924 English Name: Devanagari (Nagari)
Latin transliteration of native script name: dévanâgarî
1 The categorization of invalid labels under specific rules is given as per the general understanding of the LGR Tool by the NBGP. During testing with any LGR tool, whether a particular label gets flagged under the same rule or the different one is totally dependent on the internal implementation of the LGR Tool. In case of discrepancy among the same, the fact that it is an invalid label should only be considered.
Proposal for a Devanagari Root Zone LGR Neo-Brahmi Generation Panel
Native name of the script: देवनागरी
Maximal Starting Repertoire [MSR] version: 3
3 Background on Script and Principal Languages Using It
The script called Nagari or Devanagari is written from left to right. Historically it derives from the Brahmi alphabet of the Ashokan inscriptions. Devanagari is currently used for 11 out of 22 scheduled languages of India (Boro/Bodo, Dogri, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Sanskrit, Santali and Sindhi) and around 45 other languages, especially the related Indo-Aryan languages: Bagheli, Bhili, Bhojpuri, Himachali dialects, Magahi, Newar and Rajasthani and its dialects: Marwari, Mewati, Shekhawati, Bagri, Dhundhari, Harauti and Wagdi. Closely associated with Sanskrit and Prakrit, it is an alternative script for Kashmiri (by Hindu speakers), Sindhi and Santali. It is growing in popularity among speakers of tribal languages of Arunachal Pradesh, Bihar, Chhattisgarh, Jharkhand, Madhya Pradesh and the Andaman & Nicobar Islands. The script is also used in Fiji to represent Fiji Hindi. Hindi is also a language of communication in Mauritius, Malaysia, England, Canada, South Africa and Indonesia, as well as among emigrant communities around the world. The script is also used in Nepal for writing the Nepali language. Nepali is the official language of Nepal as well as one language of the state of Sikkim in India. It is spoken by over 30 million people.
Devanagari is used by over 120 languages in India, Bangladesh, Nepal and in Southeast Asia.
3.1 The Evolution of the Script
It is well known that Devanagari has evolved from the parent script Brahmi, with its earliest historical form known as Aśokan Brahmi, traced to the 4th century BC. Brahmi was deciphered by Sir James Prinsep in 1837. The study of Brahmi and its development has shown that it has given rise to most of the scripts in India as well as in other countries, viz. Sri Lanka, Myanmar, Cambodia, Thailand, Laos, and the region of Tibet, to name a few.
The evolution of Brahmi into present-day Devanagari involved intermediate forms, common to other scripts, such as Gupta and its two descendants: Siddhaṃ and Śāradā in the north, and Grantha and Kadamba in the south. Devanagari can be said to have developed from the Kutila script, a descendant of the Gupta script, in turn a descendant of Brahmi. The word "kutila", meaning ‘crooked’, was used as a descriptive term to characterize the curving shapes of the script, compared to the straight lines of Brahmi. This inheritance is the reason why some of the characters across the scripts that will be considered under the Neo-Brahmi GP look similar to each other despite belonging to totally different code blocks of the Unicode Standard.
A look at the development of Devanagari from Brahmi gives an insight into how the Indic scripts have come to be diversified: the handiwork of engravers and writers who used different types of strokes led to different regional styles. The development of the script is outlined below. Figure 1: Pictorial depiction of evolution of Devanagari illustrates the stages in the evolution of the script².
Period | Description
300 BCE | Mauryan: Early Brahmi form in the Asokan edicts. Some scholars believe that Brahmi itself evolved from "Kharoshthi", a script written right to left.
200 CE | Kushan/Satavahana Dynasties.
400 CE | Gupta Dynasty
600 CE | Yasodharman
800 CE | Origins of the present day Nagari Script. Vardhana dynasty in the North and Pallava period in the South.
900 CE | The period of the Chalukyas and Rashtrakutas
1100 CE | Continuation of the Chalukya Rule
1300 CE | Yadavas in the north and Kakatiyas in the south.
1500 CE | The Vijayanagar empire.
Table 1: Evolution of Devanagari
2 http://www.acharya.gen.in:8080/sanskrit/script_dev.php
Figure 1: Pictorial depiction of evolution of Devanagari
3.2 Languages considered
Devanagari is used by over 120 languages, which makes it one of the most used scripts in the world. Languages using Devanagari as their primary script belong to varying geo-political scenarios as given below:
- designated as official (scheduled) languages of some countries
- used by communities living in urban areas
- used by communities living in rural yet accessible areas
- used by communities living in far-flung areas which are not easily connected either by roads or by communication mechanisms.
Information about official (scheduled) languages of countries is easily available. Information about languages used by communities living in urban areas is also easily obtainable. Some effort was needed to cover the languages which are spoken by communities living in rural yet accessible areas. However, it was quite difficult to cover the rest of the languages, spoken by the communities living in remote tribal areas, which are generally not connected by road or by communication means. Defining the scope of language coverage was hence essential to limit the scope of the work to be undertaken for the analysis of the Devanagari LGR.
NBGP decided to employ the “Expanded Graded Intergenerational Disruption Scale” [EGIDS], which is designed to measure the status of the languages of the world in terms of endangerment or development. The EGIDS consists of 13 levels, with each higher number on the scale representing a greater level of disruption to the intergenerational transmission of the language. NBGP decided to accommodate all the languages belonging to EGIDS Scale 1 to 4 for its analysis, as these levels represent languages that are, in one form or another, still in use. Following are the descriptions³ of those scales.
Scale | Label | Description
1 | National | The language is used in education, work, mass media, and government at the national level.
2 | Provincial | The language is used in education, work, mass media, and government within major administrative subdivisions of a nation.
3 | Wider Communication | The language is used in work and mass media without official status to transcend language differences across a region.
4 | Educational | The language is in vigorous use, with standardization and literature being sustained through a widespread system of institutionally supported education.
Languages belonging to Level 5 and higher are not in widespread usage.
Below is the tabular representation of the languages that have been considered for the Devanagari LGR.
EGIDS Scale | Languages
1 | Hindi, Nepali
2 | Konkani, Maithili, Marathi, Sindhi
3 | Bhatri, Halbi, Kinnauri, Kukna, Panchpargania, Sadri, Wagdi
4 | Bhojpuri, Chhattisgarhi, Dogri, Kashmiri, Limbu, Magahi, Sanskrit, Santali, Tamang (Eastern), Avadhi, Newar, Saraiki⁴
Table 2: Languages considered under Devanagari LGR
3 https://www.ethnologue.com/about/language-status
Despite being classified under EGIDS Scale 5, the Boro language is also considered under the Devanagari LGR as it is one of the scheduled languages of India and is widely spoken.
Apart from the above-mentioned languages, Braj, Dhundari, Mundari, and Kharia have also been considered for the analysis, as the communities using them were accessible and provided their inputs.
3.2.1 Case of Sanskrit
Sanskrit is generally perceived as an archaic language used only in ancient religious texts. However, it is worth noting that there is a quite vibrant and active user community of Sanskrit in India which practices Sanskrit on a day-to-day basis. Sanskrit is still taught in schools under various State and Central educational boards. There is increasing use of Sanskrit on social media as well. The same is reflected in the EGIDS scale, where Sanskrit is categorized in Scale 4, indicating the status of the language as “Educational”.
3.3 The structure of written Devanagari
Devanagari is an alphasyllabary and the heart of the writing system is the akshar. It is this unit which is instinctively recognized by users of the script. To understand the notion of akshar, a brief overview of the writing system is provided in this section and the akshar itself will be treated in depth in Section 5.4.
4 Though listed in EGIDS scale 4, Saraiki is not covered by the NBGP. As per Ethnologue, the Devanagari script is "no longer in use" by the Saraiki community. Ref: https://www.ethnologue.com/language/skr
The writing system of Devanagari could be summed up as composed of the following:
3.3.1 The Consonants
Devanagari consonants have an implicit schwa⁵ /ə/ vowel included in them. As per traditional classification they are categorized according to their phonetic properties (especially in terms of place plus manner of articulation). There are 5 Varga groups (classes) and one non-Varga group. Each Varga, which corresponds to Stops, contains five consonants classified as per their properties. The first four consonants are classified on the basis of voicing and aspiration and the last is the corresponding nasal.
Varga | Unvoiced -Asp | Unvoiced +Asp | Voiced -Asp | Voiced +Asp | Nasal
Velar | क U+0915 | ख U+0916 | ग U+0917 | घ U+0918 | ङ U+0919
Palatal | च U+091A | छ U+091B | ज U+091C | झ U+091D | ञ U+091E
Retroflex | ट U+091F | ठ U+0920 | ड U+0921 | ढ U+0922 | ण U+0923
Dental | त U+0924 | थ U+0925 | द U+0926 | ध U+0927 | न U+0928
Bi-labial | प U+092A | फ U+092B | ब U+092C | भ U+092D | म U+092E
Table 3: Varga classification of consonants
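The regular layout of Table 3 can be expressed directly in terms of code points: each Varga occupies five consecutive code points, and the position within that run gives the voicing and aspiration column. The following Python fragment is a minimal sketch of that observation only; the names VARGAS, COLUMNS and classify are illustrative and are not part of the LGR or its XML file.

```python
# Each Varga from Table 3 as (name, first code point of its five-consonant run).
VARGAS = [
    ("Velar", 0x0915),
    ("Palatal", 0x091A),
    ("Retroflex", 0x091F),
    ("Dental", 0x0924),
    ("Bi-labial", 0x092A),
]

# Column order of Table 3, left to right.
COLUMNS = [
    "unvoiced, unaspirated",
    "unvoiced, aspirated",
    "voiced, unaspirated",
    "voiced, aspirated",
    "nasal",
]


def classify(ch):
    """Return (varga, column) for a Varga consonant, or None otherwise."""
    cp = ord(ch)
    for name, start in VARGAS:
        if start <= cp < start + 5:
            return name, COLUMNS[cp - start]
    return None


if __name__ == "__main__":
    for ch in "\u0915\u091D\u092E\u092F":
        print(f"U+{ord(ch):04X}", classify(ch))
```

Code points outside the five runs, such as the non-Varga consonants or U+0929 (which is not listed in Table 3), simply return None.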
Non-Varga
य U+092F
र U+0930
ल U+0932
ळ U+0933
व U+0935
श U+0936
ष U+0937
स U+0938
ह U+0939
Table 4: Non-Varga consonants
5 Although representing the implicit vowel as /a/ is more correct orthographically, the schwa /ə/, though not part of the orthographic system, has been used, since /a/ would be misunderstood and read as अ/आ/◌ा.
3.3.2 The Implicit Vowel Killer: Halant⁶
All consonants contain an implicit vowel (schwa). A special sign is needed to denote that this implicit vowel is stripped off. This is known as the Halant "◌्" (U+094D). The Halant thus joins two consonants and creates conjuncts, which can generally be combinations of 2 to 4 consonants. In rare cases it can join up to 5 consonants. However, the notion of a maximum number of consonants joining to form one akshar is empirical. It is just an observation drawn from the words that have been observed to date. Given the confluence of languages happening in the Internet age, the possibility that one may want a generic Top Level Domain [gTLD] which may have more than the observed maximum cannot be ruled out. Hence, in the LGR work, this limit will not be enforced⁷.
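At the code point level, a conjunct as described above is simply a run of consonants with a Halant between each adjacent pair. The Python fragment below is an illustrative sketch; the helper names conjunct and code_points are assumptions made for this example, and no upper limit on the number of consonants is applied, in line with the decision stated above.

```python
HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA


def conjunct(*consonants):
    """Join consonants with a Halant between each adjacent pair."""
    return HALANT.join(consonants)


def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    ksha = conjunct("\u0915", "\u0937")            # KA + Halant + SSA
    stra = conjunct("\u0938", "\u0924", "\u0930")  # SA + Halant + TA + Halant + RA
    print(ksha, code_points(ksha))
    print(stra, code_points(stra))
```

A conjunct of n consonants therefore always contains n - 1 Halants, whatever its visual shape.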
3.3.3 Vowels
Separate symbols exist for all Vowels, which are pronounced independently either at the beginning or after a vowel sound. To indicate a Vowel sound other than the implicit one, a Vowel sign (Matra) is attached to the consonant. Since the consonant has a built-in schwa, there are equivalent Matras for all vowels except अ.
The correlation is shown as follows:
Vowel | Corresponding vowel sign (Matra)
अ U+0905 | (none)
आ U+0906 | ◌ा U+093E
इ U+0907 | ◌ि U+093F
ई U+0908 | ◌ी U+0940
उ U+0909 | ◌ु U+0941
ऊ U+090A | ◌ू U+0942
ऋ U+090B | ◌ृ U+0943
ए U+090F | ◌े U+0947
ऐ U+0910 | ◌ै U+0948
ओ U+0913 | ◌ो U+094B
औ U+0914 | ◌ौ U+094C
ॳ U+0973 | ◌ऺ U+093A
ॴ U+0974 | ◌ऻ U+093B
ऍ/ॲ U+090D/U+0972 | ◌ॅ U+0945
ॠ U+0960 | ◌ॄ U+0944
ऑ U+0911 | ◌ॉ U+0949
ॵ U+0975 | ◌ॏ U+094F
ॶ U+0976 | ◌ॖ U+0956
ॷ U+0977 | ◌ॗ U+0957
Table 5: Vowels with corresponding Matras
Marathi uses ॲ (U+0972) instead of ऍ (U+090D).
6 Unicode (cf. Unicode 3.0 and above) prefers the term Virama. In this report both the terms have been used to denote the character that suppresses the inherent vowel.
7 This can be the case when a foreign language word, which admits a large number of consonants, is transliterated into Devanāgarī.
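The vowel-to-Matra correlation can be read as a lookup table: writing a consonant followed by a given vowel sound means appending that vowel's Matra, and appending nothing for अ because the schwa is already implicit. The Python sketch below covers only the first eleven rows of Table 5; the names MATRA_FOR_VOWEL and syllable are illustrative assumptions and do not come from the LGR.

```python
# Independent vowel -> dependent vowel sign (Matra), first rows of Table 5.
# The vowel A (U+0905) maps to the empty string: it has no Matra.
MATRA_FOR_VOWEL = {
    "\u0905": "",        # A  (implicit in every consonant)
    "\u0906": "\u093E",  # AA
    "\u0907": "\u093F",  # I
    "\u0908": "\u0940",  # II
    "\u0909": "\u0941",  # U
    "\u090A": "\u0942",  # UU
    "\u090B": "\u0943",  # VOCALIC R
    "\u090F": "\u0947",  # E
    "\u0910": "\u0948",  # AI
    "\u0913": "\u094B",  # O
    "\u0914": "\u094C",  # AU
}


def syllable(consonant, vowel):
    """Spell consonant + vowel sound using the Matra for that vowel."""
    return consonant + MATRA_FOR_VOWEL[vowel]


if __name__ == "__main__":
    ka = "\u0915"
    for vowel in MATRA_FOR_VOWEL:
        print(vowel, "->", syllable(ka, vowel))
```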
3.3.4 The Anusvara (◌ं - U+0902)
The Anusvara represents a homorganic nasal. It replaces a conjunct group of a Nasal Consonant + Halant + Consonant belonging to that particular varga. Before a non-varga consonant the Anusvara represents a nasal sound. Modern Hindi, Marathi and Konkani languages prefer the Anusvara to the corresponding Half-nasal⁸:
सन्त vs. संत /sənt/ saint (U+0938 U+0928 U+094D U+0924 vs. U+0938 U+0902 U+0924)
चम्पा vs. चंपा /tʃəmpa/ a flower belonging to the genus Plumeria (U+091A U+092E U+094D U+092A U+093E vs. U+091A U+0902 U+092A U+093E)
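The replacement described above can be sketched mechanically: wherever a Varga nasal is followed by a Halant and a consonant of the same Varga, the nasal and Halant are replaced by the Anusvara. The Python fragment below is only an illustration of that rule as stated in this section, not part of the LGR; the function names are hypothetical, and the Varga ranges are taken from Table 3.

```python
ANUSVARA = "\u0902"
HALANT = "\u094D"
# First code point of each five-consonant Varga run (Table 3).
VARGA_STARTS = [0x0915, 0x091A, 0x091F, 0x0924, 0x092A]


def varga_index(ch):
    """Index of the Varga that ch belongs to, or None for other characters."""
    cp = ord(ch)
    for i, start in enumerate(VARGA_STARTS):
        if start <= cp < start + 5:
            return i
    return None


def is_varga_nasal(ch):
    """True if ch is the fifth (nasal) consonant of its Varga."""
    i = varga_index(ch)
    return i is not None and ord(ch) == VARGA_STARTS[i] + 4


def to_anusvara(text):
    """Replace nasal + Halant before a same-Varga consonant with Anusvara."""
    out = []
    i = 0
    while i < len(text):
        if (
            i + 2 < len(text)
            and is_varga_nasal(text[i])
            and text[i + 1] == HALANT
            and varga_index(text[i + 2]) == varga_index(text[i])
        ):
            out.append(ANUSVARA)
            i += 2  # skip the nasal and the Halant
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_anusvara("\u0938\u0928\u094D\u0924"))        # half-nasal spelling of "saint"
    print(to_anusvara("\u091A\u092E\u094D\u092A\u093E"))  # half-nasal spelling of the flower name
```

A nasal followed by a Halant and a non-Varga consonant (for example न + ◌् + य) is left untouched by this sketch, since the rule above applies only within a Varga.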
3.3.5 Nasalization: Candrabindu (◌ँ - U+0901)
Candrabindu denotes nasalization of the preceding vowel, as in आँख /ãkh/ eye (U+0906 U+0901 U+0916). Present-day Hindi users tend to replace the Candrabindu by the Anusvara.
8 A half-nasal is used in epigraphy to indicate a nasal consonant conjoined to its corresponding “Varga” through a Halant.
3.3.6 Nukta (◌़ - U+093C)⁹
The Nukta sign is placed below a certain number of consonants to represent sounds found only in words borrowed from Perso-Arabic. It is predominantly used in this manner in Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi and Tamang. It can be adjoined to "क" (U+0915), "ख" (U+0916), "ग" (U+0917), "ज" (U+091C) and "फ" (U+092B) to show that words having these consonants with a nukta are to be pronounced in the Perso-Arabic style, e.g.:
फ़िरोज़ /firoz/ (U+092B U+093C U+093F U+0930 U+094B U+091C U+093C)
It is also placed under "ड" (U+0921) and "ढ" (U+0922) to indicate flapped sounds, e.g.:
बढ़ /bədh/ (U+092C U+0922 U+093C)
The Web Publication "DEVANĀGARĪ ALPHABET AND ITS ROMANIZATION" [109] by the Central Hindi Directorate, Ministry of HRD, Government of India, clearly states such a use of Nukta in Hindi.
In Bodo the Nukta is adjoined to "ड" (U+0921) [110]. In Maithili it is adjoined to "क" (U+0915), "ज" (U+091C), "ड" (U+0921) and "ढ" (U+0922) [111]. In Sindhi, it is adjoined to "ख" (U+0916), "ग" (U+0917), "ज" (U+091C), "फ" (U+092B), "ड" (U+0921) and "ढ" (U+0922) [104].
In Kashmiri, it can also be adjoined to "च" (U+091A), "छ" (U+091B) and "ज" (U+091C) [108] to indicate the laterally released affricates.
च़ाय /čāy/ tea (U+091A U+093C U+093E U+092F)
छ़ल /čhal/ wash (imperative) (U+091B U+093C U+0932)
पॊज़ /póz/ fact (U+092A U+094A U+091C U+093C)
9 The possible sets of consonants/vowels have been derived from various sources, viz. prior research carried out by the Centre for Development of Advanced Computing's [C-DAC] Graphics Intelligence based Script Technologies [GIST] Research Labs (https://cdac.in/index.aspx?id=mlc_gist_about), Omniglot, and inputs provided by various experts on board the NBGP for specific languages. Only Omniglot references have been provided as they are available online.
Normally a Nukta is appended to a Consonant. However, the Santali language uses the Nukta in a unique way. The Nukta is adjoined to the following vowels and vowel signs:
a. आ (U+0906)
b. ओ (U+0913)
c. ◌ा (U+093E)
d. ◌ो (U+094B)
3.3.7 Visarga (◌ः - U+0903) and Avagraha (ऽ - U+093D)
The Visarga is frequently used in Sanskrit and represents a sound very close to /h/, for example: दुःख /duhkh/ sorrow, unhappiness (U+0926 U+0941 U+0903 U+0916).
The Avagraha "ऽ" (U+093D) creates an extra stress on the preceding vowel and is used in Sanskrit texts. It is rarely used in other languages using Devanagari. In the case of the LGR, the Avagraha is not part of the repertoire as it is barred in the Maximal Starting Repertoire.
3.3.8 Zero Width Non-joiner (U+200C) and Zero Width Joiner (U+200D)
The Zero Width Non-joiner (ZWNJ) is an invisible character used in certain cases (after Halant) where default conjunct formation is to be explicitly restricted and the Halant joining the two consonants participating in the conjunct formation needs to be explicitly shown. For example, the conjunct क्ष /ksha/, which is formed by क /ka/ + ◌् (halant) + ष /sha/, is rendered as क्‌ष when formed by क /ka/ + ◌् (halant) + Zero Width Non-joiner + ष /sha/. In certain cases, for certain communities, this visual rendition creates a difference in the manner in which those combinations are pronounced.
The Zero Width Joiner (ZWJ) is another invisible character, which is used in certain cases (mostly after Halant) in which a particular conjunct combination is rendered such that the constituent consonant shapes may not be directly visible in the conjunct shape. For example, the conjunct क्ष /ksha/, which is formed by क /ka/ + ◌् (halant) + ष /sha/, does not show the half form of ka joining with sha. However, using ZWJ, the constituent consonants' shapes are preserved in the visual depiction: क्‍ष, formed by क /ka/ + ◌् (halant) + Zero Width Joiner + ष /sha/.
Earlier the ZWJ was recommended by the Unicode Consortium to be used to generate certain special conjuncts like Eyelash Ra (more details in Section 5.2). However, with the new recommendations in place, this usage of ZWJ is no longer encouraged.
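Because both joiners are invisible, the three forms of /ksha/ discussed in this section differ only in their underlying code point sequences. The Python fragment below is a small illustrative sketch that makes the difference visible; the variable and function names are assumptions for this example only.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER
KA, HALANT, SSA = "\u0915", "\u094D", "\u0937"

plain = KA + HALANT + SSA             # default conjunct
with_zwnj = KA + HALANT + ZWNJ + SSA  # explicit Halant shown
with_zwj = KA + HALANT + ZWJ + SSA    # half form of ka preserved


def code_points(label):
    """Render a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in label)


def strip_joiners(label):
    """Remove ZWNJ and ZWJ, leaving only the visible code points."""
    return label.replace(ZWNJ, "").replace(ZWJ, "")


if __name__ == "__main__":
    for name, form in [("plain", plain), ("ZWNJ", with_zwnj), ("ZWJ", with_zwj)]:
        print(f"{name:5} {code_points(form)}")
```

Once the joiners are stripped, all three strings collapse to the same sequence, which is why the forms are distinguishable only by the rendering they request.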
4 Overall Development Process and Methodology
Under the Neo-Brahmi Generation Panel, there are many different scripts belonging to separate Unicode blocks. Each of these scripts has been assigned a separate LGR; however, the Neo-Brahmi GP ensured that the fundamental philosophy behind building each of those LGRs is in sync with that of all the other Brahmi-derived scripts. This is the Devanagari LGR, which caters to multiple languages written using Devanagari, mostly belonging to EGIDS scale 1 to 4.
4.1 Guiding Principles
The NBGP adopts the following broad principles for selection of code points in the code point repertoire across the board for all the scripts within its ambit.
4.1.1 Inclusion principles
4.1.1.1 Modern usage
Every character proposed should be in the everyday usage of a particular linguistic community. The characters which have been encoded in Unicode for transcription purposes only or for archival purposes will not be considered for inclusion in the code point repertoire.
4.1.1.2 Unambiguous use
Every character proposed should have an unambiguous understanding among the linguistic community about its usage in the language.
4.1.2 Exclusion principles
The main exclusion principle is that of External Limits on Scope. These comprise protocols or standards that are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations but have been spelt out separately for the sake of clarity.
4.1.2.1 External Limits on Scope
The code point repertoire for the root zone is a very special case: it sits high up in the protocol hierarchy, so the canvas of characters available for selection as part of the Root Zone code point repertoire is already constrained by the various protocol layers beneath it. The following three main protocols/standards act as successive filters:
i. The Unicode Standard
Out of all the characters that are needed by the given script, if the character in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.
ii. IDNA Protocol
Unicode, being the character-encoding standard that aims at the maximum possible representation of a given script/language, has encoded as far as possible all the characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.
For example, Devanagari Letter Qa "क़" (U+0958) is not allowed to be a part of a domain name. Its decomposed form, i.e. Devanagari Letter Ka followed by Devanagari Sign Nukta, "क" (U+0915) + "◌़" (U+093C), can be used instead.
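The relationship between U+0958 and its decomposed form can be checked with Unicode normalization: U+0958 is a composition exclusion, so Normalization Form C maps it to Ka + Nukta and never recomposes that pair. The Python sketch below uses the standard unicodedata module to illustrate this; it demonstrates normalization behaviour only and is not an IDNA validity check.

```python
import unicodedata

QA = "\u0958"             # DEVANAGARI LETTER QA (precomposed)
KA_NUKTA = "\u0915\u093C"  # DEVANAGARI LETTER KA + DEVANAGARI SIGN NUKTA


def nfc_code_points(text):
    """Normalize to NFC and list the resulting code points."""
    normalized = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]


if __name__ == "__main__":
    print(nfc_code_points(QA))        # the precomposed letter is decomposed
    print(nfc_code_points(KA_NUKTA))  # the sequence is left as it is
```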
IDNA also imposes restrictions on the invisible characters Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D) in the form of CONTEXTJ rules. These characters are required in certain cases where an atypical visual shape of an akshar is desired.
Because domain names cannot contain a space or a tab, the inability to use ZWNJ can pose issues where two words need to be joined together, the first word ends in an explicit Halant and the next word begins with a consonant. In that case, a conjunct will be formed between the last consonant of the first word and the first consonant of the second word. This visual display may not be desired. For example, if the two words देश् (/deš/ nation) and विदेश (/videš/ foreign land) are juxtaposed, the resultant word, i.e. “देश्विदेश”¹⁰, is not the appropriate way of rendering it. The appropriate rendering would be “देश्‌विदेश”, which can be achieved by adding a ZWNJ between the two words.
As the ZWNJ is not part of the MSR, it is not permissible to make such combinations. If and when the ZWNJ is permitted by the MSR, the NBGP of that time may consider adding it to the Devanagari repertoire if necessary.
However, the exclusion of the ZWJ from the MSR may not have much of an impact, as better alternatives are already available for depicting the cases for which the ZWJ was earlier used. Some specific shapes¹¹ cannot be produced; however, there will not be any impact at the phonetic level.
iii. Maximal Starting Repertoire
The root zone LGR being a repertoire of the characters which are going to be used for creation of the root zone TLDs, which in turn are an even more specialized case of domain names, the Root Zone LGR Procedure introduces additional exclusions on the IDNA-allowed set of characters. For example, the Devanagari Sign Avagraha "ऽ" (U+093D), even if allowed by the IDNA protocol, is not permitted in the root zone repertoire as per the [MSR].
10 In this particular case it is possible to get the required display by dropping the explicit Halant at the end of the word; however, one can argue that the pronunciation of the two words, i.e. देश् and देश, is different and hence it changes the fundamental word.
11 Case of क्ष and क्‍ष: the first is composed with क+◌्+ष while the latter is with क+◌्+ZWJ+ष. The pronunciation of both the conjuncts is the same.
To sum up, the restrictions start off with admitting only such characters as are part of the code block of the given script/language. This is further narrowed down by the IDNA Protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
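The successive narrowing summarized above amounts to repeated set intersection. The Python sketch below illustrates this with three code points taken from the examples in this section; the sets IDNA_PVALID and MSR are deliberately tiny, hypothetical stand-ins built only from what the text states (U+0958 is excluded by IDNA, U+093D is allowed by IDNA but excluded by the MSR), not the real tables.

```python
# Candidate code points drawn from the Devanagari block (examples in the text).
CANDIDATES = {0x0915, 0x0958, 0x093D}

# Toy stand-ins for the real tables; assumptions for illustration only.
IDNA_PVALID = {0x0915, 0x093D}  # U+0958 is not allowed in domain names
MSR = {0x0915}                  # U+093D (Avagraha) is excluded by the MSR


def apply_filters(candidates):
    """Apply the IDNA and MSR filters in order, returning each stage."""
    after_idna = candidates & IDNA_PVALID
    after_msr = after_idna & MSR
    return after_idna, after_msr


def fmt(code_points):
    """Sorted U+XXXX rendering of a set of code points."""
    return [f"U+{cp:04X}" for cp in sorted(code_points)]


if __name__ == "__main__":
    after_idna, after_msr = apply_filters(CANDIDATES)
    print("candidates:", fmt(CANDIDATES))
    print("after IDNA:", fmt(after_idna))
    print("after MSR: ", fmt(after_msr))
```

Each stage can only remove code points, so the final repertoire is always a subset of what the previous layer permits.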
4.1.2.2 No Punctuation Marks
The TLDs being identifiers, punctuation markers present in Brahmi-based languages, such as Danda "।" (U+0964) and double Danda "॥" (U+0965), will not be included.
4.1.2.3 No Symbols and Abbreviations
Abbreviations, weights and measures and other such iconic characters like Isshar "৺" (U+09FA), Abbreviation sign "॰" (U+0970), etc. will not be included.
4.1.2.4 No Rare and Obsolete Characters
There are characters which have been added to Unicode to accommodate rare forms, such as DEVANAGARI LETTER VOCALIC RR "ॠ" (U+0960) and DEVANAGARI LETTER VOCALIC LL "ॡ" (U+0961), as well as their Matra forms "◌ॄ" (U+0944) and "◌ॣ" (U+0963). No such characters will be included. This is in compliance with the Conservatism principle as laid down in the Root Zone LGR Procedure.
4.1.2.5 No Stress Markers of Classical Sanskrit and Vedic
Stress markers for classical Sanskrit, e.g. DEVANAGARI STRESS SIGN UDATTA "◌॑" (U+0951) and DEVANAGARI STRESS SIGN ANUDATTA "◌॒" (U+0952), will not be included. This is also in compliance with the Letter principle as laid down in the Root Zone LGR procedure.
5 Repertoire
Section 5.1 provides the section of the [MSR] applicable to the Devanagari script on which the Devanagari code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel [NBGP] proposes to be included in the Devanagari LGR.
5.1 Devanagari section of Maximal Starting Repertoire [MSR] Version 3
Figure 2: Devanagari Code Page from [MSR]
Color convention¹²:
All characters that are included in the [MSR] - Yellow background
PVALID in IDNA2008 but excluded from the MSR for various reasons - Pinkish background
Not PVALID in IDNA2008 - White background
12 This document needs to be printed in color for this to be read correctly.
5.2 Code Point Repertoire
For each of the code points, language references have been given in the last column titled "Reference". For the entire coverage of Devanagari code points, references of Hindi, Marathi, Sanskrit, Sindhi and Kashmiri have been given. Though only five representative languages have been chosen for referencing, they together cover all the code points required for all the languages that NBGP has considered as given in Section 3.2.
Sr. No. | Unicode Code Point | Glyph | Character Name | Indic Syllabic Category | Example languages using the code point (Not exhaustive list) | Language with lowest EGIDS scale using the code point | Reference
1. | 0901 | ◌ँ | DEVANAGARI SIGN CANDRABINDU | Candrabindu | Bodo, Hindi, Kashmiri, Konkani, Maithili, Marathi, Nepali, Santali and Sanskrit | 1 Hindi, Nepali | [0], [101], [102], [103], [105], [108], [110], [111], [112], [113]
2. | 0902 | ◌ं | DEVANAGARI SIGN ANUSVARA | Anusvara (Bindu) | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
3. | 0903 | ◌ः | DEVANAGARI SIGN VISARGA | Visarga | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
4. | 0905 | अ | DEVANAGARI LETTER A | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
5. | 0906 | आ | DEVANAGARI LETTER AA | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
6. | 0907 | इ | DEVANAGARI LETTER I | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
7. | 0908 | ई | DEVANAGARI LETTER II | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
8. | 0909 | उ | DEVANAGARI LETTER U | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
9. | 090A | ऊ | DEVANAGARI LETTER UU | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
10. | 090B | ऋ | DEVANAGARI LETTER VOCALIC R | Vowel | Hindi, Marathi, Sanskrit | 1 Hindi | [0], [101], [102], [103]
11. | 090D | ऍ | DEVANAGARI LETTER CANDRA E | Vowel | Hindi | 1 Hindi | [0], [101]
12. | 090E | ऎ | DEVANAGARI LETTER SHORT E | Vowel | Kashmiri | 4 Kashmiri | [0], [105], [108]
13. | 090F | ए | DEVANAGARI LETTER E | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
14. | 0910 | ऐ | DEVANAGARI LETTER AI | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
15. | 0911 | ऑ | DEVANAGARI LETTER CANDRA O | Vowel | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0], [100], [101], [102], [108], [112]
16. | 0912 | ऒ | DEVANAGARI LETTER SHORT O | Vowel | Kashmiri | 4 Kashmiri | [0], [105], [108]
17. | 0913 | ओ | DEVANAGARI LETTER O | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
18. | 0914 | औ | DEVANAGARI LETTER AU | Vowel | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
19. | 0915 | क | DEVANAGARI LETTER KA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
20. | 0916 | ख | DEVANAGARI LETTER KHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
21. | 0917 | ग | DEVANAGARI LETTER GA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
22. | 0918 | घ | DEVANAGARI LETTER GHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
23. | 0919 | ङ | DEVANAGARI LETTER NGA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
24. | 091A | च | DEVANAGARI LETTER CA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
25. | 091B | छ | DEVANAGARI LETTER CHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
26. | 091C | ज | DEVANAGARI LETTER JA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
27. | 091D | झ | DEVANAGARI LETTER JHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
28. | 091E | ञ | DEVANAGARI LETTER NYA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
29. | 091F | ट | DEVANAGARI LETTER TTA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
30. | 0920 | ठ | DEVANAGARI LETTER TTHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
31. | 0921 | ड | DEVANAGARI LETTER DDA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
32. | 0922 | ढ | DEVANAGARI LETTER DDHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
33. | 0923 | ण | DEVANAGARI LETTER NNA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
34. | 0924 | त | DEVANAGARI LETTER TA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
35. | 0925 | थ | DEVANAGARI LETTER THA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
36. | 0926 | द | DEVANAGARI LETTER DA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
37. | 0927 | ध | DEVANAGARI LETTER DHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
38. | 0928 | न | DEVANAGARI LETTER NA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
39. | 092A | प | DEVANAGARI LETTER PA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
40. | 092B | फ | DEVANAGARI LETTER PHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
41. | 092C | ब | DEVANAGARI LETTER BA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
42. | 092D | भ | DEVANAGARI LETTER BHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
43. | 092E | म | DEVANAGARI LETTER MA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
44. | 092F | य | DEVANAGARI LETTER YA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
45. | 0930 | र | DEVANAGARI LETTER RA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
46. | 0932 | ल | DEVANAGARI LETTER LA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
47. | 0933 | ळ | DEVANAGARI LETTER LLA | Consonant | Bodo, Konkani, Marathi, Nepali, Sanskrit | 1 Nepali | [0], [102], [103], [110], [112], [113]
48. | 0935 | व | DEVANAGARI LETTER VA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
49. | 0936 | श | DEVANAGARI LETTER SHA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
50. | 0937 | ष | DEVANAGARI LETTER SSA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [113]
51. | 0938 | स | DEVANAGARI LETTER SA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
52. | 0939 | ह | DEVANAGARI LETTER HA | Consonant | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [104], [105], [108], [113]
53. | 093A | ◌ऺ | DEVANAGARI VOWEL SIGN OE | Matra | Kashmiri | 4 Kashmiri | [11], [105], [108]
54. | 093B | ◌ऻ | DEVANAGARI VOWEL SIGN OOE | Matra | Kashmiri | 4 Kashmiri | [11], [105], [108]
55. | 093C | ◌़ | DEVANAGARI SIGN NUKTA | Nukta | Bodo, Hindi, Kashmiri, Maithili, Santali, Sindhi | 1 Hindi | [0], [101], [105], [108], [110], [109], [111]
56. | 093E | ◌ा | DEVANAGARI VOWEL SIGN AA | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
57. | 093F | ◌ि | DEVANAGARI VOWEL SIGN I | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
58. | 0940 | ◌ी | DEVANAGARI VOWEL SIGN II | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
59. | 0941 | ◌ु | DEVANAGARI VOWEL SIGN U | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
60. | 0942 | ◌ू | DEVANAGARI VOWEL SIGN UU | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
61. | 0943 | ◌ृ | DEVANAGARI VOWEL SIGN VOCALIC R | Matra | Hindi, Marathi, Sanskrit | 1 Hindi | [0], [101], [102], [103]
62. | 0945 | ◌ॅ | DEVANAGARI VOWEL SIGN CANDRA E (= candra) | Matra | Hindi, Konkani, Marathi, Sanskrit, Kashmiri | 1 Hindi | [0], [100], [101], [108]
63. | 0946 | ◌ॆ | DEVANAGARI VOWEL SIGN SHORT E | Matra | Kashmiri | 4 Kashmiri | [0], [105], [108]
64. | 0947 | ◌े | DEVANAGARI VOWEL SIGN E | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [105], [108], [113]
65. | 0948 | ◌ै | DEVANAGARI VOWEL SIGN AI | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [113]
66. | 0949 | ◌ॉ | DEVANAGARI VOWEL SIGN CANDRA O | Matra | Hindi, Konkani, Marathi, Kashmiri | 1 Hindi | [0], [100], [108]
67. | 094A | ◌ॊ | DEVANAGARI VOWEL SIGN SHORT O | Matra | Kashmiri | 4 Kashmiri | [0], [105], [108]
68. | 094B | ◌ो | DEVANAGARI VOWEL SIGN O | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [105], [108], [113]
69. | 094C | ◌ौ | DEVANAGARI VOWEL SIGN AU | Matra | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [105], [108], [113]
70. | 094D | ◌् | DEVANAGARI SIGN VIRAMA | Halant/Virama | Most of the languages given in section 3.2 | 1 Hindi, Nepali | [0], [101], [102], [103], [105], [108], [113]
71. | 094F | ◌ॏ | DEVANAGARI VOWEL SIGN AW | Matra | Kashmiri | 4 Kashmiri | [0], [105], [108]
72. | 0956 | ◌ॖ | DEVANAGARI VOWEL SIGN UE | Matra | Kashmiri | 4 Kashmiri | [11], [105], [108]
73. | 0957 | ◌ॗ | DEVANAGARI VOWEL SIGN UUE | Matra | Kashmiri | 4 Kashmiri | [11], [105], [108]
74. | 0972 | ॲ | DEVANAGARI LETTER CANDRA A | Vowel | Konkani, Marathi, Kashmiri | 2 Konkani, Marathi | [9], [100], [102], [108], [112]
75. | 0973 | ॳ | DEVANAGARI LETTER OE | Vowel | Kashmiri | 4 Kashmiri | [11], [105], [108]
76. | 0974 | ॴ | DEVANAGARI LETTER OOE | Vowel | Kashmiri | 4 Kashmiri | [11], [105], [108]
77. | 0975 | ॵ | DEVANAGARI LETTER AW | Vowel | Kashmiri | 4 Kashmiri | [11], [105], [108]
78. | 0976 | ॶ | DEVANAGARI LETTER UE | Vowel | Kashmiri | 4 Kashmiri | [11], [105], [108]
79. | 0977 | ॷ | DEVANAGARI LETTER UUE | Vowel | Kashmiri | 4 Kashmiri | [11], [105], [108]
80. | 097B | ॻ | DEVANAGARI LETTER GGA | Consonant | Sindhi | 2 Sindhi | [8], [104]
81. | 097C | ॼ | DEVANAGARI LETTER JJA | Consonant | Sindhi | 2 Sindhi | [8], [104]
82. | 097E | ॾ | DEVANAGARI LETTER DDDA | Consonant | Sindhi | 2 Sindhi | [8], [104]
83. | 097F | ॿ | DEVANAGARI LETTER BBA | Consonant | Sindhi | 2 Sindhi | [8], [104]
Table 6: Code point repertoire
Apart from the above individual code points, the Neo-Brahmi Generation Panel also proposes some specific sequences which enable conditional inclusion of "DEVANAGARI LETTER RRA" in the repertoire, so that the “Eyelash Reph”¹³ construct can be formed.
Sr. No. | Unicode Code Points | Sequence | Character Names | Example languages using the code point (not exhaustive list) | Reference
1. | 0931 094D 092F | ऱ्य | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER YA | Konkani, Marathi, Nepali | [106], [107]
2. | 0931 094D 0939 | ऱ्ह | DEVANAGARI LETTER RRA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER HA | Konkani, Marathi, Nepali | [106], [107]
Table 7: Sequences
13 Unicode uses the term “Eyelash Ra” instead. Since the construct that is formed by this sequence is a special form of Reph (which is otherwise formed by Normal Ra U+0930), the term “Reph” is used here.
5.3 Code points not included
The following code points have not been included in the repertoire.
Sr. No. | Unicode Code Point | Glyph | Character Name | Reason for exclusion
1. | U+0904 | ऄ | DEVANAGARI LETTER SHORT A | Usage unknown. Not required explicitly by any language.
2. | U+090C | ऌ | DEVANAGARI LETTER VOCALIC L | Not in modern usage. Excluded as per conservatism principle.
3. | U+0929 | ऩ | DEVANAGARI LETTER NNNA | Not required in any spoken language. Required only for transcribing Dravidian alveolar n.
4. | U+0934 | ऴ | DEVANAGARI LETTER LLLA | Not required in any spoken language. Required only for transcribing Dravidian l.
5. | U+0944 | ◌ॄ | DEVANAGARI VOWEL SIGN VOCALIC RR | Not in modern usage. Excluded as per conservatism principle.
6. | U+0979 | ॹ | DEVANAGARI LETTER ZHA | Not required in any spoken language. Required only in transliteration of Avestan.
7. | U+097A | ॺ | DEVANAGARI LETTER HEAVY YA | Usage unknown. Not required explicitly by any language.
5.4 Structural Formation of Devanagari:
All the languages written in Brahmi-derived scripts follow a particular way of forming their words, known as akshar. The next section gives detailed akshar formation rules as applicable to the Hindi language when written in the Devanagari script. These rules need slight additions for the different languages written in Devanagari, in terms of:
- Character addition/deletion (e.g. the Nukta [U+093C] character is applicable for Hindi but not Marathi)
- Presence or absence of a particular rule (e.g. the Eyelash Reph construct is required in Marathi, Konkani and Nepali but not in Hindi).
It is worth noting that the rules required to accommodate additional languages in the Devanagari rule set, apart from those required for Hindi, are never in conflict with one another.
In Section 7, the Whole Label Evaluation (WLE) rules are given, which cover all the languages under the purview of the NBGP for the Devanagari script.
5.5 Akshar formation rules for Hindi
This section details the akshar formation rules as applicable to Hindi. The first section lists the categories of the characters in the form of variables. In the rules, the variable names are used instead of the descriptive names. The second section lists four operators, along with their functions, which are assumed while specifying the rules. The following two sections describe the two major categories of akshar formation, the first of which begins with a vowel and the second with a consonant. These rules are based on an Indian Standard (IS 13194:1991), popularly known as the "Indian Script Code for Information Interchange" [ISCII].
5.5.1 Variables involved
Dash → Hyphen (-)
Digit → Indo-Arabic digits [0-9]
C → Consonant
M → Matra
V → Vowel
B → Anusvara (Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta
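The variables above can be read as a mapping from code points to category letters. The following is a minimal sketch of such a mapping; the code point sets are an assumption read off the repertoire table by hand (the normative sets are in the accompanying XML file), the names `CATEGORY_SETS` and `categorize` are hypothetical, and Dash and Digit are left out because the sketch only covers letters and signs.

```python
def _chars(*items):
    """Expand code points and inclusive (start, end) ranges into a set of characters."""
    out = set()
    for item in items:
        if isinstance(item, tuple):
            out.update(chr(cp) for cp in range(item[0], item[1] + 1))
        else:
            out.add(chr(item))
    return out


# ASSUMPTION: illustrative sets only, not the normative repertoire.
# U+0931 (RRA) is deliberately absent; it is only allowed inside sequences.
CATEGORY_SETS = {
    "C": _chars((0x0915, 0x0928), (0x092A, 0x0930), 0x0932, 0x0933,
                (0x0935, 0x0939), 0x097B, 0x097C, 0x097E, 0x097F),
    "V": _chars((0x0905, 0x090B), (0x090D, 0x0914), (0x0972, 0x0977)),
    "M": _chars(0x093A, 0x093B, (0x093E, 0x0943), (0x0945, 0x094C),
                0x094F, 0x0956, 0x0957),
    "B": _chars(0x0902),  # Anusvara
    "D": _chars(0x0901),  # Candrabindu
    "X": _chars(0x0903),  # Visarga
    "H": _chars(0x094D),  # Halant/Virama
    "N": _chars(0x093C),  # Nukta
}

# Reverse lookup: character -> category letter.
CATEGORY_OF = {ch: name for name, chars in CATEGORY_SETS.items() for ch in chars}


def categorize(text):
    """Return one category letter per character, e.g. 'CN' for KA + NUKTA."""
    try:
        return "".join(CATEGORY_OF[ch] for ch in text)
    except KeyError as err:
        raise ValueError(f"character outside the sketch's sets: {err.args[0]!r}")
```

For instance, `categorize("\u0915\u094D\u0915\u0940")` yields the category string for a Consonant + Halant + Consonant + Matra akshar.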
5.5.2 Operatorsused:
Symbol Function
| Alternative
[] Optional
* Variable Repetition
() Sequence Group
Table 8: Symbol functions
In what follows, the Vowel Sequence and the Consonant Sequence pertinent to Devanagari, when used to write Hindi, are given.
5.5.3 The Vowel Sequence
A vowel sequence begins with a vowel. It may optionally be followed by an Anusvara (B), a Candrabindu (D) or a Visarga (X). The number of B, D or X which can follow a V in Devanagari is restricted to one.
The possibility of a Visarga following a Candrabindu or Anusvara is ruled out, since it is used only in Vedic and in the Bengali script.
The vowel sequence in Hindi is therefore V[B|D|X]
Examples:
Sequence Description | Sequence | Example | Constituting characters
Vowel | V | अ /a/ (U+0905) |
Vowel + Anusvara | V[B] | अं /aṁ/ (U+0905 U+0902) | अ ◌ं (U+0905 U+0902)
Vowel + Candrabindu | V[D] | अँ /aṃ/ (U+0905 U+0901) | अ ◌ँ (U+0905 U+0901)
Vowel + Visarga | V[X] | अः /aḥ/ (U+0905 U+0903) | अ ◌ः (U+0905 U+0903)
Table 9
5.5.4 Consonant Sequence
A consonant sequence begins with a consonant. It may optionally be followed by a Nukta (N), Matra (M), Anusvara (B), Candrabindu (D), Visarga (X) or a Halant (H). The number of instances of these characters occurring after a consonant is restricted to one. The Consonant sequence can be extended further after the N, M and H. Each of these is discussed in the following sections:
1. A single consonant (C)
(The consonant shall be treated as coterminous with the Consonant along with the Nukta sign wherever such a case is pertinent.)
Examples:
Sequence Description | Sequence | Example | Constituting characters
Consonant | C | क /ka/ (U+0915) |
Consonant + Nukta | C[N] | क़ /ḳa/ | क ◌़ (U+0915 U+093C)
Table 10
2. A consonant optionally followed by a dependent vowel sign/Matra [M], Anusvara [B], Candrabindu [D], Visarga [X] or Halant [H]
C[M|B|D|X|H]
Examples:
Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra | C[M] | कि /ki/ | क ि◌ (U+0915 U+093F)
Consonant + Anusvara | C[B] | कं /kaṁ/ | क ◌ं (U+0915 U+0902)
Consonant + Candrabindu | C[D] | कँ /kaṃ/ | क ◌ँ (U+0915 U+0901)
Consonant + Visarga | C[X] | कः /kaḥ/ | क ◌ः (U+0915 U+0903)
Consonant + Halant | C[H] | क् /k/ (Pure Consonant) | क ◌् (U+0915 U+094D)
Table 11
2.A. A CM sequence can be optionally followed by D, B or X
(CM)[D|B|X]
Example:
Sequence Description | Sequence | Example | Constituting characters
Consonant + Matra + Anusvara | CM[B] | कीं /kīṁ/ | क ◌ी ◌ं (U+0915 U+0940 U+0902)
Consonant + Matra + Candrabindu | CM[D] | काँ /kāṃ/ | क ◌ा ◌ँ (U+0915 U+093E U+0901)
Consonant + Matra + Visarga | CM[X] | कीः /kīḥ/ | क ◌ी ◌ः (U+0915 U+0940 U+0903)
Table 12
3. A sequence of consonants (up to 4) joined by Halant¹⁴
*3(CH)C
Example:
Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Halant + Consonant + Halant + Consonant | CHCHCHC | न्क्र्य /nkrya/ | न्क्र्य (U+0928 U+094D U+0915 U+094D U+0930 U+094D U+092F)
Table 13
14 In the case of Sanskrit, a Halant can join up to 5 consonants.
However, the WLE rules proposed in Section 7 do not impose any restriction on the number of consonants that can be joined by a Halant.
Subsets:
3.A. The combination may be followed by M, B, D or X
Example:
Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra | CHC[M] | क्की /kkī/ | क्की (U+0915 U+094D U+0915 U+0940)
Consonant + Halant + Consonant + Anusvara | CHC[B] | क्कं /kkaṁ/ | क्कं (U+0915 U+094D U+0915 U+0902)
Consonant + Halant + Consonant + Candrabindu | CHC[D] | क्कँ /kkaṃ/ | क्कँ (U+0915 U+094D U+0915 U+0901)
Consonant + Halant + Consonant + Visarga | CHC[X] | क्कः /kkaḥ/ | क्कः (U+0915 U+094D U+0915 U+0903)
Table 14
3.B. *3(CH)CM may be followed by a B, D or X
Example:
Sequence Description | Sequence | Example | Constituting characters
Consonant + Halant + Consonant + Matra + Anusvara | CHCM[B] | क्कीं /kkīṁ/ | क्कीं (U+0915 U+094D U+0915 U+0940 U+0902)
Consonant + Halant + Consonant + Matra + Candrabindu | CHCM[D] | क्कीँ /kkīṃ/ | क्कीँ (U+0915 U+094D U+0915 U+0940 U+0901)
Consonant + Halant + Consonant + Matra + Visarga | CHCM[X] | क्कीः /kkīḥ/ | क्कीः (U+0915 U+094D U+0915 U+0940 U+0903)
Table 15
These are the basic akshar formation rules on which the overall Devanagari LGR is based. As languages other than Hindi are considered, some additional language-specific characters and rules are introduced. There are some additional finer aspects to these rules as one takes into account the digits, punctuation and special standalone characters like Avagraha. Those aspects are not discussed here, as the [MSR] on which the LGRs are supposed to be based excludes those characters.
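The Hindi akshar patterns of Sections 5.5.3 and 5.5.4 can be summarised as one regular expression over the category letters (V, C, N, M, B, D, X, H). The sketch below is an illustration only and is not taken from the LGR: it works on a string of category letters rather than on raw text, `is_hindi_akshar` is a hypothetical name, and it assumes an optional Nukta may follow every consonant of a conjunct, which the text states explicitly only for a single consonant.

```python
import re

# Pattern over category letters, not over Devanagari text.
AKSHAR = re.compile(
    r"""
    V[BDX]?            # vowel sequence: a vowel, then at most one of B, D, X
    |
    (?:CN?H){0,3}      # up to three leading consonant + Halant pairs
    CN?                # the last consonant, with an optional Nukta
    (?:H|M?[BDX]?)     # a closing Halant, or an optional Matra then optional B, D or X
    """,
    re.VERBOSE,
)


def is_hindi_akshar(categories):
    """True if the whole category string forms one akshar under the Hindi rules."""
    return AKSHAR.fullmatch(categories) is not None
```

The `{0,3}` bound mirrors the Hindi limit of four joined consonants; as noted above, the WLE rules in Section 7 do not carry this limit, so a WLE check would not use this bound.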
6 Variants
There are no characters/character sequences in Devanagari which can be created by using the characters permitted as per the [MSR] and that look exactly alike. However, Devanagari has ample cases of confusingly similar variants. The NBGP categorizes these confusingly similar variants in two groups.
Group 1: Confusing due to pure visual similarity
Group 2: Confusing due to deviation from normally perceived character formations by the larger linguistic community
As advised by ICANN, no cases belonging to Group 1 are proposed, as there is another panel (String Similarity Assessment Panel) entrusted to deal with such cases. "Table 20: Visually confusables" in "Appendix A: Visually confusable characters/sequences" lists them.
Cases which belong to Group 2, however, are proposed to be considered as variants. These cases are not of mere visual similarity, as they involve some deviations from the widely accepted norms of Devanagari akshar formation. They can cause confusion even to a careful observer and hence are being proposed as variants. Following is a brief description of these variants, followed by the variants in Table 16 and Table 17.
6.1 Vowel/Vowel sign followed by Nukta
The Santali language has a unique requirement for the positioning of the Nukta character "◌़" (U+093C), which is not common in other Devanagari-based languages. Santali requires the Nukta character to follow certain Vowels and Matras. Complete representation of these Santali combinations necessitated that the Whole Label Evaluation rules (given in Section 7) be opened up for these specific cases. A regular non-Santali user mostly cannot even anticipate the possibility of such a combination and can confuse it for something else.
This gives rise to the possibility of creating certain labels that can be deceptively similar for a majority of the Devanagari user base. This being a unique case of homographic similarity, the following variants are being proposed.
Variant 1 | Variant 2
आ U+0906 | आ़ U+0906 U+093C
ओ U+0913 | ओ़ U+0913 U+093C
◌ा U+093E | ◌ा◌़ U+093E U+093C
◌ो U+094B | ◌ो◌़ U+094B U+093C
Table 16: Proposed Variants - Set 1
6.1.1 Variant context rule for Santali Nukta variants:
All of the Nukta variants given in "Table 16: Proposed Variants - Set 1" share a typical characteristic: within a variant pair, Variant 1 is a subset of Variant 2, e.g. in the first pair, आ (U+0906) is a subset of आ़ (U+0906 U+093C). This implies a regenerative tendency, in theory: if an आ (U+0906) is substituted with आ़ (U+0906 U+093C), the substitution introduces a new instance of आ (U+0906), namely the first code point of आ़ (U+0906 U+093C). By definition, this new instance of आ (U+0906) may also need to be substituted with आ़ (U+0906 U+093C), thereby creating an invalid akshar combination आ़़ (U+0906 U+093C U+093C), where a Nukta would need to follow another Nukta. To prevent this, a variant context rule has been added to all the above Nukta variants, as given below.
Rule: As per "Table 16: Proposed Variants - Set 1", the Variant 1 to Variant 2 relationship exists if and only if the Variant 1 character is not followed by a Nukta (U+093C) character. Thus, the following variant relations are bound by the above condition:
आ (U+0906) → आ़ (U+0906 U+093C)
ओ (U+0913) → ओ़ (U+0913 U+093C)
◌ा (U+093E) → ◌ा◌़ (U+093E U+093C)
◌ो (U+094B) → ◌ो◌़ (U+094B U+093C)
The variant relationship from Variant 2 to Variant 1 is not constrained by any rule, as it does not give rise to the invalid Nukta combination.
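The effect of the context rule can be illustrated with a short sketch. It is not the rule as encoded in the LGR XML; the function names are hypothetical, and the only data used are the four Variant 1 code points of Table 16 and the Nukta itself.

```python
NUKTA = "\u093C"
# Variant 1 code points of Table 16 (Set 1).
SET1_BASES = {"\u0906", "\u0913", "\u093E", "\u094B"}


def nukta_variant_positions(label):
    """Indices where the Variant 1 -> Variant 2 substitution is permitted.

    A position qualifies only if the base character is NOT already
    followed by a Nukta, which is exactly the context rule above.
    """
    positions = []
    for i, ch in enumerate(label):
        followed_by_nukta = i + 1 < len(label) and label[i + 1] == NUKTA
        if ch in SET1_BASES and not followed_by_nukta:
            positions.append(i)
    return positions


def apply_nukta_variant(label, index):
    """Substitute Variant 2 for Variant 1 at `index`, or raise if the rule forbids it."""
    if index not in nukta_variant_positions(label):
        raise ValueError("context rule forbids the substitution at this position")
    return label[: index + 1] + NUKTA + label[index + 1:]
```

After one substitution the same position no longer qualifies, so the double Nukta sequence U+093C U+093C can never be generated.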
6.2 Unique Vowels and Vowel Signs required for Kashmiri
Kashmiri, when written in the Devanagari script, requires a unique set of Vowels and Vowel signs which only a Kashmiri speaker can understand. The majority of Devanagari users who are not conversant with Kashmiri can easily confuse them with some of the Vowels/Vowel signs which look similar to the Kashmiri ones. There are also cases where a Kashmiri Vowel/Vowel sign can be confused with certain akshar formations. Hence, they are being proposed as variants.
Variant 1 | Variant 2
ॳ U+0973 | अं U+0905 U+0902
◌ऺ U+093A | ◌ं U+0902
ॴ U+0974 | आं U+0906 U+0902
◌ऻ U+093B | ◌ा◌ं U+093E U+0902
ऎ U+090E | ऐ U+0910
◌ॆ U+0946 | ◌े U+0947
ॵ U+0975 | औ U+0914
◌ॏ U+094F | ◌ौ U+094C
Table 17: Proposed Variants - Set 2
6.3 Halant in Final Position (Only a discussion, not proposed as variants)
Another case of deceptive similarity for a majority of the Devanagari user base is that of a word ending in Halant "◌्" (U+094D) vis-à-vis the same word without the final Halant. As the function of the Halant is that of a vowel killer, many users tend to ignore the phonetic effect of its presence or absence at the end of a word. The majority of users would pronounce both words in the same way, thereby creating a perception of (false) equivalence. However, there also exist some users who clearly require the final Halant to achieve the peculiar phonetic effect of a truncated implicit vowel sound at the end. These users make a clear distinction between the two words (with and without the final Halant). It is for this reason that the final Halant is being accommodated in the Whole Label Evaluation rules for Devanagari.
In these cases, the presence or absence of the final Halant is clearly visible, and there is no apparent case to make them variant pairs. Eventually, in the light of practical experience, a future NBGP revision may assess whether these cases need to be considered as variant pairs.
6.4 Variant Disposition
As the variants mentioned in both categories (Table 16 and Table 17) are confusingly similar, albeit of a peculiar nature, it is proposed that they be given the "blocked" disposition.
There is no preference among these variants. Whichever label containing either of these variants is chosen first, the other equivalent variant label should be blocked.
6.5 Cross-script Variants
A cross-script variant, also sometimes referred to as a "Whole Label variant", is the variant case where one label in one script can be composed in such a way that it resembles another entire label in a different script.
Every individual LGR under the NBGP is supposed to provide the set of cross-script variants it identifies with all other scripts under the NBGP.
The NBGP has ensured that not only the individual characters but also most of the akshar variations are taken into consideration during the cross-script variant analysis of Devanagari with all the other scripts under the NBGP. This was achieved by sharing a list of most of the Devanagari akshar combinations with all the other script teams. (The word ‘most’ is used here as it is not practical to cover all the possible “Consonant + Halant + Consonant + ….” cases. However, for Devanagari, all cases of “Consonant + Halant + Consonant” combinations were included in the analysis.)
The Devanagari script has a major set of possible cross-script variants only with the Gurmukhi script. The cases listed in Table 18 are the variants proposed as cross-script variants between Devanagari and Gurmukhi. Similarly, Table 19 has the cases proposed as cross-script variants between Devanagari and Bengali.
It is to be noted that none of the combinations listed in Table 18 and Table 19 are deemed to be equivalents of each other, semantically or otherwise. They are only grouped based on possible visual confusability.
The NBGP has ensured that the Devanagari, Bengali and Gurmukhi LGR teams propose the same set of cross-script variants, by meeting face-to-face on many occasions as well as through mail communications. The same set of cross-script variants (with Devanagari) is supposed to be found in the Bengali and Gurmukhi LGR documents.
Devanagari | Gurmukhi
◌ं U+0902 | ◌ਂ U+0A02
इ U+0907 | ਙ U+0A19
उ U+0909 | ਤ U+0A24
ग U+0917 | ਗ U+0A17
घ U+0918 | ਬ U+0A2C
ट U+091F | ਟ U+0A1F
ठ U+0920 | ਠ U+0A20
ढ U+0922 | ਫ U+0A2B
प U+092A | ਧ U+0A27
भ U+092D | ਮ U+0A2E
म U+092E | ਸ U+0A38
व U+0935 | ਕ U+0A15
ह U+0939 | ਵ U+0A35
◌ऺ U+093A | ◌ਂ U+0A02
◌़ U+093C | ◌਼ U+0A3C
ि◌ U+093F | ਿ◌ U+0A3F
◌ी U+0940 | ◌ੀ U+0A40
◌ॅ U+0945 | ◌ੱ U+0A71
◌ॆ U+0946 | ◌ੇ U+0A47
◌ॆ U+0946 | ◌ੋ U+0A4B
◌े U+0947 | ◌ੇ U+0A47
◌े U+0947 | ◌ੋ U+0A4B
◌ै U+0948 | ◌ੈ U+0A48
◌ॖ U+0956 | ◌ੁ U+0A41
◌ॗ U+0957 | ◌ੂ U+0A42
प्टि U+092A U+094D U+091F U+093F | ਇ U+0A07
प्टी U+092A U+094D U+091F U+0940 | ਈ U+0A08
प्टे U+092A U+094D U+091F U+0947 | ਏ U+0A0F
त्त U+0924 U+094D U+0924 | ਜ U+0A1C
Table 18: Proposed Cross-script Devanagari-Gurmukhi Variants
Devanagari | Bengali
म U+092E | ম U+09AE
ि◌ U+093F | ি◌ U+09BF
Table 19: Proposed Cross-script Devanagari-Bengali Variants
In addition to the above cases, the Devanagari and Gurmukhi scripts have a possible set of cross-script confusables, which look similar but not similar enough to be recommended as cross-script variants. "Table 21: Devanagari Cross-script confusables" in "Appendix B: Cross-script Confusables" lists them.
7 Whole Label Evaluation Rules (WLE)
This section provides the WLEs that are required by all the languages mentioned in Section 3.2 when written in the Devanagari script. The rules have been drafted in such a way that they can be easily translated into the LGR specification.
Below are the symbols used in the WLE rules, one for each "Indic Syllabic Category" mentioned in Table 6: Code point repertoire.
C → Consonant
M → Matra
V → Vowel
B → Anusvara(Bindu)
D → Candrabindu
X → Visarga
H → Halant/Virama
N → Nukta
S → Eyelash Reph (C2 H C3), where
C2 is 0931 (ऱ - DEVANAGARI LETTER RRA)
H is 094D (◌् - DEVANAGARI SIGN VIRAMA)
C3 is either 092F (य - DEVANAGARI LETTER YA) or 0939 (ह - DEVANAGARI LETTER HA)
Below are the specific WLE rules:
1. N: must be preceded only by a member of C1, V1 or M1.
The set C1 consists of these consonants:
a. क (U+0915)
b. ख (U+0916)
c. ग (U+0917)
d. च (U+091A)
e. छ (U+091B)
f. ज (U+091C)
g. ड (U+0921)
h. ढ (U+0922)
i. फ (U+092B)
The set V1 consists of these vowels:
a. आ (U+0906) (Required in Santali language)
b. ओ (U+0913) (Required in Santali language)
The set M1 consists of these matras:
a. ◌ा (U+093E) (Required in Santali language)
b. ◌ो (U+094B) (Required in Santali language)
2. H: must be preceded by C or CN¹⁵
3. M: must be preceded by C or CN¹⁶
4. X: must be preceded by either of V, C, N or M
5. B: must be preceded by either of V, C, N or M
6. D: must be preceded by either of V, C, N or M
7. V: can NOT be preceded by H (details in "Case of V preceded by H")
15 where CN is a C followed by an N
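The seven rules above translate fairly directly into a left-to-right check of each character against its predecessor. The following is a minimal sketch, not the normative LGR implementation: `wle_violation` is a hypothetical name, the C, V and M sets are an assumption read off the repertoire table by hand, and the sketch models neither repertoire membership nor the Eyelash Reph sequences.

```python
def _chars(*items):
    """Expand code points and inclusive (start, end) ranges into a set of characters."""
    out = set()
    for item in items:
        if isinstance(item, tuple):
            out.update(chr(cp) for cp in range(item[0], item[1] + 1))
        else:
            out.add(chr(item))
    return out


# ASSUMPTION: illustrative category sets, not the normative repertoire.
C = _chars((0x0915, 0x0928), (0x092A, 0x0930), 0x0932, 0x0933,
           (0x0935, 0x0939), 0x097B, 0x097C, 0x097E, 0x097F)
V = _chars((0x0905, 0x090B), (0x090D, 0x0914), (0x0972, 0x0977))
M = _chars(0x093A, 0x093B, (0x093E, 0x0943), (0x0945, 0x094C),
           0x094F, 0x0956, 0x0957)
B, D, X, H, N = "\u0902", "\u0901", "\u0903", "\u094D", "\u093C"

# Sets named in rule 1.
C1 = set("\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B")
V1 = {"\u0906", "\u0913"}
M1 = {"\u093E", "\u094B"}


def wle_violation(label):
    """Return the number of the first WLE rule violated, or None if none is."""
    for i, ch in enumerate(label):
        prev = label[i - 1] if i > 0 else ""
        prev2 = label[i - 2] if i > 1 else ""
        c_or_cn = prev in C or (prev == N and prev2 in C)
        if ch == N and prev not in (C1 | V1 | M1):
            return 1
        if ch == H and not c_or_cn:
            return 2
        if ch in M and not c_or_cn:
            return 3
        if ch in (X, B, D) and not (prev in V or prev in C or prev == N or prev in M):
            return {X: 4, B: 5, D: 6}[ch]
        if ch in V and prev == H:
            return 7
    return None
```

A combining sign at the start of a label has no predecessor, so it fails the rule that governs it. As footnote 1 of this document notes, the rule under which a real LGR tool flags a label may differ; only the valid/invalid outcome is meant to be comparable.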
Case of Eyelash Reph
In the WLE rules, there is no specific mention of the Eyelash Reph, for two reasons:
1. As U+0931 is added as a part of the permissible sequences in Table 7: Sequences, it is permitted only within those specific sequences.
2. The last characters of both the sequences of which U+0931 is a part are consonants. As the Eyelash Reph can take all the combinations that a consonant can, no specific handling in terms of a context rule is required.
Case of V preceded by H
As any valid akshar in Devanagari begins either with a Consonant or a Vowel, in the case of multi-word domains it was necessary to check whether each of these can follow any valid akshar-ending character. It is to be noted that only the case “V preceded by H” needs a special discussion, as given below.
There could be cases involving multi-word domains where V may need to be allowed to follow an H, e.g. आम्अचार /aːməchaːr/ "Mango pickle" (U+0906 U+092E U+094D U+0905 U+091A U+093E U+0930).
This is the case where two different words are joined together, the first of which ends in an H and the second of which begins with a V. Some sections of the linguistic community require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without an H is considered enough for full representation of the sound intended for the first word.
This is a unique situation necessitated by the lack of the hyphen, space or Zero Width Non-joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. Permitting this may create a perceived similarity between two labels (with and without H) for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.
If required in future, depending on the prevailing requirements of the community, the NBGP may consider revisiting this rule.
16 where CN is a C followed by an N
8 Contributors
NBGP Co-chairs: Dr. Udaya Narayan Singh, Mr. Mahesh D Kulkarni and Dr. Ajay Data
Following is the full list of NBGP members with their language expertise.
Position | Name | Organization | Country | Language Expertise
Co-Chair | Ajay Data | Data Xgen Technologies | India | Hindi, English
Co-Chair | Mahesh D. Kulkarni | C-DAC | India | Marathi, Hindi
Co-Chair | Udaya Narayana Singh | Visva-Bharati, Santiniketan, West Bengal | India | Bengali, Maithili, Hindi, English
Member | Abhijit Dutta | Wikimedia | India | Bengali, Hindi
Member | Akshat S. Joshi (Editor) | C-DAC | India | Hindi, Marathi
Member | Anivar A. Aravind | Indic Project | India | Malayalam
Member | Anupam Agrawal | Tata Consultancy Service | India | Hindi, Bengali
Member | Arvind Bhandari | Gujarat University | India | Gujarati
Member | Ashish Modi | Data Xgen Technologies | India | Hindi
Member | Atiur Rahman Khan | C-DAC | India | Bangla
Member | Bal Krishna Bal | Kathmandu University | Nepal | Nepali
Member | Balaram Prasain | Tribhuvan University | Nepal | Nepali
Member | BASANTA KUMAR PANDA | Regional Institute of Education (NCERT) | India | Odia
Member | Bhim Dhoj Shrestha | Consultant | Nepal | Nepali, Newar
Member | Chitrita Chatterjee | Internet and Mobile Association of India (IAMAI) | India | Multiple languages represented by members of IAMAI
Member | DEBAJIT SHARMA | Anundoram Borooah Institute of Language Art and Culture | India | Assamese
Member | Dev Dass Manandhar | Consultant | Nepal | Nepali, Newar
Member | Dhanalakshmi K T | Northern Trust | India | Kannada
Member | Ganesh Murmu | Ranchi University | India | Santali
Member | Gangadhar Panday | Babul Films Society | India | Telugu
Member | Ghanashyam Nepal | Benares Hindu University & University of North Bengal | India | Nepali
Member | Girish Chandra Mishra | Language Technology Centre, Ravenshaw University | India | Odia
Member | Gurpreet Singh Lehal | Punjabi University Patiala | India | Panjabi
Member | Harish Chowdhary | NIXI | India | Hindi
Member | Hempal Shrestha | Nepal Entrepreneurs' Hub (NEHUB) | Nepal | Nepali, Newar
Member | Jay Paudyal | Consultant | India | Hindi
Member | Jijo Pappachan | DN.Domains | India | Malayalam
Member | K.C. Tikayatray | Odia Bhasa Pratisthan | India | Odia
Member | Kalyan Vasudeo Kale | Formerly affiliated with University of Pune | India | Marathi
Member | Kuldeep Patnaik | Visualizethysoul | India | Odia
Member | Mukesh Saini | Essel Group | India | Hindi
Member | N. Deiva Sundaram | NDS Lingsoft Solutions Pvt Ltd | India | Tamil
Member | Neha Gupta | C-DAC | India | Hindi
Member | Nirajan Parajuli | NREN | Nepal | Nepali
Member | Nishit Jain | C-DAC | India | Hindi
Member | Pawan Chitrakar | Gapsco | Nepal | Nepali
Member | Prabhakar Pandey | C-DAC | India | Hindi
Member | Prasad P K | A-one Publishers | India | Malayalam
Member | Prateek Pathak | ISOC Mumbai | India | Devanagari
Member | Raiomond Doctor | NLP Consultant | India | English, Hindi, Marathi, Gujarati
Member | Rajib Chakraborty | Society for Natural Language Technology Research | India | Bangla (Bengali)
Member | Rajiv Kumar | NIXI | India |
Member | S. Maniam | International Forum IT for Tamil | Singapore | Tamil
Member | Santhosh Thottingal | Wikimedia foundation | India | Malayalam, Sourashtra, Tamil
Member | Saroja Bhate | University of Pune | India | Sanskrit
Member | Shambhu Kumar Singh | National Translation Mission, Mysore | India | Maithili
Member | Shanmugam R | C-DAC | India | Tamil
Member | Shantaram S. Warde Walawalikar | Independent Researcher | India | Konkani
Member | Shashi Pathania | P.G.D. of Dogri, University of Jammu | India | Dogri
Member | Shubham Saran | NIXI | India |
Member | Sinnathambi Shanmugarajah | University of Colombo School of Computing | Sri Lanka | Tamil
Member | Sujith Kartha | Digitalkz.com | India | Malayalam
Member | Suraj Adhikari | Mercantile Communications (and .np ccTLD) | Nepal | Nepali
Member | Swarna Prabha Chainary | Guwahati University | India | Bodo
Member | U.B. Pavanaja | http://vishvakannada.com/ | India | Kannada
Member | Uma Maheshwar G | CALTS, Univ. of Hyderabad | India | Telugu
Member | Uttam Shrestha Rana | NPNOG | Nepal | Nepali
Member | Veena Solomon | (freelancer) | India | Malayalam
Member | Vinay Murarka | Consultant; https://मेरा.भारत | India | Hindi
In addition, the following members externally gave inputs to the NBGP for the respective languages/scripts.
Name | Language/Script Expertise
Ajit Kumar | Awadhi, Braj Language
Amar Tumyahang | Limbu Language
Amrit Yonjan | Tamang Language
Aprana Kulkarni | Hindi, Marathi
Basil Baa | Sadri Language
Basil Kiro | Kharia Language
Biswa Limbu | Limbu Language
Devdass Manandhar | Newar
Devendra Kumar Devesh | Bhojpuri Language
Dinbandhu Mahto | Panchpargania Language
Dipika Sangma Narzary | Bodo Language
Dr K.P. Lekhwani | Sindhi
Dr. Birendra Kumar Soy | Mundari Language
Dr. Dinesh Kumar Shrivastav | Magahi Language
Dr. Harvinder Kaur | Gurmukhi Script
Dr. Laxmi Prasad Khatiwada | Nepali Language
Harihar Vaishnav | Halbi
Indra Kumar Tamang | Tamang Language
Jagannath Singh | Panchpargania Language
Narendra Kumar Negi | Kinnauri Language
Prateek Harshwal | Wagdi and Dhundhari Language
Rayem Olem Dungdung | Sadri Language
Tej Man Angdembe | Limbu Language
Urmila Harshwal | Wagdi Language
9 References
[MSR] Integration Panel, "Maximal Starting Repertoire - MSR-3 Overview and Rationale", 28 March 2018, https://www.icann.org/en/system/files/files/msr-3-overview-28mar18-en.pdf
[EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 13th Nov. 2017)
[NBGP] Neo-Brahmi Generation Panel
[gTLD] generic Top Level Domain
[ISCII] Indian Script Code for Information Interchange, https://cdac.in/index.aspx?id=mlc_gist_iscii (Accessed on 2nd Feb. 2018)
[GIST] Graphics Intelligence based Script Technologies, https://cdac.in/index.aspx?id=gist (Accessed on 2nd Feb. 2018)
[C-DAC] Centre for Development of Advanced Computing, https://cdac.in (Accessed on 2nd Feb. 2018)
[0] The Unicode Standard 1.1, http://www.unicode.org/versions/Unicode1.1.0/ (Accessed on 12th Dec. 2017)
[8] The Unicode Standard 5.0, http://www.unicode.org/versions/Unicode5.0.0/ (Accessed on 12th Dec. 2017)
[9] The Unicode Standard 5.1, http://www.unicode.org/versions/Unicode5.1.0/ (Accessed on 12th Dec. 2017)
[11] The Unicode Standard 6.0, http://www.unicode.org/versions/Unicode6.0.0/ (Accessed on 12th Dec. 2017)
[100] Devanāgarī VIP Team, "Variant Issues Report", ICANN, 3rd Oct. 2011, https://archive.icann.org/en/topics/new-gtlds/devanagari-vip-issues-report-03oct11-en.pdf (Accessed on 10th Oct. 2017)
[101] Omniglot, "Hindi", https://www.omniglot.com/writing/hindi.htm (Accessed on 10th Oct. 2017)
[102] Omniglot, "Marathi", https://www.omniglot.com/writing/marathi.htm (Accessed on 10th Oct. 2017)
[103] Omniglot, "Sanskrit", https://www.omniglot.com/writing/sanskrit.htm (Accessed on 10th Oct. 2017)
[104] Omniglot, "Sindhi", https://www.omniglot.com/writing/sindhi.htm (Accessed on 10th Oct. 2017)
[105] Omniglot, "Kashmiri", https://www.omniglot.com/writing/kashmiri.htm (Accessed on 10th Oct. 2017)
[106] Unicode 10.0.0, "South and Central Asia-I - Official Scripts of India", Page 456 (R5 and R5a), http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf (Accessed on 13th Nov. 2017)
[107] Unicode Indic Group, "Devanagari Eyelash Ra", http://unicode.org/~emuller/iwg/p8/utcdoc.html (Accessed on 13th Nov. 2017)
[108] M. K. Raina, "How to read and write Kashmiri in Devanagari?", http://www.koshur.org/pdf/Let%20Us%20Learn%20Kashmiri.pdf (Accessed on 12th Dec. 2017)
[109] Central Hindi Directorate - Ministry of HRD - Govt. of India, "Devanāgarī Alphabet and its Romanization", http://hindinideshalaya.nic.in/english/hindi_orgin/devnagarithesysmbols.html (Accessed on 12th Dec. 2017)
[110] Omniglot, "Bodo", https://www.omniglot.com/writing/bodo.htm (Accessed on 12th Dec. 2017)
[111] Omniglot, "Maithili", https://www.omniglot.com/writing/maithili.htm (Accessed on 12th Dec. 2017)
[112] Omniglot, "Konkani", https://www.omniglot.com/writing/konkani.htm (Accessed on 20th May 2018)
[113] Omniglot, "Nepali", https://www.omniglot.com/writing/nepali.htm (Accessed on 20th May 2018)
10 Books, articles and webographies consulted
Following is a thematically sorted set of documents, books, articles and webographies consulted in the drafting of this report.
10.1 WRITING SYSTEMS
1. Diringer, D., The Alphabet. A Key to the History of Mankind. 3rd Edition in 2 Volumes. Hutchinson. London. 1968.
10.2 DEVANĀGARĪ
1. Agrawala, V. S. (1966). The Devanāgarī script. In: Indian Systems of Writing. (Pp. 12-16) Delhi: Publications Division.
2. Agyeya, Sacchindanand Hiranand Vatsyayan. 1972. Bhavanti. Delhi: Rajpal and Sons.
3. Beames, John. 1872-79. A Comparative Grammar of the Modern Aryan Languages of India. 3 vols. London, Trubner and Co. [Reprinted by Munshiram Manoharlal, New Delhi, 1966.]
4. Bhatia, Tej K. 1987. A History of the Hindi Grammatical Tradition: Hindi-Hindustani Grammar, Grammarians, History and Problems. Leiden/New York: E. J. Brill.
5. Bright, W. (1996). The Devanāgarī script. In P. Daniels and W. Bright (eds), The World’s Writing Systems. (Pp. 384-390). New York: Oxford University Press.
6. Cardona, George. 1987. Sanskrit. In The World's Major Languages. Bernard Comrie (ed.). London: Croom Helm. 448-469.
7. Dwivedi, Ram Awadh. 1966. A Critical Survey of Hindi Literature. Delhi: Motilal Banarsidass.
8. Faruqi, Shamsur Rahman. 2001. Early Urdu Literary Culture and History. Delhi: Oxford University Press.
9. Guru, Kamta Prasad. 1919. Hindi Vyakaran. Varanasi: Nagari Pracharini Sabha. (1962 edition).
10. Kachru, Yamuna. 1965. A Transformational Treatment of Hindi Verbal Syntax. London: University of London Ph.D. dissertation (Mimeographed).
11. Kachru, Yamuna. 1966. An Introduction to Hindi Syntax. Urbana: University of Illinois, Department of Linguistics.
12. Kalyan Kale and Anjali Soman. 1986. Learning Marathi. Pune: Shri Vishakha Prakashan.
13. McGregor, R.S. (1977). Outline of Hindi Grammar. 2nd ed. Delhi: Oxford University Press.
14. McGregor, R.S. 1972. Outline of Hindi Grammar with Exercises. Delhi: Oxford University Press.
15. McGregor, R.S. 1974. Hindi Literature of the Nineteenth and Early Twentieth Centuries. Wiesbaden: Harrassowitz.
16. McGregor, R.S. 1984. Hindi Literature from Its Beginnings to the Nineteenth Century. Wiesbaden: Harrassowitz.
17. Pandey, P.K. (2007). Phonology-orthography interface in Devanāgarī for Hindi. Written Language and Literacy, 10(2): 139-156.
18. Rai, Amrit. 1984. A House Divided. The Origin and Development of Hindi/Hindavi. Delhi: Oxford University Press.
19. Sharad, Onkar. 1969. Lohiya ke Vicar. Allahabad: Lokbharati Prakashan.
20. Singh, A.K. (2007). Progress of modification of Brāhmī alphabet as revealed by the inscriptions of sixth-eighth centuries. In P.G. Patel, P. Pandey and D. Rajgor (eds), The Indic Scripts: Paleographic and Linguistic Perspectives. (Pp. 85-107). New Delhi: DK Printworld.
21. Sproat, R. (2000). A Computational Theory of Writing Systems. Cambridge University Press.
22. Tiwari, Pandit Udaynarayan. 1961. Hindi Bhasha ka Udgam aur Vikas [The Origin and Development of the Hindi Language]. Prayag: Leader Press.
23. Verma, M.K. 1971. The Structure of the Noun Phrase in English and Hindi. Delhi: Motilal Banarsidass.
10.3 INDIC COMPUTING SPECIFIC
1. IS 10401: 8-bit code for information interchange. 1982
2. IS 10315: 7-bit coded character set for information interchange. 1985
3. IS 12326: 7-bit and 8-bit coded character sets - Code extension techniques. 1987
4. ISO 15919, Information and documentation - Transliteration of Devanāgarī and related Indic scripts into Latin characters. 2001
5. ISO 2375: Procedure for registration of escape sequences. 2003
6. ISO 8859: 8-bit single-byte coded graphic character sets - Parts 1-13. 1998-2001
7. IDN POLICY http://meity.gov.in/writereaddata/files/India-IDN-Policy.pdf
11 Appendix A: Visually confusable characters/sequences
Table 20 below shows characters / character sequences which may appear visually confusing to some users of the Devanagari script. However, they are not considered confusing enough to be categorized as variants.
Confusable 1      Confusable 2
क  U+0915         क़  U+0915 U+093C
ख  U+0916         ख़  U+0916 U+093C
ग  U+0917         ग़  U+0917 U+093C
च  U+091A         च़  U+091A U+093C
छ  U+091B         छ़  U+091B U+093C
ज  U+091C         ज़  U+091C U+093C
ड  U+0921         ड़  U+0921 U+093C
ढ  U+0922         ढ़  U+0922 U+093C
फ  U+092B         फ़  U+092B U+093C
Table 20: Visually confusables
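
Every pair in Table 20 differs only by a nukta (U+093C) following the base consonant. As a minimal sketch, assuming labels are compared code point by code point, the Python below flags two labels that differ only in this way. The names TABLE_20_BASES, strip_table20_nukta and differ_only_by_nukta are hypothetical helpers for illustration and are not part of the LGR, which does not treat these pairs as variants.

```python
import unicodedata

NUKTA = "\u093C"  # DEVANAGARI SIGN NUKTA

# Base consonants from the first column of Table 20.
TABLE_20_BASES = frozenset(
    "\u0915\u0916\u0917\u091A\u091B\u091C\u0921\u0922\u092B"
)


def strip_table20_nukta(label: str) -> str:
    """Drop a nukta that directly follows one of the Table 20 base consonants."""
    # NFC turns the precomposed nukta letters U+0958..U+095F into
    # base + U+093C, because they are composition exclusions.
    normalized = unicodedata.normalize("NFC", label)
    kept = []
    for ch in normalized:
        if ch == NUKTA and kept and kept[-1] in TABLE_20_BASES:
            continue  # skip the nukta, keep the base consonant
        kept.append(ch)
    return "".join(kept)


def differ_only_by_nukta(a: str, b: str) -> bool:
    """True if two distinct labels become identical once Table 20 nuktas are removed."""
    a_nfc = unicodedata.normalize("NFC", a)
    b_nfc = unicodedata.normalize("NFC", b)
    return a_nfc != b_nfc and strip_table20_nukta(a_nfc) == strip_table20_nukta(b_nfc)
```

A registry could use such a check to surface a pair for manual review; it does not block or allocate anything by itself.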
12 Appendix B: Cross-script Confusables
The Devanagari script has a major set of possible cross-script confusables with the Gurmukhi script. Table 21 lists them.
In addition to Gurmukhi, some instances of cross-script confusables are found with Bengali, Gujarati, Telugu, Kannada, Malayalam and Sinhala.
None of the combinations listed in Table 21 are considered equivalents of each other, whether semantically or otherwise. They are only grouped based on possible visual confusability.
At first they may not look exactly the same. However, in a given context, e.g. in a browser bar as part of a domain name, or as a single word with no surrounding text from the same script to distinguish it, they can create visual confusion.
A label can be considered to have a cross-script variant label only if all of its constituent characters/aksharas have an equivalent confusable in the other script. If even a single character/akshara does not have an equivalent visual confusable in the other script, it provides a visual distinction and hence makes the string non-confusable.
Devanagari confusable    Other script confusable    From script
◌ः  U+0903               ◌ઃ  U+0A83                 Gujarati
◌ः  U+0903               ◌ః  U+0C03                 Telugu
◌ः  U+0903               ◌ಃ  U+0C83                 Kannada
◌ः  U+0903               ◌ഃ  U+0D03                 Malayalam
◌ः  U+0903               ඃ  U+0D83                  Sinhala
उ  U+0909                ও  U+0993                  Bengali
घ  U+0918                ঘ  U+0998                  Bengali
ठ  U+0920                ਨ  U+0A28                  Gurmukhi
ठ  U+0920                ਰ  U+0A30                  Gurmukhi
ड  U+0921                ਡ  U+0A21                  Gurmukhi
ड  U+0921                ਤ  U+0A24                  Gurmukhi
ढ  U+0922                ਢ  U+0A22                  Gurmukhi
त  U+0924                ਜ  U+0A1C                  Gurmukhi
य  U+092F                ਧ  U+0A27                  Gurmukhi
◌ॅ  U+0945               ◌ঁ  U+0981                 Bengali
Table 21: Devanagari Cross-script confusables
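
The "all constituent characters" condition stated in this appendix can be sketched in Python as follows. This is an illustrative sketch, not part of the LGR: the names TABLE_21 and scripts_with_full_confusable are hypothetical, and the check works on single code points, which is a simplifying assumption because the text speaks of characters/aksharas.

```python
# Pairs from Table 21: (Devanagari code point, other-script code point, script).
TABLE_21 = [
    ("\u0903", "\u0A83", "Gujarati"),
    ("\u0903", "\u0C03", "Telugu"),
    ("\u0903", "\u0C83", "Kannada"),
    ("\u0903", "\u0D03", "Malayalam"),
    ("\u0903", "\u0D83", "Sinhala"),   # U+0D83 is SINHALA SIGN VISARGAYA
    ("\u0909", "\u0993", "Bengali"),
    ("\u0918", "\u0998", "Bengali"),
    ("\u0920", "\u0A28", "Gurmukhi"),
    ("\u0920", "\u0A30", "Gurmukhi"),
    ("\u0921", "\u0A21", "Gurmukhi"),
    ("\u0921", "\u0A24", "Gurmukhi"),
    ("\u0922", "\u0A22", "Gurmukhi"),
    ("\u0924", "\u0A1C", "Gurmukhi"),
    ("\u092F", "\u0A27", "Gurmukhi"),
    ("\u0945", "\u0981", "Bengali"),
]

# Index: Devanagari code point -> scripts in which it has a listed confusable.
SCRIPTS_BY_CHAR = {}
for deva, _other, script in TABLE_21:
    SCRIPTS_BY_CHAR.setdefault(deva, set()).add(script)


def scripts_with_full_confusable(label: str) -> set:
    """Return the scripts in which every code point of the label has a confusable.

    An empty result means at least one code point gives a visual distinction
    in every other script, so the label is not cross-script confusable.
    """
    if not label:
        return set()
    result = None
    for ch in label:
        scripts = SCRIPTS_BY_CHAR.get(ch, set())
        # Intersect: a script survives only if all code points so far match in it.
        result = set(scripts) if result is None else result & scripts
        if not result:
            return set()
    return result
```

For example, the label ठड (U+0920 U+0921) yields {"Gurmukhi"}, while ठक (U+0920 U+0915) yields an empty set because क has no entry in Table 21.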