Sanskrit Parser Report

SANSKRIT LANGUAGE PARSER

Akash Bhargava - 10UCS002

Ashok Kumar - 10UCS010

Laxmi Kant Yadav - 10UCS027

Vijay Kumar Gupta - 10UCS057

COMPUTER SCIENCE & ENGINEERING DEPARTMENT

NATIONAL INSTITUTE OF TECHNOLOGY, AGARTALA

INDIA-799055

MAY, 2014

SANSKRIT LANGUAGE PARSER

Dissertation submitted to

National Institute of Technology, Agartala

for the award of the degree of

Bachelor of Technology

by

Akash Bhargava - 10UCS002

Ashok Kumar - 10UCS010

Laxmi Kant Yadav - 10UCS027

Vijay Kumar Gupta - 10UCS057

Under the Guidance of

Mr. Nikhil Debbarma

Assistant Professor, CSE Department, NIT Agartala, India

COMPUTER SCIENCE & ENGINEERING DEPARTMENT

NATIONAL INSTITUTE OF TECHNOLOGY AGARTALA

MAY, 2014

DISSERTATION APPROVAL SHEET

This dissertation entitled “Language Parser”, by Akash Bhargava, Enrollment Number 10UCS002; Ashok Kumar, Enrollment Number 10UCS010; Laxmi Kant Yadav, Enrollment Number 10UCS027; and Vijay Kumar Gupta, Enrollment Number 10UCS057, is approved for the award of Bachelor of Technology in Computer Science & Engineering.

Nikhil Debbarma

Dissertation Supervisor

Assistant Professor

Computer Science & Engineering Department

NIT, Agartala

Paritosh Bhattacharya

Head Of Department

Professor


NIT, Agartala

Date: 19.05.2014

Place: NIT, Agartala

DECLARATION

We declare that the work presented in this dissertation titled “Language Parser”, submitted to the Computer Science & Engineering Department, National Institute of Technology, Agartala, for the award of the Bachelor of Technology degree in Computer Science & Engineering, represents our ideas in our own words, and where others’ ideas or words have been included, we have adequately cited and referenced the original sources. We also declare that we have adhered to all principles of academic honesty and integrity and have not misrepresented, fabricated, or falsified any idea, data, fact, or source in our submission. We understand that any violation of the above will be cause for disciplinary action by the Institute and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed.

MAY, 2014

Agartala

Akash Bhargava

10UCS002

Ashok Kumar

10UCS010

Laxmi Kant Yadav

10UCS027

Vijay Kumar Gupta

10UCS057


CERTIFICATE

This dissertation entitled “Language Parser”, by Akash Bhargava, Enrollment Number 10UCS002; Ashok Kumar, Enrollment Number 10UCS010; Laxmi Kant Yadav, Enrollment Number 10UCS027; and Vijay Kumar Gupta, Enrollment Number 10UCS057, is approved for the award of Bachelor of Technology in Computer Science & Engineering.

Nikhil Debbarma

Dissertation Supervisor

Assistant Professor


NIT, Agartala

Suman Deb

Coordinator

Assistant Professor


NIT, Agartala


Acknowledgement

We would like to take this opportunity to express our deep sense of gratitude to all who helped us directly or indirectly during this project work. Firstly, we would like to thank our supervisor Asst. Prof. Nikhil Debbarma and Co-ordinator Asst. Prof. Suman Deb for being great mentors and the best advisors we could ever have. Their advice, encouragement, and criticism were the source of innovative ideas and inspiration behind the successful completion of this project. The confidence they showed in us was our biggest source of inspiration. It has been a privilege working with them for the last two semesters on two different projects.

We are highly obliged to all the faculty members of the Computer Science and Engineering Department for their support and encouragement. We also thank our Director Dr. Gopal Mugeraya and HOD CSE Dept. Asst. Prof. Paritosh Bhattacharya for providing excellent computing and other facilities, without which this work could not have achieved its quality goal.

We would like to express our sincere appreciation and gratitude towards Asst. Prof. Anupam Jamatia for his support in preparing this project report in LaTeX. Finally, we are grateful to our parents for their support. It would have been impossible for us to complete this project without their love, blessing and encouragement.

-Akash Bhargava, Ashok Kumar, Laxmi Kant Yadav, Vijay Kumar Gupta


Dedicated to

To our loving families for their kind love and support.

To our Project Supervisor Asst. Prof. Nikhil Debbarma and our Project Coordinator Asst. Prof. Suman Deb for sharing valuable knowledge and encouragement, and for showing confidence in us all the time.

Abstract

Parsing, or syntactic analysis, is the process of analysing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. The term parsing comes from the Latin pars (orationis), meaning part (of speech). Traditional sentence parsing is often performed as a method of understanding the exact meaning of a sentence, sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the importance of grammatical divisions such as subject and predicate.

According to many researchers, Sanskrit is a very scientific language that behaves much like a programming language. If we are able to build a translator that translates Sanskrit into another language, it would prove to be a significant development in the field of NLP (Natural Language Processing).

In this project we try to parse a Sanskrit sentence so that it can later be translated easily into another language. We take a Sanskrit sentence or paragraph as input. We tokenize the whole sentence (lexical analysis). We then recognize the part of speech of each individual token and parse the sentence, that is, try to make sense of it (parsing).

Contents

Acknowledgement
Dedicated to
Abstract

1 Introduction
1.1 Purpose
1.2 Scope
1.3 Basis
1.4 Overview
1.5 Objective
1.6 About The Project
1.7 Drawbacks
1.8 Study of the Project

2 System Requirement Specification
2.1 Compiler Phases
2.1.1 Lexical Analysis Phase
2.1.2 Semantic Analysis Phase
2.1.3 Intermediate Code Generation
2.1.4 Code Optimization
2.1.5 Code Generation
2.2 Parsing Methods
2.3 Grammar
2.4 Makefile

3 System Design
3.1 Spiral Model
3.2 Input Stages
3.3 Input Types
3.4 Input Media
3.5 Data Flow Diagram
3.6 Output Design

4 Implementation & Screen shots
4.1 Parser
4.1.1 Parsing Methods
4.1.2 Ambiguity
4.2 Implementation Steps
4.2.1 The Lexer
4.2.2 The Parser
4.2.3 Grammar Used
4.2.4 Uses Of A Grammar
4.3 Input & Output

5 Testing
5.1 Syntax Error Handling
5.2 Error-Recovery Strategies
5.2.1 Panic mode
5.2.2 Phrase-level recovery
5.2.3 Error productions
5.2.4 Global correction

6 Conclusion

7 Appendix

8 Reference

List of Figures

2.1 Phase of Compiler
2.2 Lexical Analyzer
2.3 Parsing Step
2.4 Vibhakti
2.5 Conjugational
2.6 Noun and Adjective
2.7 Noun Word
2.8 Noun
2.9 Noun
2.10 Noun
3.1 Spiral Model



CSED, NIT Agartala


4.1 Lexical Analysis Steps
4.2 Parsing
4.3 Output Snapshot
4.4 Output Snapshot


Chapter 1

Introduction

1.1 Purpose

In this project we try to parse a Sanskrit sentence so that it can later be translated easily into another language.

1.2 Scope

The ability to parse a Sanskrit sentence into an English sentence.

1.3 Basis

We first introduce the following concepts and then apply them:

• Lexical Analysis

• Parsing

• Advantages of using Sanskrit

• Approach

1.4 Overview

This design document is divided into five major sections:

• Section 1 is an introduction that provides information about the document itself.

• Section 2 is an overview of the application and its primary functionality.

• Section 3 identifies the assumptions and constraints followed during the design of the software.

• Section 4 documents the overall system architecture.

• Section 5 provides the detailed design information for every subsystem and component in the current delivery.

1.5 Objective

In this project we try to parse a Sanskrit sentence so that it can later be translated easily into another language. Here we describe a machine translation technique for translating a Sanskrit sentence into an English sentence.

1.6 About The Project

• Machine translation has been defined as the process that utilizes computer software to translate text from one natural language to another. It is one of the most important applications of Natural Language Processing.

• It helps people from different places to understand an unknown language without the aid of a human translator.


• The language to be translated is the Source Language (SL). The language into which the source language is translated is the Target Language (TL).

• The major machine translation techniques are Rule Based Machine Translation (RBMT), Statistical Machine Translation (SMT) and Example-Based Machine Translation (EBMT).

• One of the effective techniques for machine translation is Rule Based Machine Translation.

• In India, several machine translation systems have been implemented. Examples include AnglaUrdu (AnglaHindi based), a machine translation system for English to Urdu; HindiAngla, a machine translation system from Hindi to English; the English-Assamese machine translation system (from English to Assamese); MaTra, a human-aided machine translation system; AnglaHindi, an English to Hindi machine-aided translation system; and AnglaBharti, a technology for machine-aided translation from English to Indian languages.

• Machine translation from Sanskrit is never an easy task because of the structural vastness of its grammar, but the grammar is well organized and the least ambiguous compared to other natural languages.

• The Sanskrit sentence is the input to our first module, the lexical parser, which generates a parse tree using semantic relationships.

• This parse tree acts as the input to the second module, the semantic mapper, where each Sanskrit semantic word is mapped to the corresponding English semantic word.

1.7 Drawbacks

Some of the main drawbacks of the project:

• This project is about parsing one language into another; it is not a pure translator.

• This project is platform dependent (here the platform is Linux).

• It is a database-oriented project and does not rely only on an online approach.


1.8 Study of the Project

The project lets users give input in the Sanskrit language and converts (parses) it into the English language. We use the following predefined methods for parsing:

• We first tokenize the input using strtok(str, " ");

• Each token can be one of three types: noun, verb, or preposition. The task is to identify the type of each token, which is done by matching it against an indexed database.

• Each token is stored in a structure along with its meaning and its morphological information.

• The parser then comes into play and forms a tree-like structure from these tokens.
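The tokenizing and storage steps above can be sketched as follows. This is a minimal illustration, not the project's actual source: the names Token, TokenType and tokenize are hypothetical, the sample sentence is assumed to be in plain ASCII transliteration, and the database lookup that fills in the type, meaning and morphology is left out.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical token categories, following the three types named above.
enum class TokenType { Unknown, Noun, Verb, Preposition };

// Hypothetical record for one token: the surface form plus the fields
// that a later database lookup would fill in.
struct Token {
    std::string text;
    TokenType type = TokenType::Unknown;
    std::string meaning;
    std::string morphology;
};

// Split a sentence on blank spaces using strtok.
// strtok modifies its input, so we work on a private, NUL-terminated copy.
std::vector<Token> tokenize(const std::string& sentence) {
    std::vector<char> buffer(sentence.begin(), sentence.end());
    buffer.push_back('\0');

    std::vector<Token> tokens;
    for (char* word = std::strtok(buffer.data(), " ");
         word != nullptr;
         word = std::strtok(nullptr, " ")) {
        Token t;
        t.text = word;
        tokens.push_back(t);
    }
    return tokens;
}
```

Calling tokenize("ramah vanam gacchati") yields three tokens whose type is still Unknown; consecutive blanks are skipped because strtok treats a run of delimiters as one.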

One of the major approaches to machine translation is rule-based machine translation (RBMT, also known as the rational approach). Rule-based translation consists of:

1. The process of analysing the input sentence of the source language syntactically and/or semantically.

2. The process of generating the output sentence of the target language based on the internal structure.

Each process is controlled by the dictionary and the rules.

• The strength of the rule-based method is that the information can be obtained through introspection and analysis.

• The weakness of the rule-based method is that the accuracy of the entire process is the product of the accuracies of each sub-stage.

Chapter 2

System Requirement Specification

2.1 Compiler Phases

A compiler operates in phases, and each phase transforms the source program from one representation to another. A compiler has six phases:

• Lexical Analyzer

• Syntax Analyzer

• Semantic Analyzer

• Intermediate code generation

• Code optimization

• Code Generation

Symbol table and error handling interact with the six phases. Some of the phases may be grouped together.

Figure 2.1: Phase of Compiler

2.1.1 Lexical Analysis Phase :

The lexical phase reads the characters in the source program and groups them into a stream of tokens, in which each token represents a logically cohesive sequence of characters such as an identifier, a keyword, or a punctuation character. The character sequence forming a token is called the lexeme for the token. The semantic standard representation was designed to provide a simple description of the grammatical relationships in a sentence that can easily be understood and effectively used by people without linguistic expertise who want to extract textual relations. The sentence relationships are represented uniformly as semantic standard relations between pairs of words.

Figure 2.2: Lexical Analyzer

2.1.2 Semantic Analysis Phase :

This phase checks the source program for semantic errors and gathers type information for the subsequent code-generation phase. It uses the hierarchical structure determined by the syntax-analysis phase to identify the operators and operands of expressions and statements. An important component of semantic analysis is type checking.

2.1.3 Intermediate Code Generation:

The syntax and semantic analysis phases generate an explicit intermediate representation of the source program. The intermediate representation should have two important properties:

• It should be easy to produce.

• It should be easy to translate into the target program.

Intermediate representation can have a variety of forms. One of these forms is three-address code, which is like the assembly language for a machine in which every location can act like a register. Three-address code consists of a sequence of instructions, each of which has at most three operands.
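As an illustration of three-address code, the sketch below translates the assignment a = b + c * d from a small expression tree into a sequence of instructions with at most three operands each. The types and function names (Expr, leaf, binary, emit, translateAssignment) are hypothetical and chosen only for this example; a real compiler would use a richer instruction format than plain strings.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical expression tree node: a leaf holds a variable name,
// an inner node holds a binary operator and two children.
struct Expr {
    std::string name;  // used when the node is a leaf
    char op = 0;       // non-zero when the node is an operator
    std::shared_ptr<Expr> left, right;
};

std::shared_ptr<Expr> leaf(const std::string& name) {
    auto e = std::make_shared<Expr>();
    e->name = name;
    return e;
}

std::shared_ptr<Expr> binary(char op, std::shared_ptr<Expr> l,
                             std::shared_ptr<Expr> r) {
    auto e = std::make_shared<Expr>();
    e->op = op;
    e->left = l;
    e->right = r;
    return e;
}

// Emit instructions for the tree rooted at e and return the name that
// holds its value. Every emitted instruction has at most three operands.
std::string emit(const std::shared_ptr<Expr>& e, std::vector<std::string>& code) {
    if (e->op == 0) return e->name;
    std::string l = emit(e->left, code);
    std::string r = emit(e->right, code);
    std::string temp = "t" + std::to_string(code.size() + 1);
    code.push_back(temp + " = " + l + " " + e->op + " " + r);
    return temp;
}

// Translate the assignment  target = expr  into three-address code.
std::vector<std::string> translateAssignment(const std::string& target,
                                             const std::shared_ptr<Expr>& expr) {
    std::vector<std::string> code;
    std::string value = emit(expr, code);
    code.push_back(target + " = " + value);
    return code;
}
```

For the tree of b + c * d, the result is the three instructions t1 = c * d, t2 = b + t1 and a = t2; the temporaries t1 and t2 play the role of the register-like locations described above.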

2.1.4 Code Optimization :

The code optimization phase attempts to improve the intermediate code, so that faster-running machine code will result.

2.1.5 Code Generation :

The final phase of the compiler is the generation of target code, consisting normally of relocatable machine code or assembly code. Memory locations are selected for each of the variables used by the program. Then each intermediate instruction is translated into a sequence of machine instructions that perform the same task.

2.2 Parsing Methods :

In the compiler model, the parser obtains a string of tokens from the lexical analyser and verifies that the string can be generated by the grammar for the source language. The parser reports any syntax errors in the source program. There are two types of parsing methods: top-down and bottom-up. “Top-down” is largely self-explanatory. From left to right, we drill down through each non-terminal until we reach a terminal, building the tree from the root node down to the leaves. It is important to note that we drill down from left to right, replacing the leftmost non-terminal first; top-down parsing is, by definition, an attempt to find a leftmost derivation. Bottom-up parsing instead traces a rightmost derivation in reverse, that is, a derivation in which the rightmost non-terminal is replaced first.

There are three general types of parsers for grammars. Universal parsing methods, such as the Cocke-Younger-Kasami algorithm and Earley's algorithm, can parse any context-free grammar, but these methods are too inefficient to use in production compilers. The methods commonly used in compilers are classified as either top-down or bottom-up. Top-down parsers build parse trees from the top (root) to the bottom (leaves). Bottom-up parsers build parse trees from the leaves and work up to the root. In both cases, the input to the parser is scanned from left to right, one symbol at a time. The output of the parser is some representation of the parse tree for the stream of tokens.

Figure 2.3: Parsing Step

A number of tasks might be conducted during parsing, such as:

• Collecting information about various tokens into the symbol table.

• Performing type checking and other kinds of semantic analysis.

• Generating intermediate code.


Algorithm for Parsing an English sentence

1. Tokenize the sentence into tokens, producing a token list.

2. To find the relationships between tokens, we use dependency grammar and binary relations for the Sanskrit language. The token list acts as the input to the semantic class, which represents the semantic standard.

3. The semantic class generates a tree; a class named Tree Transform creates this tree.

2.3 Grammar :

Grammar provides a precise way to specify the syntax (the structure or arrangement of composing units) of a language. In grade school we take grammar lessons that teach us to speak and write proper English. They teach us the correct way to form sentences with subjects, predicates, noun phrases, verb phrases, and so on. Subjects, predicates, and phrases are some of the composing units of a sentence in English; similarly, if/else statements, assignment statements, and function definitions are some of the composing units of source code, which itself is a single sentence of a particular programming language. There are a very large number of valid English sentences one could compose; likewise, there are a large (probably infinite) number of valid source code programs one could create. If someone says “on the computer she is,” we immediately recognize that the sentence is ill-formed. Its structure is invalid, because the noun phrase should precede the verb phrase. It should be: “She is on the computer.” A traditional sentence diagram of such a sentence is modelled much like an abstract syntax tree (AST), so it goes without saying that parsing, or more formally “syntactical analysis,” has its roots in linguistics. Moreover, just as in English, programming languages need to be specified in a way that allows us to verify whether a sentence of the language is valid. That is where context-free grammars (CFG) come into play; they allow us to specify the syntax of a programming language's source code.

Vibhakti as Pointer

Figure 2.4: Vibhakti

Basic conjugational endings:

Figure 2.5: Conjugational

Basic noun and adjective declension

Figure 2.6: Noun and Adjective

A-stems (noun words ending with a)

Figure 2.7: Noun Word

i- and u-stems

Figure 2.8: Noun

Figure 2.9: Noun

Sanskrit verbs

There are 10 types of verb conjugation forms. One example, for the bhava root word, is given here (present, past, and future only).

Figure 2.10: Noun

2.4 Makefile

GNU make is a utility for maintaining groups of programs. The purpose of the make utility is to determine automatically which pieces of a large program need to be recompiled, and to issue the commands to recompile them. To prepare to use make, you must write a file called the makefile that describes the relationships among the files in your program and states the commands for updating each file. In a program, the executable file is typically updated from object files, which are in turn made by compiling source files. Once a suitable makefile exists, each time you change some source files, running the make command performs the necessary recompilation. By default, make looks for a file named GNUmakefile, makefile, or Makefile; use the -f option if you want make to process a file with a different name.

make clean: “make clean” deletes any files generated by previous attempts, leaving you with clean source code.

Chapter 3

System Design

3.1 Spiral Model

The spiral model of software development is shown in Figure 3.1. The diagrammatic representation of this model appears like a spiral with many loops. The exact number of loops in the spiral is not fixed; each loop of the spiral represents a phase of the software process. This model is much more flexible than other models, since the exact number of phases through which the product is developed is not fixed. Each phase in this model is split into four sectors (quadrants), as shown in the figure. The first quadrant identifies the objectives of the phase and the alternative solutions possible for the phase under consideration. During the second quadrant, the alternative solutions are evaluated to select the best possible solution. The spiral model provides direct support for coping with project risks. Activities during the fourth quadrant concern reviewing the results of the stages traversed so far with the customer and planning the next iteration around the spiral. The spiral model is viewed as a meta model, since it subsumes all the models discussed. It uses a prototyping approach by first building a prototype before embarking on the actual product development effort. The spiral model can also be considered as supporting the evolutionary model: the iterations along the spiral can be considered as evolutionary levels through which the complete system is built. This enables the developer to understand and resolve the risks at each evolutionary level. The spiral model uses prototyping as a risk-reduction mechanism and also retains the systematic step-wise approach of the waterfall model.

Figure 3.1: Spiral Model

3.2 Input Stages

The main input stages can be listed as below:

• Data supply

• Data transaction

• Data synchronization

• Data verification

• Data validation

• Data correction


3.3 Input Types

It is necessary to determine the various types of inputs. Inputs can be categorized as follows:

• External inputs, which are prime inputs for the system.

• Internal inputs, which are user communications with the system.

• Interactive inputs, which are inputs entered during a dialogue.

3.4 Input Media

At this stage a choice has to be made about the input media. In deciding on the input media, consideration has to be given to:

• Type of input

• Flexibility of format

• Speed

• Accuracy

• Ease of correction

• Ease of use

• Portability

3.5 Data Flow Diagram


Figure 3.2: Data Flow Diagram

Figure 3.3: Data Flow Diagram


3.6 Output Design

Outputs from computer systems are required primarily to communicate the results of processing to users. They are also used to provide a permanent copy of the results for later consultation. The various types of outputs are:

• External outputs, whose destination is the file named Temp.

• Internal outputs, whose destination is within the organization and which are the user's main interface with the Linux system.

• Operational outputs, whose use is purely within the android mobile department.

• Interface outputs, which involve the user in communicating directly with the system.

Chapter 4

Implementation & Screen shots

Programming languages show a clear trend: they are moving quickly from machine-level to high-level to human-level languages. Consider the progression from assembly to C to C++ to Java to Ruby. This will not stop until something entirely human-like is created. The scope for Sanskrit to become a computer language lies in the library system. When you compile code in C, your code is linked with predefined libraries. For example, calling strcmp(string1, string2) is the best way to compare two strings, because it links library code into your executable. Libraries are often written in assembly language and highly optimized. So if you have all the libraries at hand, why do you need C? Why can't we just say GO AND OPEN THE DOOR and expect the computer to understand it and do it in a highly optimized way? The onus lies with an intelligent interpreter. Sanskrit is a language in which letters have meanings; they need not form words to transmit emotions or information. Composing letters into words changes their meaning again, somewhat like OOP. For example, ANU means particle and PARMANU means nanoparticle. A programming language needs consistency, which Sanskrit has. How Sanskrit can be adapted to be a human-computer language remains to be explored in future work. Sanskrit is not a descriptive language: you do not need to write paragraphs to explain something. When you translate something into Sanskrit, its size reduces. It is precise, crisp and clear.

4.1 Parser :-

Parsing is the de-linearization of linguistic input; that is, the use of grammatical rules and other knowledge sources to determine the functions of words in the input sentence. Getting an efficient and unambiguous parse of natural languages has been a subject of wide interest in the field of artificial intelligence over the past 50 years. A parser breaks data into smaller elements, according to a set of rules that describe its structure. Parsing is the process of analysing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given grammar.

Following are the Steps to generate a Parse Tree:-

1. Input is an English sentence.

2. The lexical analyzer creates tokens.

3. The generated tokens act as input to the semantic analyzer.

4. Output is a parse tree.

4.1.1 Parsing Methods :

There are two types of parsing methods: top-down and bottom-up. “Top-down” is largely self-explanatory. From left to right, we drill down through each non-terminal until we reach a terminal, building the tree from the root node down to the leaves. It is important to note that we drill down from left to right, replacing the leftmost non-terminal first; top-down parsing is, by definition, an attempt to find a leftmost derivation. Bottom-up parsing instead traces a rightmost derivation in reverse, that is, a derivation in which the rightmost non-terminal is replaced first.

• Bottom-Up Parsing
In bottom-up parsing the derivation starts from the string of terminals (our sentence), and we try to derive the start symbol of our CFG. It is essentially a top-down derivation run backwards. Initially, instead of replacing a non-terminal with another non-terminal or terminal (drilling down), we replace a terminal with a non-terminal (drilling up). At certain points we may even replace several non-terminals with one non-terminal. Since the process is the exact reverse of a derivation, the replacements trace a rightmost derivation backwards. When we make a replacement, we create a node that becomes the parent of some other node instead of its child.

• Top-Down Parsing
There are several problems with top-down parsing.
(1) Left recursion can lead to infinite parsing loops, so it must be eliminated. Left recursion in a CFG production occurs when the non-terminal on the left side appears first on the right side of the arrow. There are simple algorithms to remove it, but the CFG becomes twice as long in many cases.
(2) Top-down parsing may involve backtracking. Backtracking is the act of climbing back up the derivation (the parse), reversing everything to try another derivation path. We end up re-scanning the input as well. If information is being inserted into a symbol table as the parse proceeds, those entries have to be removed too. The need for backtracking can be eliminated by parsing with lookahead. Backtracking is not restricted to top-down parsers; there are backtracking LR parsers as well.
(3) Finally, the order in which we choose non-terminal expansions can cause valid inputs to be rejected without information as to why.
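To make points (1) and (2) concrete, the sketch below shows a small top-down (recursive descent) recognizer. It assumes a toy grammar that is not part of this project: the left-recursive rule E -> E '+' T | T is rewritten as E -> T E', E' -> '+' T E' | (empty), with T -> digit, and one symbol of lookahead is used so that no backtracking is needed. The class and function names are illustrative only.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Recognizer for  E -> T E',  E' -> '+' T E' | (empty),  T -> digit.
// This is the left-recursion-free form of  E -> E '+' T | T.
class SumParser {
public:
    explicit SumParser(const std::string& input) : input_(input), pos_(0) {}

    // True when the whole input can be derived from E.
    bool accepts() { return parseE() && pos_ == input_.size(); }

private:
    // One symbol of lookahead; '\0' marks the end of the input.
    char peek() const { return pos_ < input_.size() ? input_[pos_] : '\0'; }

    bool parseE() { return parseT() && parseERest(); }

    // The production for E' is chosen by inspecting the lookahead symbol,
    // so the parser never has to back up and retry.
    bool parseERest() {
        if (peek() == '+') {
            ++pos_;
            return parseT() && parseERest();
        }
        return true;  // empty production
    }

    bool parseT() {
        if (std::isdigit(static_cast<unsigned char>(peek()))) {
            ++pos_;
            return true;
        }
        return false;
    }

    std::string input_;
    std::size_t pos_;
};

bool acceptsSum(const std::string& s) { return SumParser(s).accepts(); }
```

With the original left-recursive rule, a function parseE that began by calling itself would recurse forever without consuming input; after the rewrite, every recursive call is preceded by consuming a symbol, so acceptsSum("1+2+3") terminates and succeeds while acceptsSum("1+") fails.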

4.1.2 Ambiguity :

Ambiguous grammars are those in which a string of the language has more than one parse tree.This is problematic because it may be hard to interpret the intended meaning of the string. x*y;That C statement can be interpreted as the multiplication of two variables, x and y, or as thedeclaration of a variable y whose type is a pointer to x. To resolve the conflict the compilermust locate y’s type information in the symbol table. If it’s a numerical type the statementis interpreted as an expression. Generally speaking, ambiguity is an unwanted feature of anygrammar and may pose a threat to the correctness of both top-down and bottom-up parsers.Different parsers handle it with varying efficacy. In spite of all this, ambiguity isn’t alwaysa problem. It’s possible to generate a non-ambiguous language from an ambiguous grammar.Even if there are two parse trees that generate a string, as long as it has one intended meaningthere’s no problem. Some parser generators allow specifying precedence and associativity rulesto remove any ambiguity.


4.2 Implementation Steps :-

The following steps were used to develop this application:

4.2.1 The Lexer :

The first step towards creating a successful Sanskrit English Parser (SEP) is to create a lexer that analyses every word of the input Sanskrit sentence.

Tokenizer:

The tokenizer divides the complete sentence into a stream of individual words separated by blank spaces.
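A minimal sketch of such a whitespace tokenizer, assuming the input sentence is held in a std::string. The function name tokenize is hypothetical; the example words are ASCII transliterations used only for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits a sentence into words separated by blank spaces.
// Runs of several blanks are treated as a single separator.
std::vector<std::string> tokenize(const std::string& sentence) {
    std::istringstream in(sentence);
    std::vector<std::string> tokens;
    std::string word;
    while (in >> word) {
        tokens.push_back(word);
    }
    return tokens;
}
```

Calling tokenize("ramah vanam gacchati") gives the three tokens ramah, vanam and gacchati in input order.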

Avyaya Analyser :

Every single output of the tokenizer is first looked up in the smallest database, that of avyaya words (indeclinables), and only if it produces a complete match is the word accepted as an avyaya.
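The complete-match requirement can be sketched as an exact lookup in a set. The in-memory std::unordered_set stands in for the avyaya database, whose actual storage format is not described here, and isAvyaya is a hypothetical name.

```cpp
#include <string>
#include <unordered_set>

// Accepts a token as an avyaya only on a complete (exact) match.
// `avyayaDb` is an assumed in-memory stand-in for the avyaya database.
bool isAvyaya(const std::unordered_set<std::string>& avyayaDb,
              const std::string& token) {
    return avyayaDb.count(token) > 0;
}
```

Because the lookup is exact, a token that merely begins with an avyaya is not accepted and moves on to the next analyser.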

Verb Analyser :

The second, relatively bigger database of verb roots (dhaturoops) is placed after the avyaya database. Tokens not recognized as avyaya are then processed by the verb analyser. The program verb.cpp analyses the suffix of every input token and generates information regarding the tense, person and number of the corresponding token. The suffix is then removed and the verb is mapped to its respective root using the verb database. If a match is found the token is accepted as a verb; otherwise it is passed on for noun analysis.
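The suffix analysis can be sketched as follows. This is not the code of verb.cpp: the types VerbInfo and SuffixRule, the function analyseVerb, the tiny suffix table and the stem-to-root map are all assumptions made for illustration, using ASCII transliterations.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct VerbInfo {
    bool found = false;
    std::string root, tense, person, number;
};

// One entry of a hypothetical suffix table.
struct SuffixRule {
    std::string suffix, tense, person, number;
};

// Tries each suffix rule in order (list longer suffixes first), strips the
// suffix, and looks the remaining stem up in the stem-to-root map.
VerbInfo analyseVerb(const std::string& token,
                     const std::vector<SuffixRule>& rules,
                     const std::map<std::string, std::string>& stemToRoot) {
    for (const SuffixRule& r : rules) {
        if (token.size() <= r.suffix.size()) continue;
        const std::size_t cut = token.size() - r.suffix.size();
        if (token.compare(cut, r.suffix.size(), r.suffix) != 0) continue;

        auto it = stemToRoot.find(token.substr(0, cut));
        if (it == stemToRoot.end()) continue;  // stem unknown: try next rule

        VerbInfo info;
        info.found = true;
        info.root = it->second;
        info.tense = r.tense;
        info.person = r.person;
        info.number = r.number;
        return info;
    }
    return VerbInfo();  // not a verb: pass the token on to noun analysis
}
```

With a rule for the suffix "ti" (present, third person, singular) and the stem "gaccha" mapped to the root "gam", the token "gacchati" is accepted as a verb, while "vanam" is not and falls through to the noun analyser.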


Noun Analyser :

Tokens not yet recognized are fed to the noun analyser (noun.cpp). Noun declensions belonging to different genders follow different patterns that cannot all be matched by the program. Hence, of the 21 possible declensions of a single noun, 10 are stored as exceptions while the remaining 11 are processed by the program to obtain the root word. Lastly, if the word is still not recognized, then it is not present in the database and must be entered manually for analysis.
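The split between stored exceptions and rule-processed declensions can be sketched as a two-stage lookup. This is not the code of noun.cpp: NounInfo, NounRule and analyseNoun are hypothetical names, and which forms count as exceptions in the example is an assumption made purely for illustration.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

struct NounInfo {
    bool found = false;
    std::string root, caseName, number;
};

// One entry of a hypothetical table of regular declension suffixes.
struct NounRule {
    std::string suffix, caseName, number;
};

// Stage 1: exact lookup among stored exception forms.
// Stage 2: strip a regular suffix and check the root against the database.
NounInfo analyseNoun(const std::string& token,
                     const std::map<std::string, NounInfo>& exceptions,
                     const std::vector<NounRule>& rules,
                     const std::set<std::string>& roots) {
    auto ex = exceptions.find(token);
    if (ex != exceptions.end()) return ex->second;

    for (const NounRule& r : rules) {
        if (token.size() <= r.suffix.size()) continue;
        const std::size_t cut = token.size() - r.suffix.size();
        if (token.compare(cut, r.suffix.size(), r.suffix) != 0) continue;

        const std::string root = token.substr(0, cut);
        if (roots.count(root) == 0) continue;  // root not in the database

        NounInfo info;
        info.found = true;
        info.root = root;
        info.caseName = r.caseName;
        info.number = r.number;
        return info;
    }
    return NounInfo();  // unrecognized: must be entered manually
}
```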

Figure 4.1: Lexical Analysis Steps

4.2.2 The Parser :

Equipped with the knowledge of what individual words represent, we can now move towards re-arranging them in such a way that their mere translation results in a meaningful English sentence. When parsing from Sanskrit to English we move from a language with free word order to a language in which only a particular order of words conveys the same meaning.


How to represent CONTEXT?

By CONTEXT we mean the parts of a statement that precede or follow a specific word or passage, usually influencing its meaning or effect. Sanskrit uses the concept of vibhakti to generate context. Because English lacks vibhakti, the user has to understand the context of every word with help from the LEXER. Using the lexer output, the user can add words like for, from, to, etc., which are not used in Sanskrit. Thus the PARSER gives us the spatial arrangement of the input words in converted form (in English) and the LEXER is referred to for context. This results in an English translation of a Sanskrit sentence.

Structure of an English sentence :

Every English sentence is a combination of nouns and verbs related to each other through context. In a SIMPLE sentence (a sentence without connectors, having only 1 verb), the verb is the central entity. Nouns then relate to this central entity via context, as defined below:

• Nominative (S): the SUBJECT/doer of the verb
• Accusative (O): the OBJECT of the verb
• Instrumental (I): the cause/means of the verb
• Dative (D): the indirect object of the verb
• Ablative (A): represents comparison/separation
• Locative (L): represents position in space/time

The LEXER already generates this contextual information for every noun, and the PARSER can now arrange a simple input sentence spatially, following the rules of English. Thus, we have the following order:

S V O L/A/D/I

The PARSER interprets the LEXER's outputs and rearranges the various nouns into their respective positions as shown. The user can now apply the context of every noun used to obtain a corresponding English translation.
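The S V O L/A/D/I ordering can be sketched as a stable sort on the role the lexer assigned to each word. Role, Word, slot and arrange are hypothetical names for this illustration; the project's own data structures may differ.

```cpp
#include <algorithm>
#include <string>
#include <vector>

enum class Role { Subject, Verb, Object, Locative, Ablative, Dative, Instrumental };

struct Word {
    std::string text;
    Role role;
};

// Position of a role in the target order S V O L/A/D/I.
int slot(Role r) {
    switch (r) {
        case Role::Subject: return 0;
        case Role::Verb:    return 1;
        case Role::Object:  return 2;
        default:            return 3;  // L, A, D and I share the last slot
    }
}

// Rearranges the words of a simple sentence into English order.
// A stable sort keeps the input order of words that share a slot.
std::vector<Word> arrange(std::vector<Word> words) {
    std::stable_sort(words.begin(), words.end(),
                     [](const Word& a, const Word& b) {
                         return slot(a.role) < slot(b.role);
                     });
    return words;
}
```

For the words vane (locative), ramah (subject), phalam (object) and khadati (verb), given in that order, the arranged order is ramah, khadati, phalam, vane.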

Parsing rules for a simple sentence :

The PARSER can handle all forms of noun declensions, verb declensions and avyayas (including connectors). The following points summarise the working of the parser:


Figure 4.2: Parsing

• The parser stores nouns, verbs and avyayas in 3 separate structures along with the information the parser requires about them, such as case context, number and person.

• The parser can handle words representing adjectives.

• The parser can handle words representing adverbs.

• The parser can resolve ambiguity generated by Sanskrit noun declensions. For example, if an input Sanskrit sentence contains no nominative noun but there is a noun which can be both nominative and accusative, then it is treated as nominative.

• The parser requires that the subject and verb agree in number; a sentence in which the number of the subject differs from that of the verb is treated as incorrect.

• The parser also handles the GENITIVE case, which represents a noun-noun relationship rather than a noun-verb relationship as other declensions do.

• The parser handles avyayas which correspond to a given noun declension type.

• The parser handles avyayas representing questions.

• The parser handles avyayas that act as conjunctions of different types.

• The parser can thus handle multiple sentences joined together using avyayas.


• The parser writes the interpreted spatial arrangement of the input sentence to a text file named temp.

• The parser can process an input even if some part of it is not defined in the lexer database. Such unrecognized input tokens are output as they are, at the start of the resultant sentence, in the temp file.
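Two of the rules listed above, the nominative/accusative tie-break and subject-verb number agreement, can be sketched as follows. NounReading, chooseSubject and agreesInNumber are hypothetical names, and the representation of case ambiguity as two flags is an assumption for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// What the lexer reports for one noun (assumed, simplified).
struct NounReading {
    std::string text;
    bool canBeNominative;
    bool canBeAccusative;
    std::string number;  // "singular", "dual" or "plural"
};

// Returns the index of the noun to treat as the subject, or -1 if none.
// A clearly nominative noun wins; otherwise a noun that can be either
// nominative or accusative is treated as nominative.
int chooseSubject(const std::vector<NounReading>& nouns) {
    for (std::size_t i = 0; i < nouns.size(); ++i) {
        if (nouns[i].canBeNominative && !nouns[i].canBeAccusative) {
            return static_cast<int>(i);
        }
    }
    for (std::size_t i = 0; i < nouns.size(); ++i) {
        if (nouns[i].canBeNominative && nouns[i].canBeAccusative) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Subject and verb must agree in number.
bool agreesInNumber(const NounReading& subject, const std::string& verbNumber) {
    return subject.number == verbNumber;
}
```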

4.2.3 Grammar Used :

Sanskrit can be described by a context-free grammar, and a BNF grammar for Sanskrit also exists. The general form of a BNF grammar is given as:

<BNF rule> ::= <nonterminal> "::=" <definitions>
<nonterminal> ::= "<" <words> ">"
<terminal> ::= <word> | <punctuation mark> | '"' <any chars> '"'
<words> ::= <word> | <words> <word>
<word> ::= <letter> | <word> <letter> | <word> <digit>
<definitions> ::= <definition> | <definitions> "|" <definition>
<definition> ::= <empty> | <term> | <definition> <term>
<empty> ::=
<term> ::= <terminal> | <nonterminal>

4.2.4 Uses Of A Grammar :

A BNF grammar can be used in two ways :-

• To generate strings belonging to the grammar

• To do this, start with a string containing a non-terminal; while there are still non-terminals in the string, replace a non-terminal with one of its definitions.

• To recognize strings belonging to the grammar

• This is the way programs are compiled: a program is a string belonging to the grammar that defines the language.


• Recognition is much harder than generation.
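The generation procedure described above can be sketched as a loop that keeps replacing the leftmost non-terminal. The toy grammar in the usage note, the type names Symbols and Grammar, and the function derive are assumptions for illustration; which definition to use at each step is supplied by the caller so that the result is predictable.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Symbols = std::vector<std::string>;
// Maps each non-terminal to its list of definitions.
using Grammar = std::map<std::string, std::vector<Symbols>>;

// Starts from `start` and, while non-terminals remain, replaces the leftmost
// one with one of its definitions. choices[k] selects the definition used at
// step k; derivation stops when the choices run out.
Symbols derive(const Grammar& g, const std::string& start,
               const std::vector<std::size_t>& choices) {
    Symbols form(1, start);
    for (std::size_t step = 0; step < choices.size(); ++step) {
        std::size_t i = 0;
        while (i < form.size() && g.count(form[i]) == 0) ++i;
        if (i == form.size()) break;  // only terminals are left

        const std::vector<Symbols>& defs = g.at(form[i]);
        if (defs.empty()) break;      // non-terminal without definitions
        const Symbols& def = defs[choices[step] % defs.size()];

        const std::ptrdiff_t at = static_cast<std::ptrdiff_t>(i);
        form.erase(form.begin() + at);
        form.insert(form.begin() + at, def.begin(), def.end());
    }
    return form;
}
```

With the toy rules <sentence> ::= <subject> <verb>, <subject> ::= ramah | sita and <verb> ::= gacchati | pathati, the choices 0, 1, 0 derive the string "sita gacchati". Recognition runs the other way, from the string back to the rules, which is why it needs a parser rather than a simple loop like this one.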


4.3 Input & Output :


Figure 4.3: Output Snapshot


Figure 4.4: Output Snapshot


Chapter 5

Testing

While developing this project we found some discrepancies between the grammar definition and the implementation of the query classes. In order to have a coherent implementation, we had to correct them.

There are different strategies for testing :-

5.1 Syntax Error Handling:

Planning the error handling right from the start can both simplify the structure of a compiler and improve its response to errors. A program can contain errors at many different levels, e.g.

• Lexical, such as misspelling an identifier, keyword, or operator.

• Syntactic, such as an arithmetic expression with unbalanced parentheses.

• Semantic, such as an operator applied to an incompatible operand.


• Logical, such as an infinitely recursive call.

Much of the error detection and recovery in a compiler is centred on the syntax analysis phase. One reason for this is that many errors are syntactic in nature or are exposed when the stream of tokens coming from the lexical analyser disobeys the grammatical rules defining the programming language. Another is the precision of modern parsing methods; they can detect the presence of syntactic errors in programs very efficiently.

The error handler in a parser has simple goals:-

• It should report the presence of errors clearly and accurately.

• It should recover from each error quickly enough to be able to detect subsequent errors.

• It should not significantly slow down the processing of correct programs.

5.2 Error-Recovery Strategies :

There are many different general strategies that a parser can employ to recover from a syntactic error.

• Panic mode

• Phrase level

• Error production

• Global correction

5.2.1 Panic mode:

• This is used by most parsing methods.

• On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens (usually delimiters, such as a semicolon or end) is found.

• Panic mode correction often skips a considerable amount of input without checking it for additional errors.

• It is simple.
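A minimal sketch of the panic-mode skip, assuming the input is already a vector of token strings. The function name panicModeSkip and the choice to also consume the synchronizing token are assumptions for illustration.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Called when an error is detected at tokens[pos]. Discards tokens one at a
// time until a synchronizing token is found, consumes that token too, and
// returns the index at which parsing can resume (tokens.size() if the input
// runs out first).
std::size_t panicModeSkip(const std::vector<std::string>& tokens,
                          std::size_t pos,
                          const std::set<std::string>& syncTokens) {
    while (pos < tokens.size() && syncTokens.count(tokens[pos]) == 0) {
        ++pos;  // skipped without being checked for further errors
    }
    if (pos < tokens.size()) {
        ++pos;  // step past the synchronizing token itself
    }
    return pos;
}
```

For the tokens x = + * 3 ; y = 1 ; with an error detected at "+", the parser resumes at "y". Everything between the error and the semicolon is dropped unexamined, which is the cost of the method's simplicity.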

5.2.2 Phrase-level recovery:

• On discovering an error, the parser may perform local correction on the remaining input; i.e., it may replace a prefix of the remaining input by some string that allows the parser to continue.

• For example, a local correction could replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon.

• Its major drawback is the difficulty it has in coping with situations in which the actual error has occurred before the point of detection.
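Two of the local corrections named above can be sketched for the case where the parser expected a semicolon. LocalFix and repairExpectedSemicolon are hypothetical names, and the rule "replace a comma, otherwise insert" is one simple policy chosen for illustration, not the only possible one.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

enum class LocalFix { None, ReplacedComma, InsertedSemicolon };

// Called when the parser expects ";" at tokens[pos]. Edits the remaining
// input in place so that parsing can continue from the same position.
LocalFix repairExpectedSemicolon(std::vector<std::string>& tokens,
                                 std::size_t pos) {
    pos = std::min(pos, tokens.size());
    if (pos < tokens.size() && tokens[pos] == ";") {
        return LocalFix::None;            // nothing to repair
    }
    if (pos < tokens.size() && tokens[pos] == ",") {
        tokens[pos] = ";";                // replace a comma by a semicolon
        return LocalFix::ReplacedComma;
    }
    tokens.insert(tokens.begin() + static_cast<std::ptrdiff_t>(pos),
                  std::string(";"));      // insert a missing semicolon
    return LocalFix::InsertedSemicolon;
}
```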

5.2.3 Error productions :

• If an error production is used by the parser, it can generate appropriate error diagnostics to indicate the erroneous construct that has been recognized in the input.

5.2.4 Global correction :

• Given an incorrect input string x and grammar G, the algorithm will find a parse tree for a related string y, such that the number of insertions, deletions and changes of tokens required to transform x into y is as small as possible.
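The quantity being minimised here, the number of token insertions, deletions and changes needed to turn x into y, can be computed with the standard edit-distance recurrence. The sketch below covers only that distance between two given token strings; it does not search the grammar G for the best y. The function name tokenEditDistance is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Minimum number of token insertions, deletions and changes that transform
// x into y (edit distance over tokens, computed row by row).
std::size_t tokenEditDistance(const std::vector<std::string>& x,
                              const std::vector<std::string>& y) {
    std::vector<std::size_t> prev(y.size() + 1), cur(y.size() + 1);
    for (std::size_t j = 0; j <= y.size(); ++j) prev[j] = j;

    for (std::size_t i = 1; i <= x.size(); ++i) {
        cur[0] = i;  // delete the first i tokens of x
        for (std::size_t j = 1; j <= y.size(); ++j) {
            const std::size_t change = prev[j - 1] + (x[i - 1] == y[j - 1] ? 0 : 1);
            const std::size_t remove = prev[j] + 1;
            const std::size_t insert = cur[j - 1] + 1;
            cur[j] = std::min({change, remove, insert});
        }
        prev.swap(cur);
    }
    return prev[y.size()];
}
```

For x = "x = 1 ," and y = "x = 1 ;" the distance is 1 (one changed token). A global-correction parser would compare such distances across all strings y that G accepts, which is what makes the method too costly for practical use.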


Chapter 6

Conclusion

The project is mainly based on two languages, C and C++. In this project we have used Sanskrit as the input language and English as the output language. The program first takes a Sanskrit sentence as input from the keyboard, tokenizes it using the tokenizer and identifies the tokens using the token analysers. It then matches the tokens against the database, fetches the output words and finally combines all the resulting words to produce the output. The main goal of the current study was to parse a Sanskrit sentence so that it can later be translated easily into some other language.

The findings from this study make several contributions to the current literature. First, they suggest that Sanskrit could be used as a primary language for programming purposes.

Finally, a number of important limitations need to be considered. First, this project is about parsing one language into another; it is not a pure translator. Second, this project is platform dependent (here the platform is Linux). Third, it is a database-oriented project rather than one that relies only on an online approach. It is recommended that further research be undertaken in the following areas:

• We can make this project more user-friendly by adding a graphical user interface.

• We can apply this scheme on many different languages.


The findings of this study have a number of important implications for future practice. This translator is mainly based on fetching data from a database.


Chapter 7

Appendix

A
Avyaya Analyser
Ambiguous

C
Compiler
Code Optimization
Code Generation

D
Drawbacks
Data Flow Diagram

E
Error-Recovery Strategies
Error productions

G
Grammar
Grammar Used
Global correction

I
Intermediate Code Generation
Input Stages
Input Types

L
Lexical Analysis Phase

M
Makefile

O
Objective
Output Design

P
Parsing Methods

S
Scope

T
Testing

U
Uses Of A Grammar


Chapter 8

Reference

We thank our Project Supervisor, Assistant Professor Nikhil Debbarma, and our Project Coordinator, Assistant Professor Suman Deb, for sharing valuable knowledge, for their encouragement, and for showing confidence in us all the time. We also drew on several resources on the internet.

