03-60-214 Computer Languages, Grammars, and Translators
Jianguo Lu, School of Computer Science
University of Windsor
17-01-10 2
Instructors
• Professor Jianguo Lu
  – Office: Lambton Tower 5111
  – Phone: 519-253-3000 ext 3786
  – Email: jlu at uwindsor
  – Web: http://cs.uwindsor.ca/~jlu/214
• GAs
Course description
• Computer languages, grammars, and translators
• Prerequisite: 60-100, 03-60-212
  – Assignments will be implemented in Java.
• Objective
  – Knowledge of computer languages and grammars
  – Able to analyze programs written in various languages
  – Able to translate languages
• Contents
  – Regular expressions, finite automata, and language recognizers;
  – Context-free grammar;
  – Language parsers.
• Software tools used
  – Programming language: Java (including tokenizer, regular expression package)
  – Lexical analyzer: JLex
  – Parser generator: JavaCup
What is Language
• Language: “any system of formalized symbols, signs, etc., used or conceived as a means of communication.”
  – Communicate: to transmit or exchange thought or knowledge.
• Programming language: communication between a person and a machine
  – A programming language is an intermediary
[Figure: thought → languages → machine]
Hierarchy of (programming) languages
• Machine language;
• Assembly language: mnemonic version of machine code;
• High-level language: Java, C#, Pascal;
• Problem-oriented language;
• Natural language.
[Figure: the languages between thought and machine, ordered from machine language (closest to the machine) through assembly language, high-level language, and problem-oriented language up to natural language (higher level, closer to humans)]
Grammar
• Grammar: the set of structural rules that govern the composition of sentences, phrases, and words in any given natural language. (Wikipedia)
• Formal grammar: rules for forming strings in a formal language
• Computer language grammar: rules for forming tokens, statements, and programs.
• Different layers of grammar:
  – Regular grammar (for words, tokens)
  – Context-free grammar (for sentences, programs)
  – …
Language Translators
• Translator: translates one language into another language (e.g., from C++ to Java)
  – A generic term.
• For high-level programming languages (such as Java, C):
  – Compiler: translates high-level programming language code into the host machine's assembly code; the translated program is then executed at run time.
  – Interpreter: processes the source program and data at the same time. No equivalent assembly code is generated.
• Assembler: translates an assembly language into machine code.
Compiler and Interpreter
• Compiler
  – Compile time: source code → compile → object code
  – Execute time: object code + data → execute → results
• Interpreter
  – Compile and run time: source code + data → interpret → results
How does a compiler work
• A compiler performs its task in much the same way a human approaches the same problem
• Consider the following sentence: “Write a translator”
• We all understand what it means. But how do we arrive at that conclusion?
The process of understanding a sentence
1. Recognize characters (alphabet, mathematical symbols, punctuation).
  – 16 explicit (letters), 2 implicit (blanks)
2. Group characters into logical entities (words).
  – 3 words.
  – Lexical analysis
3. Check that the words form a structurally correct sentence
  – “translator a write” is not a correct sentence
  – Syntactic analysis
4. Check that the combination of words makes sense
  – “dig a translator” is a syntactically correct sentence, but it does not make sense
  – Semantic analysis
5. Plan what you have to do to accomplish the task
  – Code generation
6. Execute it.
Example sentence: “Write a translator”
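Steps 1 and 2 above can be sketched in a few lines of Java. This is only an illustration under simple assumptions (words are runs of letters, everything else separates them); the class and method names are hypothetical and are not part of the course material.

```java
import java.util.ArrayList;
import java.util.List;

final class SentenceScan {
    // Step 1: count the explicit characters (letters) in the input.
    static int countLetters(String s) {
        int n = 0;
        for (char c : s.toCharArray()) {
            if (Character.isLetter(c)) n++;
        }
        return n;
    }

    // Step 1: count the implicit characters (blanks) in the input.
    static int countBlanks(String s) {
        int n = 0;
        for (char c : s.toCharArray()) {
            if (c == ' ') n++;
        }
        return n;
    }

    // Step 2: group consecutive letters into words (a hand-written scanner).
    static List<String> words(String s) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (Character.isLetter(c)) {
                cur.append(c);
            } else if (cur.length() > 0) {
                out.add(cur.toString());
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) out.add(cur.toString());
        return out;
    }

    public static void main(String[] args) {
        String s = "Write a translator";
        System.out.println(countLetters(s) + " letters, " + countBlanks(s) + " blanks");
        System.out.println(words(s));
    }
}
```

The remaining steps (syntactic and semantic analysis, code generation) need a grammar, which is where the rest of the material picks up.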
The structure (phases) of a compiler
[Figure: source code → lexical analyzer → syntax analyzer → semantic analyzer (analysis) → improve code → generate code (synthesis) → object code, with an error handler and a symbol table alongside the phases]
• Front end (analysis): depends on the source language, independent of the machine
  – This is what we will focus on (mainly the blue parts).
• Back end (synthesis): depends on the machine and the intermediate code, independent of the source language.
Assignments overview
• Our focus is the front end
  – Automated generation of lexical analyzer
  – Automated generation of syntax analyzer
[Figure: source code → lexical analyzer (Assignment 2) → syntax analyzer (Assignment 3) → translation (Assignment 4)]
Assignments (28%)
• Assignment 1 (warm up): Regular expressions in Java (5%)
  – Use StringTokenizer in the JDK to tokenize strings.
  – Use regular expressions to match strings
  – You will see how difficult it is to analyse programs without advanced tools such as JLex and JavaCup.
• Assignment 2 (6%)
  – Use JLex to build a lexical analyzer for tiny programs
• Assignment 3 (6%)
  – Manually write a recursive descent parser
  – Use JavaCup to generate a parser
• Assignment 4 (6%)
  – Translate the tiny program to Java and actually run it.
• Assignment 5 (5%)
  – Manually write a recursive descent parser
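As a rough idea of what the two JDK facilities named in Assignment 1 look like, here is a minimal sketch. The delimiter set, the identifier pattern, and the class name are assumptions chosen for illustration only; they are not the assignment specification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenizeDemo {
    // Split on whitespace and a few one-character symbols.
    // returnDelims = true makes the tokenizer hand back the symbols as tokens too.
    static List<String> withTokenizer(String src) {
        StringTokenizer st = new StringTokenizer(src, " \t\n;=+", true);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            String t = st.nextToken();
            if (!t.trim().isEmpty()) {   // drop pure-whitespace tokens
                out.add(t);
            }
        }
        return out;
    }

    // Find identifier-like words with a regular expression.
    static List<String> identifiers(String src) {
        Matcher m = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*").matcher(src);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "x = y + 12;";
        System.out.println(withTokenizer(src));
        System.out.println(identifiers(src));
    }
}
```

Because StringTokenizer only knows single delimiter characters, an operator such as `==` would come back as two separate `=` tokens with this delimiter set, which hints at why generated lexical analyzers are used later.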
Why this course
• Every university offers this type of course.
• Skills learnt
  – write a parser
  – process programs
  – re-engineer and migrate programs
    • Migrate from C++ to C#
    • …
  – process data
    • XML, web logs, social networks, …
Why this course (cont.)
• Theoretical aspects of programming
• The science of developing a large program
  – Not handcrafting the program
• How to
  – Define whether a program is valid
  – Determine whether a program is valid
  – Generate the program
Course materials
• Reference books (not required)
  – Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools (2nd Edition), Aug 31, 2006.
  – Or A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1988. (Chapters 1-5)
  – John R. Levine, Tony Mason, and Doug Brown, Lex & Yacc, O'Reilly & Associates, 1992.
• Online manuals
  – JavaCup, www.cs.princeton.edu/~appel/modern/java/CUP/
  – JLex, www.cs.princeton.edu/~appel/modern/java/JLex/
Marking scheme
Exams         72%    Midterm 1       12%
                     Midterm 2       20%
                     Final           40%
Assignments   28%    Assignment 1     5%
                     Assignment 2     6%
                     Assignment 3     6%
                     Assignment 4     6%
                     Assignment 5     5%
Total        100%                   100%
Assignments (28%)
• Assignment submission
• All assignments must be completed individually.
  – All the assignments will be checked by a copying detection system.
• Academic dishonesty
  – Discussion with other students must be limited to general discussion of the problem, and must never involve examining another student's source code or revealing your source code to another student.
Exams (72%)
• Two midterm exams
• Final exam
• Closed-book exams
• Exams cover topics in lectures
  – Class attendance is important.
• Exams will cover topics in assignments
  – Finishing assignments is also important.
• What if you miss an exam
  – A missed exam will result in a mark of zero. The only valid excuse for missing an exam is a documented medical emergency.
Student Medical Certificate [1]
Faculty of SCIENCE
A. TO BE COMPLETED BY THE STUDENT:
I, ____________________________, hereby authorize Dr. ______________________________ to provide the following information to the University of Windsor and, if required, to supply additional information to support my request for special academic consideration for medical reasons. My personal information is being collected under the authority of the University of Windsor Act 1962 and will be used for administrative and academic record-keeping, academic integrity purposes, and the provision of services to students. For questions in connection with the collection of this information, the Associate Dean of my Faculty may be contacted at 519-253-3000.
______________________________________________________________________
Signature    Student No.    Date
B. TO BE COMPLETED BY THE PHYSICIAN:
1. I hereby certify that I provided health care services to the above-named student on _________________________________________. (insert date(s) student seen in your office/clinic)
2. The student could not reasonably be expected to complete academic responsibilities for the following reason (in broad terms): ____________________________________________________________________________
3. This is an acute / chronic problem for this student.
4. Date(s) during which student claims to have been affected by this problem: ___________________________________________________________________________________
5. Unable to complete academic responsibilities for: 24 hours   2 days   3 days   4 days   5 days   Other (please indicate) _________________________
6. If the student is permitted to continue his/her course of study, is the medical problem likely to recur and affect his/her studies again? Yes   No   Reason: ___________________________________________________________________________
PHYSICIAN VERIFICATION
Name: (please print) _____________________________   Registration No. ________________________
Signature: ______________________________________   Telephone No. _________________________
Address: _________________________________________________________________________________ (stamp, business card, or letterhead acceptable)
PLEASE RETAIN COPY FOR THE PATIENT’S CHART.
Note: Cost of certificate to be paid by student.
[1] This form has been adapted, with permission, from the University of Windsor Faculty of Law Student Medical Certificate and the University of Western Ontario Student Medical Certificate.
Interaction with professor
• During lectures
• During labs
• During office hours: Wednesday 1:00-3:00
• Emails: jlu at uwindsor
  – Subject line must start with “214”
  – Example: Subject: 214 -- About assignment 1
  – Mails without a proper subject may not be read (and hence not answered)
  – Attach detailed error messages
  – Write your name in the email
Web contents
• Course plan;
• Slides for lectures;
• Assignment descriptions;
• Links to tools, manuals, tutorials;
• List of marks;
• Announcements.
Important note
• Please note that no student is allowed to take a course more than two times without permission from the Dean.
Formal definition of language
• A language is a set of strings
  – English language: {“the brown dog likes a good car”, ……} = {sentence | sentence written in English}
  – Java language: {program | program written in Java}
  – HTML language: {document | document written in HTML}
• How do you define a language?
• It is unlikely that you can enumerate all the sentences, programs, or documents
How to define a language
• How to define English
  – A set of words, such as brown, dog, like
  – A set of rules
    • A sentence consists of a subject, a verb, and an object;
    • The subject consists of an optional article, followed by an optional adjective, followed by a noun;
    • ……
  – More formally:
    • Words = {a, the, brown, friendly, good, book, refrigerator, dog, car, sings, eats, likes}
    • Rules:
      1) SENTENCE → SUBJECT VERB OBJECT
      2) SUBJECT → ARTICLE ADJECTIVE NOUN
      3) OBJECT → ARTICLE ADJECTIVE NOUN
      4) ARTICLE → a | the | EMPTY
      5) ADJECTIVE → brown | friendly | good | EMPTY
      6) NOUN → book | refrigerator | dog | car
      7) VERB → sings | eats | likes
Derivation of a sentence
• Rules:
  1) SENTENCE → SUBJECT VERB OBJECT
  2) SUBJECT → ARTICLE ADJECTIVE NOUN
  3) OBJECT → ARTICLE ADJECTIVE NOUN
  4) ARTICLE → a | the | EMPTY
  5) ADJECTIVE → brown | friendly | good | EMPTY
  6) NOUN → book | refrigerator | dog | car
  7) VERB → sings | eats | likes
• Derivation of the sentence “the brown dog likes a good car”:
  SENTENCE → SUBJECT VERB OBJECT
  → ARTICLE ADJECTIVE NOUN VERB OBJECT
  → the brown dog VERB OBJECT
  → the brown dog likes ARTICLE ADJECTIVE NOUN
  → the brown dog likes a good car
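The derivation above can be replayed mechanically: each step replaces the leftmost occurrence of a nonterminal with the right-hand side of one rule. The following is a minimal sketch of that idea; the class name, the list-of-strings representation, and the hard-coded rule sequence are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Derivation {
    // Replace the leftmost occurrence of lhs in the sentential form with rhs.
    static List<String> apply(List<String> form, String lhs, String... rhs) {
        int i = form.indexOf(lhs);
        if (i < 0) {
            throw new IllegalArgumentException(lhs + " not found in " + form);
        }
        List<String> next = new ArrayList<>(form.subList(0, i));
        next.addAll(Arrays.asList(rhs));
        next.addAll(form.subList(i + 1, form.size()));
        return next;
    }

    // Apply a fixed sequence of rule choices, recording every sentential form.
    static List<String> steps() {
        String[][] choices = {
            {"SENTENCE", "SUBJECT", "VERB", "OBJECT"},   // rule 1
            {"SUBJECT", "ARTICLE", "ADJECTIVE", "NOUN"}, // rule 2
            {"ARTICLE", "the"},                          // rule 4
            {"ADJECTIVE", "brown"},                      // rule 5
            {"NOUN", "dog"},                             // rule 6
            {"VERB", "likes"},                           // rule 7
            {"OBJECT", "ARTICLE", "ADJECTIVE", "NOUN"},  // rule 3
            {"ARTICLE", "a"},                            // rule 4
            {"ADJECTIVE", "good"},                       // rule 5
            {"NOUN", "car"}                              // rule 6
        };
        List<String> form = new ArrayList<>(Arrays.asList("SENTENCE"));
        List<String> out = new ArrayList<>();
        out.add(String.join(" ", form));
        for (String[] c : choices) {
            form = apply(form, c[0], Arrays.copyOfRange(c, 1, c.length));
            out.add(String.join(" ", form));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : steps()) {
            System.out.println(s);
        }
    }
}
```

The slide collapses several of these single-rule steps into one line; the sketch spells each one out.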
The parse tree of the sentence
Parse the sentence “the brown dog likes a good car” (the top-down approach):

                        SENTENCE
       SUBJECT            VERB            OBJECT
  ARTICLE ADJ NOUN                   ARTICLE ADJ NOUN
    the  brown dog        likes         a   good car
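Top down and bottom up parsing
[Figure: the parse tree of “the brown dog likes a good car”, with SENTENCE at the root, SUBJECT VERB OBJECT below it, ARTICLE ADJ NOUN under SUBJECT and OBJECT, and the words at the leaves]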
Types of parsers
• Top down
  – Repeatedly rewrite the start symbol
  – Find the left-most derivation of the input string
  – Easy to implement
• Bottom up
  – Start with the tokens and combine them to form interior nodes of the parse tree
  – Find a right-most derivation of the input string
  – Accept when the start symbol is reached
• Bottom up is more prevalent
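To illustrate why the top-down approach is considered easy to implement, here is a minimal recursive descent recognizer for the English-sentence grammar used in these slides, with one method per nonterminal. It is a sketch only: the class name and helper methods are hypothetical, SUBJECT and OBJECT share one method because their rules are identical, and it only answers yes or no rather than building a parse tree.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SentenceParser {
    private static final Set<String> ARTICLES = new HashSet<>(Arrays.asList("a", "the"));
    private static final Set<String> ADJECTIVES = new HashSet<>(Arrays.asList("brown", "friendly", "good"));
    private static final Set<String> NOUNS = new HashSet<>(Arrays.asList("book", "refrigerator", "dog", "car"));
    private static final Set<String> VERBS = new HashSet<>(Arrays.asList("sings", "eats", "likes"));

    private final List<String> tokens;
    private int pos = 0;

    SentenceParser(String input) {
        tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    // SENTENCE -> SUBJECT VERB OBJECT, and all input must be consumed
    boolean sentence() {
        return nounPhrase() && expect(VERBS) && nounPhrase() && pos == tokens.size();
    }

    // SUBJECT / OBJECT -> ARTICLE ADJECTIVE NOUN, where ARTICLE and ADJECTIVE may be EMPTY
    private boolean nounPhrase() {
        if (peekIn(ARTICLES)) pos++;
        if (peekIn(ADJECTIVES)) pos++;
        return expect(NOUNS);
    }

    // One token of lookahead is enough because the word sets are disjoint.
    private boolean peekIn(Set<String> set) {
        return pos < tokens.size() && set.contains(tokens.get(pos));
    }

    private boolean expect(Set<String> set) {
        if (peekIn(set)) {
            pos++;
            return true;
        }
        return false;
    }

    static boolean accepts(String s) {
        return new SentenceParser(s).sentence();
    }

    public static void main(String[] args) {
        System.out.println(accepts("the brown dog likes a good car"));
        System.out.println(accepts("brown the dog likes a car"));
    }
}
```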
Formal definition of grammar
• A grammar is a 4-tuple G = (Σ, N, P, S)
  – Σ is a finite set of terminal symbols;
  – N is a finite set of nonterminal symbols;
  – P is a set of productions;
  – S (from N) is the start symbol.
• The English sentence example
  – Σ = {a, the, brown, friendly, good, book, refrigerator, dog, car, sings, eats, likes}
  – N = {SENTENCE, SUBJECT, VERB, NOUN, OBJECT, ADJECTIVE, ARTICLE}
  – S = SENTENCE
  – P = {rule 1) to rule 7)}
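One possible way to hold such a 4-tuple in a program is sketched below, together with a basic consistency check (the start symbol is in N, Σ and N are disjoint, and every production uses only known symbols). The class layout, the `alts` helper, and the convention that an empty list stands for EMPTY are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class Grammar {
    final Set<String> terminals;                        // Sigma
    final Set<String> nonterminals;                     // N
    final Map<String, List<List<String>>> productions;  // P: left-hand side -> alternatives
    final String start;                                 // S

    Grammar(Set<String> terminals, Set<String> nonterminals,
            Map<String, List<List<String>>> productions, String start) {
        this.terminals = terminals;
        this.nonterminals = nonterminals;
        this.productions = productions;
        this.start = start;
    }

    // Each argument is one alternative, symbols separated by a space; "" means EMPTY.
    static List<List<String>> alts(String... alternatives) {
        List<List<String>> out = new ArrayList<>();
        for (String a : alternatives) {
            if (a.isEmpty()) {
                out.add(Collections.<String>emptyList());
            } else {
                out.add(Arrays.asList(a.split(" ")));
            }
        }
        return out;
    }

    boolean isWellFormed() {
        if (!nonterminals.contains(start)) return false;
        for (String t : terminals) {
            if (nonterminals.contains(t)) return false;
        }
        for (Map.Entry<String, List<List<String>>> e : productions.entrySet()) {
            if (!nonterminals.contains(e.getKey())) return false;
            for (List<String> alt : e.getValue()) {
                for (String sym : alt) {
                    if (!terminals.contains(sym) && !nonterminals.contains(sym)) return false;
                }
            }
        }
        return true;
    }

    // The English sentence example from the slide.
    static Grammar english() {
        Set<String> sigma = new HashSet<>(Arrays.asList(
            "a", "the", "brown", "friendly", "good", "book",
            "refrigerator", "dog", "car", "sings", "eats", "likes"));
        Set<String> n = new HashSet<>(Arrays.asList(
            "SENTENCE", "SUBJECT", "VERB", "NOUN", "OBJECT", "ADJECTIVE", "ARTICLE"));
        Map<String, List<List<String>>> p = new LinkedHashMap<>();
        p.put("SENTENCE", alts("SUBJECT VERB OBJECT"));
        p.put("SUBJECT", alts("ARTICLE ADJECTIVE NOUN"));
        p.put("OBJECT", alts("ARTICLE ADJECTIVE NOUN"));
        p.put("ARTICLE", alts("a", "the", ""));
        p.put("ADJECTIVE", alts("brown", "friendly", "good", ""));
        p.put("NOUN", alts("book", "refrigerator", "dog", "car"));
        p.put("VERB", alts("sings", "eats", "likes"));
        return new Grammar(sigma, n, p, "SENTENCE");
    }

    public static void main(String[] args) {
        System.out.println(english().isWellFormed());
    }
}
```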
Recursive definition
• Number of sentences that can be generated:
  ARTICLE   ADJ   NOUN   VERB   ARTICLE   ADJ   NOUN     sentences
     3    *  4  *  4   *  3   *    3    *  4  *  4     = 6912
• How can we define an infinite language with a finite set of words and a finite set of rules?
• Using recursive rules:
  – SUBJECT/OBJECT can have more than one adjective:
    1) SUBJECT → ARTICLE ADJECTIVES NOUN
    2) OBJECT → ARTICLE ADJECTIVES NOUN
    3) ADJECTIVES → ADJECTIVE | ADJECTIVES ADJECTIVE
  – Example sentence: “the good brown dog likes a good friendly book”
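The count and the effect of the recursive rule can both be checked with a short program. In this sketch the left-recursive ADJECTIVES rule is recognized with a loop, which accepts the same strings; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class RecursiveAdjectives {
    static final Set<String> ARTICLES = new HashSet<>(Arrays.asList("a", "the"));
    static final Set<String> ADJECTIVES = new HashSet<>(Arrays.asList("brown", "friendly", "good"));
    static final Set<String> NOUNS = new HashSet<>(Arrays.asList("book", "refrigerator", "dog", "car"));
    static final Set<String> VERBS = new HashSet<>(Arrays.asList("sings", "eats", "likes"));

    // Without recursion: each choice is independent, and EMPTY adds one option
    // to ARTICLE and to ADJECTIVE.
    static long finiteCount() {
        long phrase = (ARTICLES.size() + 1L) * (ADJECTIVES.size() + 1L) * NOUNS.size();
        return phrase * VERBS.size() * phrase;
    }

    // Returns the index just after a noun phrase starting at i, or -1 if there is none.
    static int nounPhrase(String[] t, int i) {
        if (i < t.length && ARTICLES.contains(t[i])) i++;
        // ADJECTIVES -> ADJECTIVE | ADJECTIVES ADJECTIVE: any number of adjectives
        while (i < t.length && ADJECTIVES.contains(t[i])) i++;
        return (i < t.length && NOUNS.contains(t[i])) ? i + 1 : -1;
    }

    static boolean accepts(String s) {
        String[] t = s.trim().split("\\s+");
        int i = nounPhrase(t, 0);
        if (i < 0 || i >= t.length || !VERBS.contains(t[i])) return false;
        return nounPhrase(t, i + 1) == t.length;
    }

    public static void main(String[] args) {
        System.out.println(finiteCount());
        System.out.println(accepts("the good brown dog likes a good friendly book"));
    }
}
```

Since the loop places no bound on the number of adjectives, the set of accepted sentences is infinite even though the word list and the rule set are finite.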
Chomsky hierarchy
• Noam Chomsky's hierarchy is based on the form of the production rules
• General form
  α1 α2 α3 … αn → β1 β2 β3 … βm
  where each α and β is a terminal or a nonterminal, or empty.
• Level 3: Regular grammar
  – Of the form α → β or α → β1 β2
  – n = 1, and α is a nonterminal.
  – The right-hand side is either a terminal or a terminal followed by a nonterminal
  – The RHS contains at most one nonterminal, at the right end.
• Level 2: Context-free grammar
  – Of the form α → β1 β2 β3 … βm
  – α is a nonterminal.
• Level 1: Context-sensitive grammar
  – n ≤ m. The number of symbols on the LHS must not exceed the number of symbols on the RHS
• Level 0: Unrestricted grammar
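The level tests above are purely about the shape of a production, so they can be written down directly. The sketch below classifies a single production using the conditions as stated on this slide (reading the context-sensitive condition as n ≤ m); the class and method names are hypothetical, and a whole grammar would take the lowest level found among its productions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class Chomsky {
    // Returns the most restrictive level (3, 2, 1, or 0) whose form the production fits.
    static int level(List<String> lhs, List<String> rhs, Set<String> nonterminals) {
        boolean singleNonterminal = lhs.size() == 1 && nonterminals.contains(lhs.get(0));
        if (singleNonterminal) {
            // Regular: a terminal, or a terminal followed by a nonterminal
            boolean terminalOnly = rhs.size() == 1 && !nonterminals.contains(rhs.get(0));
            boolean terminalThenNonterminal = rhs.size() == 2
                    && !nonterminals.contains(rhs.get(0))
                    && nonterminals.contains(rhs.get(1));
            if (terminalOnly || terminalThenNonterminal) return 3;
            return 2; // context-free: any right-hand side
        }
        // Context-sensitive: the left side is not longer than the right side
        if (!lhs.isEmpty() && lhs.size() <= rhs.size()) return 1;
        return 0; // unrestricted
    }

    public static void main(String[] args) {
        Set<String> n = new HashSet<>(Arrays.asList("SENTENCE", "SUBJECT", "VERB", "OBJECT"));
        System.out.println(level(Arrays.asList("SENTENCE"),
                Arrays.asList("SUBJECT", "VERB", "OBJECT"), n));
    }
}
```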
Context-sensitive grammar
• Called context sensitive because you can construct grammar rules of the form
  – A α B → A β B
  – A α C → A γ C
• The substitution of α depends on the surrounding context: A and B, or A and C.
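A tiny sketch can make the role of the context concrete: the same symbol α is rewritten differently depending on its neighbours. The code assumes that both rules keep their context unchanged (that is, A α B → A β B and A α C → A γ C) and spells the symbols as the ASCII strings "alpha", "beta", and "gamma"; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ContextRewrite {
    // Rewrite every "alpha" whose left neighbour is A, choosing the
    // replacement from the right neighbour (B or C). Other symbols are kept.
    static List<String> rewrite(List<String> form) {
        List<String> out = new ArrayList<>(form);
        for (int i = 1; i + 1 < form.size(); i++) {
            if (!form.get(i).equals("alpha") || !form.get(i - 1).equals("A")) {
                continue;
            }
            String right = form.get(i + 1);
            if (right.equals("B")) {
                out.set(i, "beta");   // A alpha B -> A beta B
            } else if (right.equals("C")) {
                out.set(i, "gamma");  // A alpha C -> A gamma C
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rewrite(Arrays.asList("A", "alpha", "B")));
        System.out.println(rewrite(Arrays.asList("A", "alpha", "C")));
    }
}
```

In a context-free grammar the replacement of α could not look at its neighbours at all, which is the distinction between levels 2 and 1.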