Post on 11-May-2018
LandY.1
CSE
4100
Lex and Yacc
Prof. Steven A. Demurjian
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Way, Unit 2155
Storrs, CT 06269-3155
steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486 - 4818
Material for course thanks to:
Laurent Michel
Aggelos Kiayias
Robert LeBarre

Lex and Yacc
- Two Compiler Writing Tools that are Utilized to Easily Specify:
  - Lexical Tokens and their Order of Processing (Lex)
  - Context Free Grammar for LALR(1) (Yacc)
- Both Lex and Yacc have a Long History in Computing
  - Lex and Yacc – Earliest Days of Unix Minicomputers
  - Flex and Bison – From GNU
  - JFlex – Fast Scanner Generator for Java
  - BYacc/J – Berkeley
  - CUP, ANTLR, PCYACC, …
  - PCLEX and PCYACC from Abacus

Lex – A Lexical Analyzer Generator
- A Unix Utility from the mid-1970s
- A Compiler that Takes as Source a Specification for:
  - Tokens/Patterns of a Language
  - Generates a “C” Lexical Analyzer Program
- Pictorially:
    Lex Source Program: lex.l  --->  Lex Compiler  --->  lex.yy.c
    lex.yy.c                   --->  C Compiler    --->  a.out
    Input stream               --->  a.out         --->  Sequence of tokens

Format of a Lexical Specification – 3 Parts
- Declarations:
  - Defs, Constants, Types, #includes, etc. that can Occur in a C Program
  - Regular Definitions (expressions)
- Translation Rules:
  - Pairs of (Regular Expression, Action)
  - Informs Lexical Analyzer of Action when Pattern is Recognized
- Auxiliary Procedures:
  - Designer Defined C Code
  - Can Replace System Calls

lex.l File Format:
    DECLARATIONS
    %%
    TRANSLATION RULES
    %%
    AUXILIARY PROCEDURES

Example lex.l File

User Defined Values for Each Token (otherwise taken from yacc's y.tab.h):
%{
#define T_IDENTIFIER 300
#define T_INTEGER 301
#define T_REAL 302
#define T_STRING 303
#define T_ASSIGN 304
#define T_ELSE 305
#define T_IF 306
#define T_THEN 307
#define T_EQ 308
#define T_LT 309
#define T_NE 310
#define T_GE 311
#define T_GT 312
%}

Regular Expression Rules for later token definitions:
letter   [a-zA-Z]
digit    [0-9]
ws       [ \t\n]+
id       [A-Za-z][A-Za-z0-9]*
comment  "(*"([^*]|\n|"*"+[^)])*"*"+")"
integer  [0-9]+/([^0-9]|"..")
real     [0-9]+"."[0-9]*([0-9]|"E"[+-]?[0-9]+)
string   \'([^']|\'\')*\'
%%

Token Definitions:
":="       {printf(" %s ", yytext);return(T_ASSIGN);}
"else"     {printf(" %s ", yytext);return(T_ELSE);}

Example lex.l File

Conditional compilation action:
"then"     {
#ifdef PRNTFLG
           printf(" %s ", yytext);
#endif
           return(T_THEN);}

Token Definitions:
"<="       {/* no T_LE is defined, so "<=" reuses T_EQ */ printf(" %s ", yytext);return(T_EQ);}
"<"        {printf(" %s ", yytext);return(T_LT);}
"<>"       {printf(" %s ", yytext);return(T_NE);}
">="       {printf(" %s ", yytext);return(T_GE);}
">"        {printf(" %s ", yytext);return(T_GT);}
{id}       {printf(" %s ", yytext);return(T_IDENTIFIER);}
{integer}  {printf(" %s ", yytext);return(T_INTEGER);}
{real}     {printf(" %s ", yytext);return(T_REAL);}
{string}   {printf(" %s ", yytext);return(T_STRING);}

Discard:
{comment}  {/* T_COMMENT */}
{ws}       {/* spaces, tabs, newlines */}
%%

EOF for input:
int yywrap(void) {return 1;}   /* 1 = no more input; 0 would make the scanner keep reading */

int main(void)
{
    int i;
    do {
        i = yylex();
    } while (i != 0);          /* yylex() returns 0 at end of input */
    return 0;
}

Three Variables:
    yytext = "currenttoken"
    yyleng = 12
    yylval = 300

What is wrong with Following?

letter   [a-zA-Z]
digit    [0-9]
ws       [ \t\n]+
id       [A-Za-z][A-Za-z0-9]*
comment  "(*"([^*]|\n|"*"+[^)])*"*"+")"
integer  [0-9]+/([^0-9]|"..")
real     [0-9]+"."[0-9]*([0-9]|"E"[+-]?[0-9]+)
string   \'([^']|\'\')*\'
%%
{id}       {printf(" %s ", yytext);return(T_IDENTIFIER);}
{integer}  {printf(" %s ", yytext);return(T_INTEGER);}
{real}     {printf(" %s ", yytext);return(T_REAL);}
{string}   {printf(" %s ", yytext);return(T_STRING);}
{comment}  {/* T_COMMENT */}
{ws}       {/* spaces, tabs, newlines */}
":="       {printf(" %s ", yytext);return(T_ASSIGN);}
"else"     {printf(" %s ", yytext);return(T_ELSE);}
"then"     {printf(" %s ", yytext);return(T_THEN);}
"<="       {printf(" %s ", yytext);return(T_EQ);}
"<"        {printf(" %s ", yytext);return(T_LT);}
"<>"       {printf(" %s ", yytext);return(T_NE);}
">="       {printf(" %s ", yytext);return(T_GE);}
">"        {printf(" %s ", yytext);return(T_GT);}
%%
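
One likely answer to the question above: lex prefers the longest match and breaks ties in favor of the rule listed first, so with {id} ahead of "else" and "then" those keywords come back as T_IDENTIFIER. The C sketch below imitates only that tie-break for a complete lexeme. The names `is_id` and `classify` are hypothetical helpers written for this illustration and are not part of lex or of the generated scanner.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define T_IDENTIFIER 300
#define T_ELSE 305
#define T_THEN 307

/* Hypothetical helper: does s match [A-Za-z][A-Za-z0-9]* in full? */
static int is_id(const char *s) {
    assert(s != NULL);
    if (!isalpha((unsigned char)s[0])) return 0;
    for (size_t i = 1; s[i] != '\0'; i++)
        if (!isalnum((unsigned char)s[i])) return 0;
    return 1;
}

/* Returns the token of the FIRST rule that matches the whole lexeme,
 * imitating lex's tie-break among equally long matches.
 * id_first != 0 places the {id} rule ahead of the keyword rules. */
int classify(const char *lexeme, int id_first) {
    if (id_first && is_id(lexeme)) return T_IDENTIFIER;
    if (strcmp(lexeme, "else") == 0) return T_ELSE;
    if (strcmp(lexeme, "then") == 0) return T_THEN;
    if (is_id(lexeme)) return T_IDENTIFIER;
    return 0; /* no rule matched */
}
```

With the identifier rule first, `classify("else", 1)` reports T_IDENTIFIER; with the keyword rules first, `classify("else", 0)` reports T_ELSE, which is why the earlier example lists the keywords before {id}.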

Other Possible Actions

%%
":="       {return(T_ASSIGN);}
"else"     {return(T_ELSE);}
"then"     {return(T_THEN);}
"<="       {yylval = T_EQ; return(T_EQ);}
... Etc. ...
">"        {yylval = T_GT; return(T_GT);}
{id}       {yylval = install_id(); return(T_IDENTIFIER);}
{integer}  {yylval = install_int(); return(T_INTEGER);}
{real}     {yylval = install_real(); return(T_REAL);}
{comment}  {/* T_COMMENT */}
{ws}       {/* spaces, tabs, newlines */}
%%
install_id() {
    /* A procedure to install the lexeme whose first character is pointed to by
       yytext and whose length is yyleng into symbol table and return a pointer */
}
install_int() {/* Similar – but installs an integer lexeme into symbol table */}
install_real() {/* Similar – but installs a real lexeme into symbol table */}
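
The install procedures above are left as comments. As a rough sketch of what install_id() might do, the C code below keeps a small fixed-size table and returns the index of the lexeme. Returning an index rather than a pointer is an assumption made here so that the result fits the default integer yylval. The name `install_lexeme` and the limits MAX_SYMS and MAX_LEN are likewise hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMS 256   /* assumed table capacity */
#define MAX_LEN  64    /* assumed longest lexeme, plus the '\0' */

static char symtab[MAX_SYMS][MAX_LEN];
static int  nsyms = 0;

/* Installs the lexeme (text, len) and returns its table index,
 * reusing the existing entry when the lexeme is already present.
 * Returns -1 when the lexeme is too long or the table is full. */
int install_lexeme(const char *text, int len) {
    assert(text != NULL && len >= 0);
    if (len >= MAX_LEN) return -1;
    for (int i = 0; i < nsyms; i++)
        if ((int)strlen(symtab[i]) == len &&
            strncmp(symtab[i], text, (size_t)len) == 0)
            return i;
    if (nsyms == MAX_SYMS) return -1;
    memcpy(symtab[nsyms], text, (size_t)len);
    symtab[nsyms][len] = '\0';
    return nsyms++;
}
```

Inside a lex action this could be called as `yylval = install_lexeme(yytext, yyleng);`, so that the parser receives a handle to the table entry instead of the raw text.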

Revisiting Internal Variables in Lex
- char *yytext;
  - Pointer to current lexeme terminated by '\0'
- int yyleng;
  - Number of characters in yytext, not counting the '\0'
- yylval:
  - Global variable through which the token value can be returned to Yacc
  - Parser (Yacc) can access yylval, yyleng, and yytext
- How are these used?
  - Consider Integer Tokens:
  - yylval = ascii_to_integer (yytext);
  - Conversion from String to actual Integer Value

Using the lex Compiler
- Important Highlights
  - Unix Lex defaults with respect to:
    - Single Rule size (2048 bytes)
    - All Actions (20480 bytes)
    - DFA States (512)
    - NFA States (254)
- Command Line:
  - lex myfile.l      Generates lex.yy.c
  - pclex myfile.l    Generates myfile.c
  - -v flag           Includes Statistics on State Machine, etc.

Highlights of Generated lex.yy.c File

# define output(c) putc(c,yyout)
# define input() (((yytchar=yysptr>yysbuf?U(*--yysptr):getc(yyin))==10?(yylineno++,yytchar):yytchar)==EOF?0:yytchar)
# define unput(c) {yytchar= (c);if(yytchar=='\n')yylineno--;*yysptr++=yytchar;}

FILE *yyin = {stdin}, *yyout = {stdout};

yyinput() {
    return(input());
}
yyoutput(c) int c; {
    output(c);
}
yyunput(c) int c; {
    unput(c);
}

Compilation at Unix Command Line:
    lex lexfile.l      (creates lex.yy.c)
    cc lex.yy.c -ll    (include lex library)

Full lex.yy.c File

# include "stdio.h"
# define U(x) x
# define NLSTATE yyprevious=YYNEWLINE
# define BEGIN yybgin = yysvec + 1 +
# define INITIAL 0
# define YYLERR yysvec
# define YYSTATE (yyestate-yysvec-1)
# define YYOPTIM 1
# define YYLMAX BUFSIZ
# define output(c) putc(c,yyout)
# define input() (((yytchar=yysptr>yysbuf?U(*--yysptr):getc(yyin))==10?(yylineno++,yytchar):yytchar)==EOF?0:yytchar)
# define unput(c) {yytchar= (c);if(yytchar=='\n')yylineno--;*yysptr++=yytchar;}
# define yymore() (yymorfg=1)
# define ECHO fprintf(yyout, "%s",yytext)
# define REJECT { nstr = yyreject(); goto yyfussy;}
int yyleng; extern char yytext[];
int yymorfg;
extern char *yysptr, yysbuf[];
int yytchar;
FILE *yyin = {stdin}, *yyout = {stdout};
extern int yylineno;
struct yysvf {
    struct yywork *yystoff;
    struct yysvf *yyother;
    int *yystops; };
struct yysvf *yyestate;
extern struct yysvf yysvec[], *yybgin;

Full lex.yy.c File

#define T_IDENTIFIER 300
#define T_INTEGER 301
#define T_REAL 302
#define T_STRING 303
#define T_ASSIGN 304
#define T_ELSE 305
#define T_IF 306
#define T_THEN 307
#define T_EQ 308
#define T_LT 309
#define T_NE 310
#define T_GE 311
#define T_GT 312
#define YYNEWLINE 10
yylex() {
int nstr; extern int yyprevious;
while((nstr = yylook()) >= 0)
yyfussy: switch(nstr) {
case 0:
if(yywrap()) return(0); break;
case 1:
    {printf(" %s ", yytext);return(T_ASSIGN);}
break;
case 2:
    {printf(" %s ", yytext);return(T_ELSE);}
break;
case 3:
    {printf(" %s ", yytext);return(T_IF);}
break;

Full lex.yy.c File

case 4:
    {
#ifdef PRNTFLG
    printf(" %s ", yytext);
#endif
    return(T_THEN);}
break;
case 5:
    {printf(" %s ", yytext);return(T_EQ);}
break;
case 6:
    {printf(" %s ", yytext);return(T_LT);}
break;
case 7:
    {printf(" %s ", yytext);return(T_NE);}
break;
case 8:
    {printf(" %s ", yytext);return(T_GE);}
break;
case 9:
    {printf(" %s ", yytext);return(T_GT);}
break;
case 10:
    {printf(" %s ", yytext);return(T_IDENTIFIER);}
break;
case 11:
    {printf(" %s ", yytext);return(T_INTEGER);}
break;
case 12:
    {printf(" %s ", yytext);return(T_REAL);}
break;
case 13:
    {printf(" %s ", yytext);return(T_STRING);}

Full lex.yy.c File

break;
case 14:
    {/* T_COMMENT */}
break;
case 15:
    {/* spaces, tabs, newlines */}
break;
case -1:
break;
default:
fprintf(yyout,"bad switch yylook %d",nstr);
} return(0); }
/* end of yylex */

yywrap(){}

main() {int i; do {i = yylex();} while (i!=0);}

A Pascal lex.l

%{
#include "y.tab.h"
%}
letter   [a-zA-Z]
digit    [0-9]
ws       [ \t\n]+
id       [A-Za-z][A-Za-z0-9]*
comment  "(*"([^*]|\n|"*"+[^)])*"*"+")"
integer  [0-9]+/([^0-9]|"..")
real     [0-9]+"."[0-9]*([0-9]|"E"[+-]?[0-9]+)
string   \'([^']|\'\')*\'
%%
":="         {return(T_ASSIGN);}
":"          {return(T_COLON);}
"array"      {return(T_ARRAY);}
"begin"      {return(T_BEGIN);}
"case"       {return(T_CASE);}
"const"      {return(T_CONST);}
"downto"     {return(T_DOWNTO);}
"do"         {return(T_DO);}
"else"       {return(T_ELSE);}
"end"        {return(T_END);}
"file"       {return(T_FILE);}
"for"        {return(T_FOR);}

A Pascal lex.l

"function"   {return(T_FUNCTION);}
/* "goto"    {return(T_GOTO);} */
"if"         {return(T_IF);}
"label"      {return(T_LABEL);}
"nil"        {return(T_NIL);}
"not"        {return(T_NOT);}
"of"         {return(T_OF);}
/* "packed"  {return(T_PACKED);} */
"procedure"  {return(T_PROCEDURE);}
"program"    {return(T_PROGRAM);}
"record"     {return(T_RECORD);}
"repeat"     {return(T_REPEAT);}
"set"        {return(T_SET);}
"then"       {return(T_THEN);}
"to"         {return(T_TO);}
"type"       {return(T_TYPE);}
"until"      {return(T_UNTIL);}
"var"        {return(T_VAR);}
"while"      {return(T_WHILE);}
/* "with"    {return(T_WITH);} */
"+"          {return(T_PLUS);}
"-"          {return(T_MINUS);}
"or"         {return(T_OR);}
"and"        {return(T_AND);}
"div"        {return(T_DIV);}
"mod"        {return(T_MOD);}
"/"          {return(T_RDIV);}

A Pascal lex.l

"*"          {return(T_MULT);}
"("          {return(T_LPAREN);}
")"          {return(T_RPAREN);}
"="          {return(T_EQ);}
","          {return(T_COMMA);}
".."         {return(T_RANGE);}
"."          {return(T_PERIOD);}
"["          {return(T_LBRACK);}
"]"          {return(T_RBRACK);}
"<="         {return(T_EQ);}
"<"          {return(T_LT);}
"<>"         {return(T_NE);}
">="         {return(T_GE);}
">"          {return(T_GT);}
"in"         {return(T_IN);}
"^"          {return(T_UPARROW);}
";"          {return(T_SEMI);}
{id}         {return(T_IDENTIFIER);}
{integer}    {return(T_INTEGER);}
{real}       {return(T_REAL);}
{string}     {return(T_STRING);}
{comment}    {/* T_COMMENT */}
{ws}         {/* spaces, tabs, newlines */}

Project Part 1 – Fall 2011
- What is Latex?
- Text Processing Language
  - Embed Commands into Ascii File
  - Opposite of Word's WYSIWYG
  - Geared Towards Publishing – Particularly Prior to Newer Versions of Word
- Very Powerful Text Formatting Language
- Built on TeX, which was Invented by Computer Scientist Donald Knuth (Latex itself was Written by Leslie Lamport)
  - http://www-cs-faculty.stanford.edu/~uno/
  - http://www-cs-faculty.stanford.edu/~uno/abcde.html
- Knuth is Famous for: The Art of Computer Programming
  - http://www-cs-faculty.stanford.edu/~uno/taocp.html

Project Part 1 Has Three Tasks
- Task 1: Oct 5: Design and implement a lexical analyzer using the flex generator on the Linux boxes that is able to identify all lexical tokens for the latex subset.
- Task 2: Oct 12: Design and develop a context free grammar (CFG) for a subset of Latex.
- Task 3: Oct 17: Calculate FIRST and FOLLOW for a grammar provided after deliverable part 1b.

latex.all.txt

BASIC LATEX COMMANDS/OPTIONS

The following discusses the Latex commands and options which will be
supported in our text processor. TEXT THAT IS SHOWN IN ALL CAPITAL
LETTERS CORRESPONDS TO TOKENS WHICH HAVE MANY DIFFERENT OPTIONS.

1. Section, Subsections, and Table of Contents

Commands                 Examples or Explanation

\section{STRING}         \section{Introduction}
\subsection{STRING}      \subsection{A Text Processor}
                         \subsection{Legal Latex Commands}
                         \section{Using Latex}

\tableofcontents         Generate a table of contents with page numbers

Specifically, it would generate:
    1 Introduction
        1.1 A Text Processor
        1.2 Legal Latex Commands
    2 Using Latex

latex.all.txt

2. Formatting Commands That Affect The Overall Document

Commands                                  Examples or Explanation

\renewcommand{\baselinestretch}{INTEGER}  Establish the spacing
                                          1 is single, 2 is double, etc.

\pagenumbering{STYLE}    STYLE is either arabic, roman, alph, Roman, or Alph
                         arabic numbers pages using 1, 2, 3, ... etc.
                         roman numbers pages using i, ii, iii, ... etc.
                         alph numbers pages using a, b, c, ... etc.
                         Roman numbers pages using I, II, III, ... etc.
                         Alph numbers pages using A, B, C, ... etc.

\arabic{COUNTER}         COUNTER indicates the initial value of page numbers
\roman{COUNTER}          COUNTER indicates the initial value of page numbers
\alph{COUNTER}           COUNTER indicates the initial value of page numbers
                         In this case, counter must be <= 26.
\Roman{COUNTER}          COUNTER indicates the initial value of page numbers
\Alph{COUNTER}           COUNTER indicates the initial value of page numbers
                         In this case, counter must be <= 26.

\vspace{INTEGER}         Insert an INTEGER number of blank lines
\hspace{INTEGER}         Insert an INTEGER number of blank spaces

\rm                      Change the font to roman
\it                      Change the font to italics or underline

When the \rm or \it commands are used within curly braces, i.e.,
{\it The Huskies win again!}, only the text within the braces is affected.
Otherwise, the command switches the mode of printing from that point on in
the text.

latex.all.txt

3. Using Backslash to Indicate a Character Rather Than a Command.

The backslash character (\) is used to tell Latex that the next character
should be treated as a character and not as a command. The backslash is
used with the following characters:

    $ & % # { } _

Without the backslash, each character has a special meaning, i.e., % is
for a comment that is ignored during text processing, & divides column
entries of tables, etc. With a backslash, i.e., \%, the character is
interpreted as itself.
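
The escape rule above can be checked with a short test in C. This is only a sketch: `is_escapable` and `is_literal_special` are hypothetical names, the caller is assumed to pass an index inside the string, and the sketch ignores the case where the backslash itself belongs to a preceding \\.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if c is one of the characters listed above: $ & % # { } _ */
int is_escapable(char c) {
    /* strchr would also match the terminating '\0', so exclude it first */
    return c != '\0' && strchr("$&%#{}_", c) != NULL;
}

/* Returns 1 if text[i] is a special character preceded by a backslash,
 * i.e. it stands for itself rather than for its Latex meaning.
 * Assumes i is a valid index into text. */
int is_literal_special(const char *text, size_t i) {
    assert(text != NULL);
    return i > 0 && text[i - 1] == '\\' && is_escapable(text[i]);
}
```

A lexical analyzer could express the same idea as a single pattern for a backslash followed by one of the seven characters, listed ahead of the rule for a lone backslash.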

latex.all.txt

4. Begin/End Blocks - Centering and Verbatim

Begin/end blocks are used within Latex to identify a scope over which a
given command applies. They are best illustrated with examples.

\begin{verbatim}         The verbatim option displays the text exactly
Four Score and           as it appears within the input file.
   Seven Years
Ago Our Forefathers
\end{verbatim}

\begin{center}           The center option centers the entire block of
Four Score and\\         text as a single unit. The \\ are used to
Seven Years\\            signal the end of a line.
Ago Our Forefathers
\end{center}

This produces the output:
                    Four Score and
                     Seven Years
                  Ago Our Forefathers

Without the second \\, after Seven Years, the output would be:

                    Four Score and
             Seven Years Ago Our Forefathers

latex.all.txt

Commands can be combined, such as:

\begin{center}           This combination centers the entire block,
\begin{verbatim}         exactly as it appears, without changing
Four Score and           the indentation within each line.
   Seven Years
Ago Our Forefathers
\end{verbatim}
\end{center}

The output in this case would be:

                  Four Score and
                     Seven Years
                  Ago Our Forefathers

latex.all.txt

5. Begin/End Blocks - single and Lists

Begin/end blocks can also be utilized to construct lists of items
automatically. For example, the following input and commands:

\begin{single}
\begin{itemize}
\item Lexical Analyzer uses DFAs and NFAs
\item Parsing uses CFGs
\item Code Generation uses templates and also makes extensive use
of syntax-directed translation via attribute grammars
\end{itemize}
\end{single}
\noindent
These are some of the phases for compilation that we'll study over the
course of the semester.

Produces the output:

  - Lexical Analyzer uses DFAs and NFAs
  - Parsing uses CFGs
  - Code Generation uses templates and also makes extensive use
    of syntax-directed translation via attribute grammars

These are some of the phases for compilation that we'll study over the
course of the semester.

The command \noindent is used to make sure that a new paragraph is
not started after the list has completed, which would occur as a default.

latex.all.txt

The enumerate option is similar, but generates numbers for each item:

\begin{enumerate}
\item Lexical Analyzer uses DFAs and NFAs
\item Parsing uses CFGs
\item Code Generation uses templates and also makes extensive use
of syntax-directed translation via attribute grammars
\end{enumerate}

Notice that without the single begin/end block, the following output is
produced:

  1. Lexical Analyzer uses DFAs and NFAs

  2. Parsing uses CFGs

  3. Code Generation uses templates and also makes extensive use

     of syntax-directed translation via attribute grammars

A Sample Latex Input File – latex.in.tex

\begin{document}
\pagenumbering{arabic}
\arabic{5}
\renewcommand{\baselinestretch}{2}
\tableofcontents

\section{Introduction}

This is an example of text that would be transformed into a paragraph in
latex. Blank lines between text in the input cause a new paragraph to
be generated.

\vspace{10}

\it
When the blank line occurs after a section, no indentation of the paragraph
is performed. However, \hspace{20} all other blanks, would result in a
5 space indent of the paragraph.
\rm

\subsection{A Text Processor}

A {\it text processor} is a very useful tool, since it allows us to
develop formatted documents that are easy to read.

A Sample Latex Input File – latex.in.tex

\subsection{Legal Latex Commands}

We have seen that there are many different Latex commands, that can be used
in many different ways. However, sometimes, we wish to use a character to
mean itself, and override its Latex interpretation. For example, to use
curly braces, we employ the backslash \{ a set of integers \}.

\section{Using Latex}

Finally, there are many other useful commands that involve begin/end blocks,
that establish an environment. These blocks behave in a similar fashion to
begin/end blocks in a programming language, since they set a scope. We
have discussed a number of examples:

\begin{single}
\begin{enumerate}
\item single is for single spacing
\item verbatim allows text that matches the what you see is what you get mode
\item itemize uses ticks to indicate items
\item center allows a block to be centered
\end{enumerate}
\end{single}
\noindent
It is important to note, even at this early stage, that lists may be created
within lists, allowing the nesting of blocks and environments.
\end{document}

Notes
- Not all of my Latex Works in MiKTeX, since my Latex is Based on an Older Version of Latex
- In prior two slides – to get this to work you need to:
  - Add \documentstyle{article} as First Line
  - Add pt to the vspace and hspace
    - \vspace{10pt}
    - \hspace{20pt}
  - Delete the \arabic{5}
- Web page has: latex.in.miktex.tex File with Changes

Latex Extensions - Tables And Automatic Numbering

Latex is extended to support the definition of tables and their automatic
numbering. As an example, consider the following:

This is how tables are used in Latex. First a reference to a table, say
for a table of Latex commands, must be given. Table \ref{latexcmds} is
shown below.

\begin{table}[h]
\begin{center}
\begin{tabular}{rcl}
No.& Command & Explanation \\
1 & center & allows centering of text \\
2 & it & used for italics \\
3 & item & used to identify items in a list \\
\end{tabular}
\end{center}
\caption{A Table of Latex Commands!!}
\label{latexcmds}
\end{table}

Latex Extensions - Tables And Automatic Numbering

This example produces the following output:

This is how tables are used in Latex. First a reference to a table, say
for a table of Latex commands, must be given. Table 1 is shown below.

            No.  Command  Explanation
              1  center   allows centering of text
              2    it     used for italics
              3   item    used to identify items in a list

                 Table 1. A Table of Latex Commands!!

Notice that the table has been centered and the first column is right
justified, the second column is centered, and the third column is left
justified.

Latex Extensions - Tables And Automatic Numbering

Now, a brief explanation of the options:

\begin{tabular}{column-spec}      where column-spec is any sequence of one or
                                  more r (right), l (left), or c (center)
                                  options.

... & ... & ... \\                where & separates columns and \\ ends a row.

\end{tabular}                     which signals the end of the table.

\begin{table}[location-options]   indicates the start of the table environment,
                                  where location-options indicates where to
                                  place a table and may be either h (for here),
                                  t (for float to top of next page), or b (for
                                  float to bottom of current or next page).

\caption{STRING}                  which indicates the table's caption
\label{WORD}                      which labels the caption/table with a word
\end{table}                       used to finish the table environment

Then, when \ref{WORD} appears in the text, the label is searched for and
the automatic number assigned to the table is inserted.
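
The label search described above can be sketched in C as a small table that maps each label word to its table number. The names `define_label` and `resolve_ref` and the limits MAX_LABELS and MAX_WORD are hypothetical, and numbering tables 1, 2, ... in order of their \label commands is an assumption made for the sketch.

```c
#include <assert.h>
#include <string.h>

#define MAX_LABELS 32  /* assumed capacity */
#define MAX_WORD   32  /* assumed longest label, plus the '\0' */

static char labels[MAX_LABELS][MAX_WORD];
static int  nlabels = 0;  /* also the number of labelled tables so far */

/* Called for \label{WORD}: assigns the next table number (1, 2, ...).
 * Returns that number, or 0 if the table is full or WORD is too long. */
int define_label(const char *word) {
    assert(word != NULL);
    if (nlabels == MAX_LABELS || strlen(word) >= MAX_WORD) return 0;
    strcpy(labels[nlabels], word);
    return ++nlabels;
}

/* Called for \ref{WORD}: returns the table number, or 0 if unknown. */
int resolve_ref(const char *word) {
    assert(word != NULL);
    for (int i = 0; i < nlabels; i++)
        if (strcmp(labels[i], word) == 0) return i + 1;
    return 0;
}
```

In the example table, \ref{latexcmds} appears before \label{latexcmds}, so a single left-to-right pass would see the reference before the label is defined. One option is to collect all labels in a first pass and resolve references in a second.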

Other Sample Latex Files – Cent.tst
- I will Send out an Email with a Zip File of Tests
- Your Lexical Analyzer Should Recognize All of these!

\begin{document}
\pagenumbering{arabic}
\arabic{5}
\renewcommand{\baselinestretch}{2}

A Basic file that checks to see if the centering command works
correctly. Note that the double backslash should indicate what
should be centered and how it is centered.

\begin{center}
Single is for Single spacing\\
Verbatim allows text produced as is\\
Itemize uses ticks to indicate items\\
Center allows a block to be centered\\
\end{center}

\end{document}

Project Part 1 Has Three Tasks
- Oct 5: Design and implement a lexical analyzer using the flex generator on the Linux boxes that is able to identify all lexical tokens for the latex subset.
- Oct 12: Design and develop a context free grammar (CFG) for a subset of Latex.
- Oct 17: Calculate FIRST and FOLLOW for a grammar provided after deliverable part 1b.

Working Flex for Project Part 1 – Fall 2011

%{ /* THIS IS LATEX.L */
#include <stdio.h>
#define TBEGIN 200
#define TEND 201
#define TDOCUMENT 202
#define TWORD 203
#define TBACKSL 204
#define TLCURLYB 205
#define TRCURLYB 206
%}
ws      [ \t\n]+
word    ([a-zA-Z0-9])+
%%

Working Flex for Project Part 1 – Fall 2011

"\\"        {printf(" Val: %d\t; Lexeme: %s \n", TBACKSL, yytext);
             return(TBACKSL);}
"{"         {printf(" Val: %d\t; Lexeme: %s \n", TLCURLYB, yytext);
             return(TLCURLYB);}
"}"         {printf(" Val: %d\t; Lexeme: %s \n", TRCURLYB, yytext);
             return(TRCURLYB);}
"begin"     {printf(" Val: %d\t; Lexeme: %s \n", TBEGIN, yytext);
             return(TBEGIN);}
"document"  {printf(" Val: %d\t; Lexeme: %s \n", TDOCUMENT, yytext);
             return(TDOCUMENT);}
"end"       {printf(" Val: %d\t; Lexeme: %s \n", TEND, yytext);
             return(TEND);}
{word}      {printf(" Val: %d\t; Lexeme: %s \n", TWORD, yytext);
             return(TWORD);}
{ws}        { /* DO NOTHING */ }
%%

- Recognize Following Tokens in Order
- Note "\\" to Recognize "\"

Flex for Project Part 1 – Fall 2011
- Remaining Code:

/* need main routine at bottom */
int yywrap(void) {return 1;}   /* 1 = no more input; 0 would make the scanner keep reading */

int main(void)
{
    int i;
    do {
        i = yylex();
        printf("i is: %d ****\n", i);
    } while (i != 0);          /* yylex() returns 0 at end of input, not EOF */
    return 0;
}

- Building lex.yy.c and Compiling/Executing:
  - ssh to Engineering Linux Box
  - flex latex.l
  - gcc lex.yy.c -lfl
  - a.out < latex.l

Lex.yy.c File

#line 3 "lex.yy.c"

#define YY_INT_ALIGNED short int

/* A lexical scanner generated by flex */

#define FLEX_SCANNER
#define YY_FLEX_MAJOR_VERSION 2
#define YY_FLEX_MINOR_VERSION 5
#define YY_FLEX_SUBMINOR_VERSION 34
#if YY_FLEX_SUBMINOR_VERSION > 0
#define FLEX_BETA
#endif

/* First, we deal with platform-specific or compiler-specific issues. */

/* begin standard C headers. */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>

/* end standard C headers. */

Lex.yy.c File

/* THOUSAND LINES OF CODE MISSING */

void yyfree (void * ptr )
{
    free( (char *) ptr ); /* see yyrealloc() for (char *) cast */
}

#define YYTABLES_NAME "yytables"

#line 29 "latex.l"

/* need main routine at bottom */
int yywrap(void) {return 1;}   /* 1 = no more input */

int main(void)
{
    int i;
    do {
        i = yylex();
        printf("i is: %d ****\n", i);
    } while (i != 0);          /* yylex() returns 0 at end of input, not EOF */
    return 0;
}

Sample Latex Input File doc.tex and Output

\begin{document}
Hello world
Does this work even on multiple lines
\end{document}

a.out < doc.tex
Val: 204 ; Lexeme: \
i is: 204 ****
Val: 203 ; Lexeme: begiin
i is: 203 ****
Val: 205 ; Lexeme: {
i is: 205 ****
Val: 202 ; Lexeme: document
i is: 202 ****
Val: 206 ; Lexeme: }
i is: 206 ****
Val: 203 ; Lexeme: Hello
i is: 203 ****
Val: 203 ; Lexeme: world
i is: 203 ****
Val: 203 ; Lexeme: Does
i is: 203 ****
Val: 203 ; Lexeme: this

Output Continued

i is: 203 ****
Val: 203 ; Lexeme: work
i is: 203 ****
Val: 203 ; Lexeme: even
i is: 203 ****
Val: 203 ; Lexeme: on
i is: 203 ****
Val: 203 ; Lexeme: multiple
i is: 203 ****
Val: 203 ; Lexeme: lines
i is: 203 ****
Val: 204 ; Lexeme: \
i is: 204 ****
Val: 201 ; Lexeme: end
i is: 201 ****
Val: 205 ; Lexeme: {
i is: 205 ****
Val: 202 ; Lexeme: document
i is: 202 ****
Val: 206 ; Lexeme: }
i is: 206 ****

Latexv2.l and docv2.tex

tablespec    \[(h|t|b)\]
colspec      (c|l|r)+
%%
"\\"         {printf(" Val: %d\t; Lexeme: %s \n", TBACKSL, yytext);
              return(TBACKSL);}
"{"          {printf(" Val: %d\t; Lexeme: %s \n", TLCURLYB, yytext);
              return(TLCURLYB);}
"}"          {printf(" Val: %d\t; Lexeme: %s \n", TRCURLYB, yytext);
              return(TRCURLYB);}
"begin"      {printf(" Val: %d\t; Lexeme: %s \n", TBEGIN, yytext);
              return(TBEGIN);}
"document"   {printf(" Val: %d\t; Lexeme: %s \n", TDOCUMENT, yytext);
              return(TDOCUMENT);}
"end"        {printf(" Val: %d\t; Lexeme: %s \n", TEND, yytext);
              return(TEND);}
{tablespec}  {printf(" Val: %d\t; Lexeme: %s \n", TTABLESPEC, yytext);
              return(TTABLESPEC);}
{colspec}    {printf(" Val: %d\t; Lexeme: %s \n", TCOLSPEC, yytext);
              return(TCOLSPEC);}
{word}       {printf(" Val: %d\t; Lexeme: %s \n", TWORD, yytext);
              return(TWORD);}
{ws}         { /* DO NOTHING */ }

Latexv2.l and docv2.tex

\begiin{document}
Hello world
Does [b] this work even on cclcrr [h]
multiple ccc lrcll lines [t]
\end{document}

Latexv3.l and docv2.tex

tablespec    \[(h|t|b)\]
colspec      (c|l|r)+
%%
"\\"             {printf(" Val: %d\t; Lexeme: %s \n", TBACKSL, yytext);
                  return(TBACKSL);}
"{"              {printf(" Val: %d\t; Lexeme: %s \n", TLCURLYB, yytext);
                  return(TLCURLYB);}
"}"              {printf(" Val: %d\t; Lexeme: %s \n", TRCURLYB, yytext);
                  return(TRCURLYB);}
"\\begin"        {printf(" Val: %d\t; Lexeme: %s \n", TBEGIN, yytext);
                  return(TBEGIN);}
"\{document\}"   {printf(" Val: %d\t; Lexeme: %s \n", TDOCUMENT, yytext);
                  return(TDOCUMENT);}
"\\end"          {printf(" Val: %d\t; Lexeme: %s \n", TEND, yytext);
                  return(TEND);}
{tablespec}      {printf(" Val: %d\t; Lexeme: %s \n", TTABLESPEC, yytext);
                  return(TTABLESPEC);}
{colspec}        {printf(" Val: %d\t; Lexeme: %s \n", TCOLSPEC, yytext);
                  return(TCOLSPEC);}
{word}           {printf(" Val: %d\t; Lexeme: %s \n", TWORD, yytext);
                  return(TWORD);}
{ws}             { /* DO NOTHING */ }

Latexv3.l and docv2.tex Output

Val: 204 ; Lexeme: \
i is: 204 ****
Val: 203 ; Lexeme: begiin
i is: 203 ****
Val: 202 ; Lexeme: {document}
i is: 202 ****
Val: 203 ; Lexeme: Hello
i is: 203 ****
Val: 203 ; Lexeme: world
i is: 203 ****
Val: 203 ; Lexeme: Does
i is: 203 ****
Val: 207 ; Lexeme: [b]
i is: 207 ****
Val: 203 ; Lexeme: this
i is: 203 ****
Val: 203 ; Lexeme: work
i is: 203 ****
Val: 203 ; Lexeme: even
i is: 203 ****
Val: 203 ; Lexeme: on
i is: 203 ****
Val: 208 ; Lexeme: cclcrr
i is: 208 ****
Val: 207 ; Lexeme: [h]
i is: 207 ****
Val: 203 ; Lexeme: multiple
i is: 203 ****
Val: 208 ; Lexeme: ccc
i is: 208 ****
Val: 208 ; Lexeme: lrcll
i is: 208 ****
Val: 203 ; Lexeme: lines
i is: 203 ****
Val: 207 ; Lexeme: [t]
i is: 207 ****
Val: 201 ; Lexeme: \end
i is: 201 ****
Val: 202 ; Lexeme: {document}
i is: 202 ****

Project 1 Task 2 – Fall 2011
- Task 1: Oct 5: Design and implement a lexical analyzer using the flex generator on the Linux boxes that is able to identify all lexical tokens for the latex subset.
- Task 2: Oct 12: Design and develop a context free grammar (CFG) for a subset of Latex.
- Task 3: Oct 17: Calculate FIRST and FOLLOW for a grammar provided after deliverable part 1b.
- Design a CFG for the project that allows Latex programs (e.g., text to be formatted) to be recognized.
- This will provide you with important language design experience.
- How do you Get Started?
- Let's Consider Initial Grammar in Project 1 Spec

CFG For Latex
- Latex Program is defined by start_doc, end_doc, and main_body
- main_body is Left Recursive with Multiple main_options
- main_option is either text_option or latex_options

latex_statement ---> start_doc main_body end_doc
start_doc       ---> "\" "begin" "{" "document" "}"
end_doc         ---> "\" "end" "{" "document" "}"

main_body       ---> main_body main_option
                   | main_option

main_option     ---> text_option
                   | latex_options

CFG For Latex
- text_option is a sequence of "words"
- latex_options starts with either
  - A backslash "\"
  - A left curly brace "{"
- backs_options can be
  - Begin/end blocks
  - Sections
  - Etc.

text_option   ---> text_option "word"
                 | "word"

latex_options ---> "\" backs_options
                 | "{" curlyb_options

backs_options ---> begin_end_opts
                 | section_options
                 | etc.. YOU NEED TO COMPLETE THIS!!

CFG For Latex
begin_end_opts ---> begin_options begin_block end_options
begin_options ---> "begin" "{" beg_end_cmds "}" table_options
end_options ---> "end" "{" beg_end_cmds "}"
begin_block ---> WHAT ARE THE POSSIBILITIES???
beg_end_cmds ---> "center" | "verbatim" | etc...
table_options ---> "[" position "]"
                 | epsilon
position ---> "h" | "t" | "b"
section_options ---> "section" "{" text_option "}"
                   | "subsection" "{" text_option "}"
ETC... TO BE COMPLETED BY YOU!!!!

CFG For Latex
• How would we write one of the begin_blocks, say for an Itemize List?
• What are some of the curlyb_options?
• What are other simple backs_options?
itemize_list ---> itemize_list item
                | item
What Does item go to?
curlyb_options ---> roman
                  | italics
roman ---> "\" "rm" text_option "}"
backs_options ---> backs_roman
                 | backs_italics
backs_roman ---> "rm" text_option ???
               | "rm" latex_options ???

Key Issue
• Need to Re-Examine and Reanalyze latex.all.txt and all of the various test cases (emailed)
• Look for the Required Structure
  - What are the Different Blocks?
  - What are Options within Blocks?
  - How are Nested Blocks Supported?
  - What are Backslash and Curly Brace Options?
• You need to make sure that your Grammar can "Parse" any of the sample test cases
• You check this by Doing a Derivation for the Test Case or for a Portion of Latex

Project 1 Task 3 - Fall 2011
• Task 1: Oct 5: Design and implement a lexical analyzer using the flex generator on the Linux boxes that is able to identify all lexical tokens for the Latex subset.
• Task 2: Oct 12: Design and develop a context free grammar (CFG) for a subset of Latex.
• Task 3: Oct 17: Calculate FIRST and FOLLOW for a grammar provided after deliverable part 1b.
• We will use Yacc Notation for the Grammar
  - See Following Slides
  - Notice that ":" replaces the arrow and "|" still means alternate rule.

Yacc For Latex
%{
#include <stdio.h>
#include <ctype.h>
%}
%start latexstatement
%token BACKSL LBEGIN LCURLYB DOCUMENT RCURLYB END
%token WORD WSWORD SPECCHAR CENTER VERBATIM SINGLE
%token ITEMIZE ENUMERATE TABULAR TABLE LSQRB RSQRB
%token H T B R C L
%token CAPTION LABEL DBLBS ITEM SECTION SUBSEC
%token TABOCON RENEW BASELINES INTEGER PAGENUM ARABIC
%token LROMAN CROMAN LALPH CALPH VSPACE HSPACE
%token RM IT NOINDENT REF
%%
latexstatement : startdoc mainbody enddoc
               ;
startdoc : BACKSL LBEGIN LCURLYB DOCUMENT RCURLYB
         ;
enddoc : BACKSL END LCURLYB DOCUMENT RCURLYB
       ;

Yacc For Latex
mainbody : mainbody mainoption
         | mainoption
         ;
mainoption : textoption
           | commentoption
           | latexoptions
           ;
textoption : textoption WORD
           | WORD
           ;
wstextoption : wstextoption WSWORD
             | WSWORD
             ;
commentoption : SPECCHAR textoption
              ;
latexoptions : BACKSL backsoptions
             | LCURLYB curlyboptions RCURLYB
             ;
curlyboptions : BACKSL fonts textoption

Bison
• Compiler Writing Tool that Generates LALR(1) Parser
• Grammar Rules (BNF) can be Modified/Augmented with Semantic Actions via Code Segments
• Can work in Conjunction with Lex or Separately
• Three Major Parts of a Bison Specification:
Declarations
%%
Grammar Rules
%%
User Supplied Programs

A First Example
%{
/* Includes and Global Variables here */
#include <stdio.h>
#include <ctype.h>
int yylex(void);
void yyerror(const char *s);
%}
%start line
%token DIGIT
%%
/* Grammar Rules */
line : expr '\n'
     ;
expr : expr '+' term
     | term
     ;
term : term '*' fact
     | fact
     ;
fact : '(' expr ')'
     | DIGIT
     ;
%%
/* Define own yylex */
int yylex(void)
{
    int c;
    c = getchar();
    if (isdigit(c)) {
        yylval = c - '0';   /* value of the digit goes on the value stack */
        return DIGIT;
    }
    return c;               /* any other character is its own token */
}
/* Error Routine */
void yyerror(const char *s) { (void)s; }
/* yyparse calls yylex */
int main(void) { return yyparse(); }

How Do Grammar Rules Fire?
• Follow RM Derivation in Reverse! Input 5 + 3 * 8
line : expr '\n'
expr : expr '+' term
     | term
term : term '*' fact
     | fact
fact : '(' expr ')'
     | DIGIT
E ⇒ E + T
  ⇒ E + T * F
  ⇒ E + T * DIGIT
  ⇒ E + F * DIGIT
  ⇒ E + DIGIT * DIGIT
  ⇒ T + DIGIT * DIGIT
  ⇒ F + DIGIT * DIGIT
  ⇒ DIGIT + DIGIT * DIGIT

Stack Performs RM Derivation in Reverse
E ⇒ E + T
  ⇒ E + T * F
  ⇒ E + T * DIGIT
  ⇒ E + F * DIGIT
  ⇒ E + DIGIT * DIGIT
  ⇒ T + DIGIT * DIGIT
  ⇒ F + DIGIT * DIGIT
  ⇒ DIGIT + DIGIT * DIGIT
Stack snapshots in order, one per line (top of stack at left):
DIGIT
F
T
E
+ E
DIGIT + E
F + E
T + E
* T + E
DIGIT * T + E
F * T + E
T + E
E

LALR State Machine
• (bison -v *.y) Generates y.output
state 0
    $accept : _line $end
    DIGIT shift 6
    ( shift 5
    . error
    line goto 1
    expr goto 2
    term goto 3
    fact goto 4
state 1
    $accept : line_$end
    $end accept
    . error
state 2
    line : expr_ (1)
    expr : expr_+ term
    + shift 7
    . reduce 1
state 3
    expr : term_ (3)
    term : term_* fact
    * shift 8
    . reduce 3
state 4
    term : fact_ (5)
    . reduce 5
state 5
    fact : (_expr )
    DIGIT shift 6
    ( shift 5
    . error
    expr goto 9
    term goto 3
    fact goto 4
state 6
    fact : DIGIT_ (7)
    . reduce 7
state 7
    expr : expr +_term
    DIGIT shift 6
    ( shift 5
    . error
    term goto 10
    fact goto 4
state 8
    term : term *_fact
    DIGIT shift 6
    ( shift 5
    . error
    fact goto 11

LALR State Machine
state 9
    expr : expr_+ term
    fact : ( expr_)
    + shift 7
    ) shift 12
    . error
state 10
    expr : expr + term_ (2)
    term : term_* fact
    * shift 8
    . reduce 2
state 11
    term : term * fact_ (4)
    . reduce 4
state 12
    fact : ( expr )_ (6)
    . reduce 6
7/300 terminals, 4/300 nonterminals
8/600 grammar rules, 13/1000 states
0 shift/reduce, 0 reduce/reduce conflicts reported
8/350 working sets used
memory: states,etc. 69/24000, parser 9/12000
9/600 distinct lookahead sets
4 extra closures
13 shift entries, 1 exceptions
7 goto entries
3 entries saved by goto default
Optimizer space used: input 38/24000, output 218/12000
218 table entries, 205 zero
maximum spread: 257, maximum offset: 43

Defining Precedence
%token NUMBER
%left '+' '-'      /* left associative and equal precedence */
%left '*' '/'
%right UMINUS      /* UMINUS: highest precedence of all */
%%
expr : expr '+' expr          { $$ = $1 + $3; }
     | expr '-' expr          { $$ = $1 - $3; }
     | expr '*' expr          { $$ = $1 * $3; }
     | expr '/' expr          { $$ = $1 / $3; }
     | '(' expr ')'           { $$ = $2; }
     | '-' expr %prec UMINUS  { $$ = - $2; }
     | NUMBER
     ;

Automatic Ambiguity Resolution
• Input Grammar May be Ambiguous
• Bison (and others) have Default Disambiguating Rules
  - In a Shift/Reduce Conflict, the Shift is Chosen
  - In a Reduce/Reduce Conflict, the Reduction is to Reduce by "earlier" rule (listed from top-down)
• Can't Control S/R Conflict Resolution
• However, for R/R Resolution
  - Reorder Rules to Force a Different Reduction
  - Rewrite the Grammar to Remove Ambiguity
• Other Error is:
  - Rule Not Reduced
  - If S/R Picks Shift, and Rule Never Reduced Elsewhere

y.output as Generated by Bison
State 3 contains 1 shift/reduce conflict.
Grammar
rule 1 statement -> if_then opt_else
rule 2 statement -> assign_stmt
rule 3 if_then -> T_IF rel_expr T_THEN statement
rule 4 opt_else -> /* empty */
rule 5 opt_else -> T_ELSE statement
rule 6 assign_stmt -> T_IDENTIFIER T_ASSIGN value
rule 7 value -> T_INTEGER
rule 8 value -> T_REAL
rule 9 value -> T_STRING
rule 10 rel_expr -> compare rel_op compare
rule 11 compare -> T_IDENTIFIER
rule 12 compare -> value
rule 13 rel_op -> T_EQ
rule 14 rel_op -> T_LT
rule 15 rel_op -> T_NE
rule 16 rel_op -> T_GE
rule 17 rel_op -> T_GT

y.output as Generated by Bison
Terminals, with rules where they appear
$ (-1)
error (256)
T_IF (258) 3
T_THEN (259) 3
T_ELSE (260) 5
T_IDENTIFIER (261) 6 11
T_ASSIGN (262) 6
T_INTEGER (263) 7
T_REAL (264) 8
T_STRING (265) 9
T_EQ (266) 13
T_LT (267) 14
T_NE (268) 15
T_GE (269) 16
T_GT (270) 17

y.output as Generated by Bison
Nonterminals, with rules where they appear
statement (16)
    on left: 1 2, on right: 3 5
if_then (17)
    on left: 3, on right: 1
opt_else (18)
    on left: 4 5, on right: 1
assign_stmt (19)
    on left: 6, on right: 2
value (20)
    on left: 7 8 9, on right: 6 12
rel_expr (21)
    on left: 10, on right: 3
compare (22)
    on left: 11 12, on right: 10
rel_op (23)
    on left: 13 14 15 16 17, on right: 10

y.output as Generated by Bison
state 0
    T_IF shift, and go to state 1
    T_IDENTIFIER shift, and go to state 2
    statement go to state 26
    if_then go to state 3
    assign_stmt go to state 4
state 1
    if_then -> T_IF . rel_expr T_THEN statement (rule 3)
    T_IDENTIFIER shift, and go to state 5
    T_INTEGER shift, and go to state 6
    T_REAL shift, and go to state 7
    T_STRING shift, and go to state 8
    value go to state 9
    rel_expr go to state 10
    compare go to state 11
state 2
    assign_stmt -> T_IDENTIFIER . T_ASSIGN value (rule 6)
    T_ASSIGN shift, and go to state 12

y.output as Generated by Bison
state 3
    statement -> if_then . opt_else (rule 1)
    T_ELSE shift, and go to state 13
    T_ELSE [reduce using rule 4 (opt_else)]
    $default reduce using rule 4 (opt_else)
    opt_else go to state 14
... etc ...
state 25
    rel_expr -> compare rel_op compare (rule 10)
    $default reduce using rule 10 (rel_expr)
state 26
    $ go to state 27
state 27
    $ go to state 28
state 28
    $default accept

Hints for Writing Yacc Specifications
• Use All Capital Letters for Token Names and All Lower Case for Non-Terminals (Helps Debugging)
• Put Grammar Rules and Actions on Separate Lines (Makes Moving them Easier)
• Put all Rules with Same Left Hand Side Together and Utilize Vertical Bar for Alternatives
• Put a Semicolon After the Very Last Alternative for Each Left Hand Side and on a Separate Line
• Yacc Encourages Left Recursion
• LALR Discourages Right Recursion!
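One way to see why left recursion is preferred is to count how deep the parse stack gets for a list of n items. The sketch below is only a simulation under stated assumptions, not Bison itself: the function names are hypothetical, and the two list grammars in the comments are illustrative rather than taken from the project. With the left-recursive rule the parser can reduce after every item, so the stack stays shallow; with the right-recursive rule it has to shift every item before the first reduction.

```c
#include <stddef.h>

/* Hypothetical simulation of parse-stack depth for a list of n items.
 * Left-recursive rule:   list : list item | item ;                    */
size_t max_depth_left(size_t n)
{
    size_t depth = 0, max = 0;
    for (size_t i = 0; i < n; i++) {
        depth++;                       /* shift the next item */
        if (depth > max)
            max = depth;
        if (i > 0)
            depth--;                   /* reduce "list item" to "list": two symbols become one */
        /* for i == 0 the reduction "item" to "list" keeps the depth at 1 */
    }
    return max;
}

/* Right-recursive rule:  list : item list | item ;                    */
size_t max_depth_right(size_t n)
{
    size_t depth = 0, max = 0;
    for (size_t i = 0; i < n; i++) {
        depth++;                       /* nothing can be reduced until the last item is seen */
        if (depth > max)
            max = depth;
    }
    while (depth > 1)
        depth--;                       /* now reduce "item list" to "list" repeatedly */
    return max;
}
```

For a 100-item list this gives a maximum depth of 2 for the left-recursive form and 100 for the right-recursive form, which is the practical reason a long right-recursive list can exhaust the parser stack.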

Project Part 2 Bison - Fall 2011
• Two Tasks:
• Note that when I last gave this project, I put intentional errors in both latex.in and latex.l. I think I took them all out of latex.l, but am not sure about latex.in.

Project Part 2 Bison - Fall 2011
• Files on the Web Page:
latex.in        : A sample input file.
latex.l         : A sample latex flex file.
latexp2.y       : Contains a bison specification
latexp2clean.y  : Equivalent specification no fprintfs.
projp2.tex      : This file - sample latex input.
projp2.pdf      : PDF version of Project, part 2.
output          : Generated file with parsing rules fired
typescript      : Output of latex.l (tokens recognized)
b2conflicts.txt : S/R and R/R Conflicts from Bison
bison.debug.txt : Short Overview of Bison -v Output
latexp2.output  : Complete Bison -v Output
latexp2.tab.c   : Parser Generated by Bison
lex.yy.c        : Lexical Analyzer Generated by Flex

Project Part 2 Latex.l Spec - Fall 2011
%{ /* THIS IS LATEX.L */
%}
ws [ \t\n]+
integer [0-9]+
punc (\.|\,|\!|\?|\:|\;)
word ({punc}|[a-zA-Z0-9])*
special (\%|\_|\&|\$|\#)
%%
"\\\\" {printf(" %s \n", yytext);fflush(stdout); return(DBLBS);}
"\\" {printf(" %s \n", yytext);fflush(stdout); return(BACKSL);}
"{" {printf(" %s \n", yytext);fflush(stdout); return(LCURLYB);}
"}" {printf(" %s \n", yytext);fflush(stdout); return(RCURLYB);}
{special} {printf(" %s \n", yytext);fflush(stdout); return(SPECCHAR);}
"[" {printf(" %s \n", yytext);fflush(stdout); return(LSQRB);}
"]" {printf(" %s \n", yytext);fflush(stdout); return(RSQRB);}
"alph" {printf(" %s \n", yytext);fflush(stdout); return(LALPH );}
"Alph" {printf(" %s \n", yytext);fflush(stdout); return(CALPH);}
"arabic" {printf(" %s \n", yytext);fflush(stdout); return(ARABIC);}
"baselinestretch" {printf(" %s \n", yytext);fflush(stdout);return(BASELINES);}
"begin" {printf(" %s \n", yytext);fflush(stdout); return(LBEGIN);}
"caption" {printf(" %s \n", yytext);fflush(stdout); return(CAPTION);}
"center" {printf(" %s \n", yytext);fflush(stdout); return(CENTER );}
"document" {printf(" %s \n", yytext);fflush(stdout); return(DOCUMENT);}
"end" {printf(" %s \n", yytext);fflush(stdout); return(END);}
"enumerate" {printf(" %s \n", yytext);fflush(stdout); return(ENUMERATE);}

Project Part 2 Latex.l Spec - Fall 2011
"hspace" {printf(" %s \n", yytext);fflush(stdout); return(HSPACE);}
"itemize" {printf(" %s \n", yytext);fflush(stdout); return(ITEMIZE);}
"item" {printf(" %s \n", yytext);fflush(stdout); return(ITEM);}
"it" {printf(" %s \n", yytext);fflush(stdout); return(IT);}
"label" {printf(" %s \n", yytext);fflush(stdout); return(LABEL);}
"noindent" {printf(" %s \n", yytext);fflush(stdout); return(NOINDENT);}
"pagenumbering" {printf(" %s \n", yytext);fflush(stdout); return(PAGENUM);}
"ref" {printf(" %s \n", yytext);fflush(stdout); return(REF);}
"renewcommand" {printf(" %s \n", yytext);fflush(stdout); return(RENEW);}
"roman" {printf(" %s \n", yytext);fflush(stdout); return(LROMAN);}
"Roman" {printf(" %s \n", yytext);fflush(stdout); return(CROMAN);}
"rm" {printf(" %s \n", yytext);fflush(stdout); return(RM);}
"section" {printf(" %s \n", yytext);fflush(stdout); return(SECTION);}
"single" {printf(" %s \n", yytext);fflush(stdout); return(SINGLE);}
"subsection" {printf(" %s \n", yytext);fflush(stdout); return(SUBSEC);}
"tableofcontents" {printf(" %s \n", yytext);fflush(stdout); return(TABOCON);}
"table" {printf(" %s \n", yytext);fflush(stdout); return(TABLE);}
"tablular" {printf(" %s \n", yytext);fflush(stdout); return(TABULAR);}
"verbatim" {printf(" %s \n", yytext);fflush(stdout); return(VERBATIM);}
"vspace" {printf(" %s \n", yytext);fflush(stdout); return(VSPACE);}
"b" {printf(" %s \n", yytext);fflush(stdout); return(B);}
"c" {printf(" %s \n", yytext);fflush(stdout); return(C);}
"h" {printf(" %s \n", yytext);fflush(stdout); return(H);}
"l" {printf(" %s \n", yytext);fflush(stdout); return(L);}
"r" {printf(" %s \n", yytext);fflush(stdout); return(R);}
"t" {printf(" %s \n", yytext);fflush(stdout); return(T);}
{integer} {printf(" %s \n", yytext);fflush(stdout); return(INTEGER);}
{word} {printf(" %s \n", yytext);fflush(stdout); return(WORD);}
{ws} { /* DO NOTHING */ }

Project Part 2 Bison - Fall 2011
%{ /* A VERSION OF YACC WITH FPRINTS FOR PROJECT, PART 2 */
#include <stdio.h>
#include <ctype.h>
/* Define Global Vars */
FILE *fp;
%}
%start latexstatement
%token BACKSL LBEGIN LCURLYB DOCUMENT RCURLYB END
%token WORD WSWORD SPECCHAR CENTER VERBATIM SINGLE
%token ITEMIZE ENUMERATE TABULAR TABLE LSQRB RSQRB
%token H T B R C L
%token CAPTION LABEL DBLBS ITEM SECTION SUBSEC
%token TABOCON RENEW BASELINES INTEGER PAGENUM ARABIC
%token LROMAN CROMAN LALPH CALPH VSPACE HSPACE
%token RM IT NOINDENT REF
%%
latexstatement : startdoc mainbody enddoc
                   {fprintf(fp,"after latexstatement\n");}
               ;

Project Part 2 Bison - Fall 2011
startdoc : BACKSL LBEGIN LCURLYB DOCUMENT RCURLYB
             {fprintf(fp,"after startdoc\n");}
         ;
enddoc : BACKSL END LCURLYB DOCUMENT RCURLYB
             {fprintf(fp,"after enddoc\n");}
       ;
mainbody : mainbody mainoption
             {fprintf(fp,"after mainbody1\n");}
         | mainoption
             {fprintf(fp,"after mainbody2\n");}
         ;
mainoption : textoption
             {fprintf(fp,"after mainoption1\n");}
           | commentoption
             {fprintf(fp,"after mainoption2\n");}
           | latexoptions
             {fprintf(fp,"after mainoption3\n");}
           ;
textoption : textoption WORD
             {fprintf(fp,"after textoption1\n");}
           | WORD
             {fprintf(fp,"after textoption2\n");}
           ;

Project Part 2 Bison - Fall 2011
/* LOTS OF STUFF MISSING */
fonts : RM
          {fprintf(fp,"after fonts1\n");}
      | IT
          {fprintf(fp,"after fonts2\n");}
      ;
specialchar : SPECCHAR | LCURLYB | RCURLYB
            ;
nonewpara : NOINDENT
          ;
reference : REF LCURLYB WORD RCURLYB
          ;
%%
#include "lex.yy.c"
yyerror(){}
main(){fp = fopen("output","w");yyparse();}

Project Part 2 Bison - Fall 2011
• Remove Following from end of guiho.l File
yywrap(){}
main() {int i; do {i = yylex();} while (i!=0);}
• Building lex.yy.c and Compiling/Executing:
  - ssh to Linux
  - flex latex.l
  - bison -v latexp2.y
  - gcc latexp2.tab.c -lfl
  - a.out < latex.in

What Does Bison -v Generate?
Rules never reduced
   35 beginblock: listblock
   56 optcaption: /* empty */
   58 optlabel: /* empty */
State 12 conflicts: 1 shift/reduce
State 53 conflicts: 1 shift/reduce
State 67 conflicts: 1 shift/reduce
State 70 conflicts: 1 shift/reduce
State 73 conflicts: 1 shift/reduce
State 102 conflicts: 1 shift/reduce
Grammar
    0 $accept: latexstatement $end
    1 latexstatement: startdoc mainbody enddoc
    2 startdoc: BACKSL LBEGIN LCURLYB DOCUMENT RCURLYB
    3 enddoc: BACKSL END LCURLYB DOCUMENT RCURLYB
    4 mainbody: mainbody mainoption
    5         | mainoption
Missing Rules
   89 nonewpara: NOINDENT
   90 reference: REF LCURLYB WORD RCURLYB

What Does Bison -v Generate?
Terminals, with rules where they appear
$end (0) 0
error (256)
BACKSL (258) 2 3 14 16 30 57 59 65 73
LBEGIN (259) 2 29
LCURLYB (260) 2 3 15 29 30 45 57 59 70 71 73 74 80 81 87 90
Missing Terminals
NOINDENT (302) 89
REF (303) 90
Nonterminals, with rules where they appear
$accept (49)
    on left: 0
latexstatement (50)
    on left: 1, on right: 0
startdoc (51)
    on left: 2, on right: 1
Missing Nonterminals
specialchar (88)
    on left: 86 87 88, on right: 25
nonewpara (89)
    on left: 89, on right: 26
reference (90)
    on left: 90, on right: 27

These are the Item Sets!
state 0
    0 $accept: . latexstatement $end
    BACKSL shift, and go to state 1
    latexstatement go to state 2
    startdoc go to state 3
state 1
    2 startdoc: BACKSL . LBEGIN LCURLYB DOCUMENT RCURLYB
    LBEGIN shift, and go to state 4
state 2
    0 $accept: latexstatement . $end
    $end shift, and go to state 5
state 3
    1 latexstatement: startdoc . mainbody enddoc
    BACKSL shift, and go to state 6
    LCURLYB shift, and go to state 7
    WORD shift, and go to state 8
    SPECCHAR shift, and go to state 9
    mainbody go to state 10
    mainoption go to state 11
    textoption go to state 12
    commentoption go to state 13
    latexoptions go to state 14

What are S/R Errors? (b2conflicts.txt)
state 12
    6 mainoption: textoption .
    9 textoption: textoption . WORD
    WORD shift, and go to state 57
    WORD [reduce using rule 6 (mainoption)]
    $default reduce using rule 6 (mainoption)
state 53
    9 textoption: textoption . WORD
    13 commentoption: SPECCHAR textoption .
    WORD shift, and go to state 57
    WORD [reduce using rule 13 (commentoption)]
    $default reduce using rule 13 (commentoption)

What are S/R Errors? (b2conflicts.txt)
state 67
    9 textoption: textoption . WORD
    32 beginblock: textoption .
    61 centerblock: textoption . DBLBS
    69 tableentry: textoption .
    WORD shift, and go to state 57
    DBLBS shift, and go to state 97
    SPECCHAR reduce using rule 69 (tableentry)
    DBLBS [reduce using rule 69 (tableentry)]
    $default reduce using rule 32 (beginblock)
state 70
    28 beginendopts: beginoptions beginblock . endoptions
    BACKSL shift, and go to state 99
    BACKSL [reduce using rule 56 (optcaption)]
    endoptions go to state 100
    endtableopts go to state 101
    optcaption go to state 102

What are S/R Errors? (b2conflicts.txt)
state 73
    35 beginblock: listblock .
    63 listblock: listblock . anitem
    BACKSL shift, and go to state 65
    BACKSL [reduce using rule 35 (beginblock)]
    anitem go to state 104
state 102
    55 endtableopts: optcaption . optlabel
    BACKSL shift, and go to state 122
    BACKSL [reduce using rule 58 (optlabel)]
    optlabel go to state 123

Project Part 2 Bison - Fall 2011
• What is Actually Occurring - see Part 2 Spec
• Three Rules Never Used:
  - 35 beginblock: listblock
  - 56 optcaption: /* empty */
  - 58 optlabel: /* empty */
• Certain Grammar Combos Can't Occur
• Six Shift/Reduce Errors
  - Explore the Item Set and the Involved Grammar Rules
  - Shift Always Picked: What is the Grammar Behavior (rule that is Fired) based on that?
  - What are the Options to Fix the Problem?

Project Part 2 Bison - Fall 2011
• What's the Solution?
  - Need to Rework the Grammar so that All of the S/R Errors that Cause Problems and the Rules Not Reduced are Rectified
  - Try Rewriting the Grammar Rules
  - OK to Introduce S/R and R/R Conflicts as Long as the Program Still Parses and There are No Rules Not Reduced
  - This means: Does it Run on All Test Cases?

What Tools Do We Have - typescript File
THIS SHOWS WHERE THE TOKENS STOPPED BEING PROCESSED:
Script started on Sun 23 Oct 2011 11:38:50 AM EDT
steve@icarus2:~/LP2# a.out < latex.in
\begin { document } \pagenumbering { arabic } \arabic { 5
ETC.
to be centered \end
steve@icarus2:~/LP2# exit
LandY.87
CSE
4100
CREATED BY FPRINTS – SHOWS THE GRAMMAR RULES FIRED AND WHERE IT FAILEDafter startdocafter pagenumbersafter backsoptions5after latexoptions1after mainoption3after mainbody2after pagenuminitafter backsoptions6after latexoptions1after mainoption3after mainbody1after linespacingafter backsoptions4after latexoptions1after mainoption3after mainbody1after backsoptions3after latexoptions1after mainoption3after mainbody1after textoption2after sectionoptions1after backsoptions2after latexoptions1
What Tools to We have What Tools to We have –– output Fileoutput File
after mainoption3after mainbody1after textoption2after textoption1after textoption1after textoption1after textoption1after textoption1after textoption1after textoption1after textoption1after textoption1after textoption1after textoption1after textoption1after textoption1after textoption1after textoption1after textoption1after textoption1after textoption1after textoption1after textoption1ETC
after textoption1after anitemafter listblock1after textoption2after textoption1after textoption1after textoption1after textoption1after textoption1after anitemafter listblock1after textoption2after textoption1after textoption1after textoption1after textoption1after textoption1after textoption1after anitemafter listblock1
FAILED AT THIS POINT IN CONJUNCTION WITH TOKEN FROM TYPESCRIPT

Project Part 2 Bison - Fall 2011
• Recall Two Tasks:
• Task 1 Involves Fixing S/R and Rules Not Reduced Errors: Generate Revised latexp2.y
• Task 2 Involves Separate Activity to Support Nested Blocks and Verbatim
  - May Require Grammar and Perhaps flex Changes
  - Need to Recognize "white" space for Verbatim

Project Part 2 Bison - Fall 2011
• Hand in Requirements:
  - Log File for Grammar Changes to eliminate the shift/reduce errors and other problems for Task 1
  - Track Original Grammar Segments and Revisions
  - Hand in Revised Grammar for Task 1
  - Log File for Grammar Changes to support Nested Blocks and Verbatim for Task 2
  - Track Original Grammar Segments and Revisions
  - Hand in Revised Grammar for Task 2
  - Test Cases for both Tasks (own Test Cases)
  - Compilation Instructions if Different from Default

Advice re. Project 2 Task 1
• Need to Focus on Rules not Reduced
• See Proj2Advice.doc that was Emailed
  - We'll Briefly Review
• Note that out of the six S/R errors, two do not need to be fixed
  - For those two, need to Examine the State, the involved Rules, the Shift/Reduction Conflict
  - The Reduction May not Occur in that State but if it Occurs in Another State, May be OK
  - Test a Sample Input Associated with Grammar Rules that are Involved

Optcaption and optlabel - Grammar Rules
beginoptions : LBEGIN LCURLYB begendcmds RCURLYB begtableopts
             ;
endoptions : endtableopts BACKSL END LCURLYB begendcmds RCURLYB
           ;
begtableopts : LSQRB position RSQRB
             | LCURLYB tablespec RCURLYB
             | /* epsilon move */
             ;
position : H | T | B
         ;
tablespec : tablespec colspec
          | colspec
          ;
colspec : R | C | L
        ;
endtableopts : optcaption optlabel
             ;
optcaption : /* epsilon move */
           | BACKSL CAPTION LCURLYB textoption RCURLYB
           ;
optlabel : /* epsilon move */
         | BACKSL LABEL LCURLYB WORD RCURLYB
         ;

Optcaption and optlabel - Grammar Rules
What are the four possibilities?
1. neither is present
2. optcaption only
3. optlabel only
4. both present
Can you rewrite the grammar rules above to precisely cover these four options more explicitly?
Can you alleviate the epsilon-epsilon possibility in endtableopts?
You still want that option, but if you can get the other three non-empty options (2, 3, and 4) recognized then you will likely also be able to recognize the epsilon-epsilon case.

Rule Not Reduced and State 73
Rule not reduced:
   35 beginblock: listblock
state 73
    35 beginblock: listblock .
    63 listblock: listblock . anitem
    BACKSL shift, and go to state 65
    BACKSL [reduce using rule 35 (beginblock)]
    anitem go to state 104
This always SHIFTS when seeing anitem that starts with a BACKSL!
State 65 is Processing the anitem Rule

Rule Not Reduced and State 73
Input:
\begin{document}
\begin{itemize}
\item Single is for Single spacing
\item Hello again
\end{itemize}
\end{document}
Tokens recognized:
\begin { document } \begin { itemize }
\item Single is for Single spacing \item Hello again \end
Rules fired:
after startdoc
after begendcmds4
after begtableopts3
after beginoptions
after textoption2
after textoption1
after textoption1
after textoption1
after textoption1
after anitem
after listblock2
after textoption2
after textoption1
after anitem
after listblock1
Issue: Thinks \ will be Followed by item while Instead it is followed By end

Revisiting First Example via Attr. Grammars
%{
/* Includes and Global Variables here */
#include <stdio.h>
#include <ctype.h>
int yylex(void);
void yyerror(const char *s);
%}
%start line
%token DIGIT
%%
/* Grammar Rules */
line : expr '\n'
     ;
expr : expr '+' term
     | term
     ;
term : term '*' fact
     | fact
     ;
fact : '(' expr ')'
     | DIGIT
     ;
%%
/* Define own yylex */
int yylex(void)
{
    int c;
    c = getchar();
    if (isdigit(c)) {
        yylval = c - '0';   /* value of the digit goes on the value stack */
        return DIGIT;
    }
    return c;               /* any other character is its own token */
}
/* Error Routine */
void yyerror(const char *s) { (void)s; }
/* yyparse calls yylex */
int main(void) { return yyparse(); }

How Do Grammar Rules Fire?
• Just like Attribute Grammars! Input 5 + 3 * 8
line : expr '\n'
expr : expr '+' term
     | term
term : term '*' fact
     | fact
fact : '(' expr ')'
     | DIGIT
E ⇒ E + T
  ⇒ E + T * F
  ⇒ E + T * DIGIT
  ⇒ E + F * DIGIT
  ⇒ E + DIGIT * DIGIT
  ⇒ T + DIGIT * DIGIT
  ⇒ F + DIGIT * DIGIT
  ⇒ DIGIT + DIGIT * DIGIT

Stack Performs RM Derivation in Reverse
E ⇒ E + T
  ⇒ E + T * F
  ⇒ E + T * DIGIT
  ⇒ E + F * DIGIT
  ⇒ E + DIGIT * DIGIT
  ⇒ T + DIGIT * DIGIT
  ⇒ F + DIGIT * DIGIT
  ⇒ DIGIT + DIGIT * DIGIT
Stack snapshots in order, one per line (top of stack at left):
DIGIT
F
T
E
+ E
DIGIT + E
F + E
T + E
* T + E
DIGIT * T + E
F * T + E
T + E
E

Corresponding Attribute Grammar
line : expr
         {line.val = expr.val}
     ;
expr : expr1 '+' term
         {expr.val = expr1.val + term.val}
     | term
         {expr.val = term.val}
term : term1 '*' fact
         {term.val = term1.val * fact.val}
     | fact
         {term.val = fact.val}
fact : '(' expr ')'
         {fact.val = expr.val}
     | DIGIT
         {fact.val = DIGIT.lexval}
• val is a synthesized attribute

How Does this Transition into Bison?
• Bison (in y.tab.c) Maintains User-Accessible Parsing Stack Defined as:
#ifndef YYSTYPE
#define YYSTYPE int
#endif
YYSTYPE yylval, yyval;
YYSTYPE yyv[YYMAXDEPTH];
Consider Grammar Rule S -> A B C
Eventually, A B C on Stack to be Replaced by S in Reduction
For that Rule, Offsets into Parsing Stack are Defined as: $1 = A, $2 = B, $3 = C
Stack yyv (top first): $3, $2, $1
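
How Does this Transition into Yacc?
Consider Grammar Rule S -> A B C (all are nonterminals)
Eventually, A B C on Stack to be Replaced by S in Reduction
For that Rule, Offsets into Parsing Stack are Defined as: $1 = A, $2 = B, $3 = C
Stack yyv (top first): $3, $2, $1
S : A {$1 = 5;}
    B {$2 = 7;}
    C {$3 = 9; $$ = $1 + $2 + $3;}
  ;
Note: Yacc and Bison count each mid-rule action as a component of the rule, so with the actions placed as shown, B is $3 and C is $5 in the final action. The $1, $2, $3 numbering holds when the only action is the one at the end of the rule.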
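The reduction of S -> A B C can be sketched in plain C to make the offsets concrete. This is a minimal illustration under stated assumptions, not Bison's generated code: value_stack, sp, push_value, reduce_S and STACK_MAX are hypothetical names (STACK_MAX stands in for YYMAXDEPTH), and the rule's single action is taken to be $$ = $1 + $2 + $3 at the end of the rule.

```c
#include <assert.h>

#define STACK_MAX 64              /* hypothetical limit, stands in for YYMAXDEPTH */

int value_stack[STACK_MAX];       /* plays the role of yyv */
int sp = 0;                       /* number of values currently on the stack */

/* Push the semantic value of a symbol that was just shifted or reduced. */
void push_value(int v)
{
    assert(sp < STACK_MAX);
    value_stack[sp++] = v;
}

/* Reduce by S -> A B C: the three topmost values are $1, $2 and $3.
 * They are popped and replaced by the value computed for S ($$).    */
int reduce_S(void)
{
    assert(sp >= 3);
    int d1 = value_stack[sp - 3];     /* $1 = A */
    int d2 = value_stack[sp - 2];     /* $2 = B */
    int d3 = value_stack[sp - 1];     /* $3 = C, on top of the stack */
    int result = d1 + d2 + d3;        /* $$ = $1 + $2 + $3 */
    sp -= 3;                          /* pop A, B and C */
    value_stack[sp++] = result;       /* push S */
    return result;
}
```

Pushing 5, 7 and 9 and then calling reduce_S() leaves a single value, 21, on the stack, which is the value a parent rule would later see as one of its own $n.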

Revisiting the Attribute Grammar
line : expr
         {line.val = expr.val}                $$ = $1
expr : expr1 '+' term
         {expr.val = expr1.val + term.val}    $$ = $1 + $3
     | term
         {expr.val = term.val}                $$ = $1
term : term1 '*' fact
         {term.val = term1.val * fact.val}    $$ = $1 * $3
     | fact
         {term.val = fact.val}                $$ = $1
fact : '(' expr ')'
         {fact.val = expr.val}                $$ = $2
     | DIGIT
         {fact.val = DIGIT.lexval}            $$ = char_to_int(yytext)
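The same synthesized val can be computed by hand to check what the actions should produce. The sketch below is a hand-written recursive-descent evaluator, not the LALR(1) parser that Bison generates; the names eval_line, parse_expr, parse_term, parse_fact and cursor are hypothetical, left recursion is replaced by loops, and error handling is omitted. Each function returns the val of the nonterminal it recognizes, mirroring the $$ assignments.

```c
#include <ctype.h>

static const char *cursor;            /* hypothetical pointer to the unread input */

static int parse_expr(void);

static void skip_blanks(void)
{
    while (*cursor == ' ')
        cursor++;
}

/* fact : '(' expr ')'  { $$ = $2 }   |   DIGIT  { $$ = digit value } */
static int parse_fact(void)
{
    skip_blanks();
    if (*cursor == '(') {
        cursor++;                     /* consume '(' */
        int val = parse_expr();
        skip_blanks();
        if (*cursor == ')')
            cursor++;                 /* consume ')' */
        return val;
    }
    if (isdigit((unsigned char)*cursor))
        return *cursor++ - '0';       /* single digit, as in the slide's yylex */
    return 0;                         /* error handling omitted in this sketch */
}

/* term : term '*' fact  { $$ = $1 * $3 }   |   fact  { $$ = $1 } */
static int parse_term(void)
{
    int val = parse_fact();
    for (;;) {
        skip_blanks();
        if (*cursor != '*')
            return val;
        cursor++;
        val = val * parse_fact();
    }
}

/* expr : expr '+' term  { $$ = $1 + $3 }   |   term  { $$ = $1 } */
static int parse_expr(void)
{
    int val = parse_term();
    for (;;) {
        skip_blanks();
        if (*cursor != '+')
            return val;
        cursor++;
        val = val + parse_term();
    }
}

/* line : expr  { $$ = $1 } */
int eval_line(const char *text)
{
    cursor = text;
    return parse_expr();
}
```

For the running input, eval_line("5 + 3 * 8") multiplies before it adds, because term is parsed below expr, exactly as in the grammar.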

Interactions Between Lex and Yacc
IN YACC:
#ifndef YYSTYPE
#define YYSTYPE int
#endif
YYSTYPE yylval, yyval;
YYSTYPE yyv[YYMAXDEPTH];
S  -> A  B  C
$$    $1 $2 $3
Stack yyv (top first): $3, $2, $1
IN LEX:
char yytext[YYLMAX];
int yyleng;
yytext: globally passes lexeme to parser
yylval: Set in lexical analyzer
yylex: Returns Token value
yylval is what is placed in stack yyv
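The division of labor between the two tools can be sketched with a tiny hand-written scanner. Everything prefixed sk_ or SK_ is a hypothetical name chosen so as not to clash with the real yylex and yylval, and the input comes from a string rather than from flex. The return value is the token code the parser switches on; the global carries the semantic value that the parser pushes on its value stack.

```c
#include <ctype.h>

enum { SK_DIGIT = 258 };          /* hypothetical token code, above the character range */

int sk_yylval;                    /* stands in for yylval */
const char *sk_input = "";        /* unread input, stands in for the input stream */

void sk_set_input(const char *text)
{
    sk_input = text;
}

/* Return the next token code; set sk_yylval when the token has a value. */
int sk_yylex(void)
{
    while (*sk_input == ' ')
        sk_input++;                           /* skip blanks */
    if (*sk_input == '\0')
        return 0;                             /* 0 tells the parser the input is finished */
    unsigned char c = (unsigned char)*sk_input++;
    if (isdigit(c)) {
        sk_yylval = c - '0';                  /* semantic value for the value stack */
        return SK_DIGIT;                      /* token code for the grammar */
    }
    return c;                                 /* operators are their own token codes */
}
```

Scanning the text "7 +" yields SK_DIGIT with sk_yylval set to 7, then '+', then 0 at the end of the input.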
LandY.103
CSE
4100
Pascal to C Conversion
• Utilize a Limited Subset of Pascal
  - If-Then-Else and Assignment Statements
  - Relational (Boolean) Expressions and Operators
• Conversions of Note:
  - If-Then-Else goes to If-Else (no then in C)
  - = Goes to ==
  - <> Goes to !=
  - := Goes to =
• Key Issues
  - Define String Variables to Hold Concatenated "Program" Bottom Up
  - Construction Utilizes Current Lexeme (yytext) Concatenated with Appropriate Conversions
  - Information Passes "Up" the Grammar
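The conversions of note amount to a small lookup from Pascal spelling to C spelling. A minimal sketch follows; pascal_to_c_lexeme is a hypothetical helper, not part of the course's yacc file, which performs the same mapping inside grammar actions.

```c
#include <assert.h>
#include <string.h>

/* Maps a Pascal operator or keyword lexeme to its C spelling.
   Returns the input unchanged when no conversion applies. */
const char *pascal_to_c_lexeme(const char *lexeme) {
    assert(lexeme != NULL);
    if (strcmp(lexeme, ":=") == 0) return "=";    /* assignment */
    if (strcmp(lexeme, "=") == 0)  return "==";   /* equality test */
    if (strcmp(lexeme, "<>") == 0) return "!=";   /* inequality test */
    if (strcmp(lexeme, "then") == 0) return "";   /* no then in C */
    return lexeme;                                /* e.g. <, <=, >, >= */
}
```

The order of the tests does not matter here because each comparison is against the whole lexeme, so "=" never matches ":=".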
Pascal to C Conversion
%{
#include <stdio.h>
#include <string.h>   /* strcpy, strcat */
#include <ctype.h>
char strans[100], atrans[100], itrans[100], etrans[100],
     vtrans[100], retrans[100], ctrans[100], rtrans[100];
%}
%start statement
%token T_IF T_THEN T_ELSE T_IDENTIFIER T_ASSIGN T_INTEGER T_REAL
%token T_STRING T_EQ T_LT T_LE T_NE T_GE T_GT
%%
statement : if_then opt_else
              {strcpy(strans, itrans);
               strcat(strans, etrans);
               printf("%s\n", strans);}
          | assign_stmt
              {strcpy(strans, atrans);
               printf("%s\n", strans);}
          ;
if_then   : T_IF rel_expr
              {strcpy(itrans, "if ");
               strcat(itrans, retrans);}
            T_THEN assign_stmt
              {strcat(itrans, atrans);}
          ;
Pascal to C Conversion
opt_else    : /* the empty case */
                {strcpy(etrans, "");}
            | T_ELSE assign_stmt
                {strcpy(etrans, " else ");
                 strcat(etrans, atrans);}
            ;
assign_stmt : T_IDENTIFIER
                {strcpy(atrans, yytext);}
              T_ASSIGN
                {strcat(atrans, "=");}
              value
                {strcat(atrans, vtrans);}
            ;
value       : T_INTEGER
                {strcpy(vtrans, yytext);}
            | T_REAL
                {strcpy(vtrans, yytext);}
            | T_STRING
                {strcpy(vtrans, yytext);}
            ;
rel_expr    : compare
                {strcpy(retrans, ctrans);}
              rel_op
                {strcat(retrans, rtrans);}
              compare
                {strcat(retrans, ctrans);}
            ;
Pascal to C Conversion
compare : T_IDENTIFIER
            {strcpy(ctrans, yytext);}
        | value
            {strcpy(ctrans, yytext);}
        ;
rel_op  : T_EQ
            {strcpy(rtrans, "==");}
        | T_LT
            {strcpy(rtrans, "<");}
        | T_LE
            {strcpy(rtrans, "<=");}
        | T_NE
            {strcpy(rtrans, "!=");}
        | T_GE
            {strcpy(rtrans, ">=");}
        | T_GT
            {strcpy(rtrans, ">");}
        ;
%%
#include "lex.yy.c"
yyerror() {}
main() {yyparse();}
What would Pascal to C Generate?
/* SAMPLE INPUT ... */
procedure MAIN is
   X, Y: INTEGER;
   A, B, C: FLOAT;
   D, E: CHARACTER;
begin
   if (X = Y) and (Z /= W)
   then
      Z := X;
      if (A <= B) then A := B; end if;
      X := X + 1;
   else
      Y := Y + 1;
   end if;
   A := B + C * D;
   A := B * C / D;
end MAIN;
What would Pascal to C Generate?
/* AND OUTPUT */
TYPE BEING CONVERTED TO: int
TYPE BEING CONVERTED TO: float
TYPE BEING CONVERTED TO: char
assign_stmt*** Z = X ;
assign stmt*** A = B ;
if stmt***
if ( A <= B
{ A = B ; }
assign stmt*** X = X + 1 ;
assign_stmt*** Y = Y + 1 ;
if stmt***
if ( X == Y && Z != W
{ Z =- X ;
if ( A <= B
{ A = B ; }
X = X + 1 ;
} else
{ Y = Y + 1; }
assign_stmt*** A = B + C * D ;
assign_stmt*** A = B * C / D ;
Redefine Parsing Stack
%{
#include <stdio.h>
#include <string.h>   /* strcat */
#include <ctype.h>
typedef char *stype;
#define YYSTYPE stype
char strans[100], atrans[100], itrans[100], etrans[100],
     vtrans[100], retrans[100], ctrans[100], rtrans[100];
%}
. . . Etc . . .
%%
statement : if_then opt_else
              {strcat(itrans, etrans);
               $$ = itrans;
               printf("%s\n", $$);}
          | assign_stmt
              {$$ = atrans;
               printf("%s\n", $$);}
          ;
IN Y.TAB.C - REDEFINES CONTENTS OF PARSING STACK
    #ifndef YYSTYPE
    #define YYSTYPE int
    #endif
    YYSTYPE yylval, yyval;
    YYSTYPE yyv[YYMAXDEPTH];
Utilizing Unions to Redefine Parsing Stack
• Unions Define Ability of Data Structure to be of Multiple Types (one or other attribute active)
• Consider the C Union Definition:
    union EITHEROR /* Union Type Name */
    {
        char trans[100];
        int XYZ;
    } EOR; /* Variable Name */
EOR.trans is a string (use strcpy, strcat, etc.)
EOR.XYZ is an int (use assignment, boolean expr, etc.)
Only trans or XYZ has a value but NOT both!
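The "one or the other" behavior follows from both members sharing the same storage. A minimal sketch using the slide's union type; set_trans and set_xyz are hypothetical helpers added only for illustration.

```c
#include <assert.h>
#include <string.h>

union EITHEROR {          /* same members as the slide's union */
    char trans[100];
    int XYZ;
};

/* Store a string: trans becomes the member holding a meaningful value. */
void set_trans(union EITHEROR *u, const char *s) {
    assert(strlen(s) < sizeof u->trans);   /* refuse strings that would overflow */
    strcpy(u->trans, s);
}

/* Store an int: this overwrites the first bytes of trans. */
void set_xyz(union EITHEROR *u, int v) {
    u->XYZ = v;
}
```

The union is as large as its largest member (at least 100 bytes here), and trans and XYZ begin at the same address, so writing one destroys the other.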
Utilizing Unions to Redefine Parsing Stack
%{
#include <stdio.h>
#include <ctype.h>
%}
%start statement
%union {                  /* Union Definition */
    char trans[100];
    int XYZ;
}
%token T_IF T_THEN T_ELSE T_IDENTIFIER
%token T_STRING T_ASSIGN T_INTEGER T_REAL
%token T_EQ T_LT T_LE T_NE T_GE T_GT
%type <trans> statement if_then opt_else
%type <trans> assign_stmt value compare
%type <trans> rel_op variable rel_expr
/* ALSO, types and tokens for XYZ are possible */
%%
The %type lines redefine the listed non-terminals to be of type <trans>, allowing them to be that part of the union
Utilizing Unions to Redefine Parsing Stack
• What Does the Parsing Stack now Contain?
IN YACC:
    YYSTYPE yylval, yyval;
    YYSTYPE yyv[YYMAXDEPTH];
    Stack yyv for S -> A B C: $$ = S, $1 = A, $2 = B, $3 = C (top to bottom: $3, $2, $1)
IN LEX:
    char yytext[YYLMAX];
    int yyleng;
THIS EFFECTIVELY REPLACES YYSTYPE:
    %union {
        char trans[100];
        int XYZ;
    }
Stack Entries are then Accessed as: $$.trans, $1.XYZ, $2.trans, Etc.
Unions for Pascal to C Conversion
statement : if_then opt_else
              {strcpy($$, $1);
               strcat($$, $2);
               printf("%s\n", $$);}
          | assign_stmt
              {strcpy($$, $1);
               printf("%s\n", $$);}
          ;
if_then   : T_IF rel_expr T_THEN assign_stmt
              {strcpy($$, " if ");
               strcat($$, $2);
               strcat($$, $4);}
          ;
opt_else  : /* the empty case */
              {strcpy($$, "");}
          | T_ELSE assign_stmt
              {strcpy($$, " else ");
               strcat($$, $2);}
          ;
Unions for Pascal to C Conversion
assign_stmt : variable T_ASSIGN value
                {strcpy($$, $1);
                 strcat($$, " = ");
                 strcat($$, $3);}
            ;
value       : T_INTEGER
                {strcpy($$, yytext);}
            | T_REAL
                {strcpy($$, yytext);}
            | T_STRING
                {strcpy($$, yytext);}
            ;
rel_expr    : compare rel_op compare
                {strcpy($$, $1);
                 strcat($$, $2);
                 strcat($$, $3);}
            ;
compare     : T_IDENTIFIER
                {strcpy($$, yytext);}
            | value
                {strcpy($$, yytext);}
            ;
Unions for Pascal to C Conversion
variable : T_IDENTIFIER
             {strcpy($$, yytext);}
         ;
rel_op   : T_EQ {strcpy($$, " == ");}
         | T_LT {strcpy($$, " < ");}
         | T_LE {strcpy($$, " <= ");}
         | T_NE {strcpy($$, " != ");}
         | T_GE {strcpy($$, " >= ");}
         | T_GT {strcpy($$, " > ");}
         ;
%%
#include "lex.yy.c"
yyerror() {}
yywrap() {return 1;}   /* 1 = no more input after end of file */
main() {yyparse();}
Also Possible to Redefine Tokens
%{
#include <stdio.h>
#include <ctype.h>
%}
%start statement
%union {
    char trans[100];
    int XYZ;
}
%token T_IF T_THEN T_ELSE T_IDENTIFIER
%token T_STRING T_ASSIGN T_INTEGER T_REAL
%token T_EQ T_LT T_LE T_NE T_GE T_GT
%type <trans> T_IDENTIFIER T_ASSIGN etc . . .
%type <trans> statement if_then opt_else
%type <trans> assign_stmt value compare
%type <trans> rel_op variable rel_expr
/* ALSO, types and tokens for XYZ are possible */
%%
Also Possible to Redefine Tokens
assign_stmt : T_IDENTIFIER T_ASSIGN value
                {strcpy($$, $1);
                 strcat($$, " = ");
                 strcat($$, $3);}
            ;
value       : T_INTEGER
                {strcpy($$, yytext);}
            | T_REAL
                {strcpy($$, yytext);}
            | T_STRING
                {strcpy($$, yytext);}
            ;
Project Part 3 Bison - Fall 2011
• Using Bison for Syntax Directed Translation
• Implementation of Attribute Grammar
• Given Input Latex File:
  - Basic Text Processing Capabilities
  - Advanced Text Processing Capabilities
  - Nested Blocks in Single Environment
  - Full Blown Verbatim
  - Type checking for
      Begin/End Blocks
      Combinations of Blocks
      Tabular Specification
  - Documentation (written using your Latex Syntax Directed Translator and Document Generator)
Files on Web Page
latex.l          : Common lexical analyzer specification
latexp3c.y       : Yacc file with nested blocks, WS, and verbatim along with basic code generation
latexp3c.output  : S/R and R/R Conflicts - Are all OK?
generate.c       : Basic routines for formatted text generation
util.c           : Utility routines
latex.input.txt  : Sample input
latexout.txt     : Generated output for sample (with errors!)
latextoc.txt     : Generated table of contents for sample
proj3gs.doc      : Grading Sheet - place initials next to the parts each person on the team was primarily responsible for.
Project Part 3 Bison - Final Project Requirements
• Your Revised latexp3c.y file
  - You may have multiple versions for each of the major Document Generation Capabilities
• Documentation of your Solution in Latex Using your Syntax Directed Translator/Generator
  - Assumptions
  - Log file with Major design decisions, problems, etc.
  - Test Cases and Test Results (to be supplied)
  - Zip File (lastnames.zip)
• 42 Students - 21 Teams of 2!
• Email me your Teams by Nov 11th!
Project Part 3 Grading Sheet
Student Name:
Basic Text Processing Capabilities (35 points total) _____
    Section/Subsection/Table of Contents (5 points) _____
    Line Spacing/Single-Double-Triple (5 points) _____
    Page Numbering/Styles (2.5 points) _____
    Vertical Spacing (2.5 points) _____
    Italics/Roman Fonts (2.5 points) _____
    Paragraphs/Noindent (2.5 points) _____
    Right Justification (10 points) _____
    Begin/End Single Blocks (5 points) _____
    _____ Testing - latex.tst file
Advanced Text Processing Capabilities (55 points total) _____
    ___ item.tst   Itemize Blocks (5 points) _____
    ___ enum.tst   Enumerate Blocks (5 points) _____
    ___ cent.tst   Center Blocks (5 points) _____
    ___ verb.tst   Verbatim Blocks (5 points) _____
    ___ tab.tst    Tabular Blocks (10 points) _____
    ___ cent.tst   Table Blocks with Refs/Captions (5 points) _____
    ___ sing.tst   Relevant Combinations of Blocks (20 points) _____
    ___ nest.tst       Single around Itemize/Enum/Center
                       All Combos of Itemize/Enum
                       Center around Tabular/Verbatim
Project Part 3 Grading Sheet
Documentation, Log, Testing (10 points total) _____
    - 5 points if documentation not Latex executable
    - Testing of three project files
    Testing: _____ projp1.tst _____ projp2.tst _____ projp3.tst
Nested Blocks within Single Environment (15 points total) _____
    Grammar Changes (5 points)
    Implementation and Testing (10 points)
    Testing: _____ nblocks.tst
Full Blown Verbatim (15 points total) _____
    Grammar Changes (5 points)
    Implementation and Testing (10 points)
    Testing: _____ fverbat.tst
Type/Error Checking (20 points total) _____
    Begin/End Blocks - Matching (5 points) _____
    Adv Begin/End Blocks - Limited Combos (5 points) _____
    Tabular Specifications Cols vs. Entries (5 points) _____
    Testing: ____ tcbe.tst ____ tcabe.tst ____ tcts.tst
SUBTOTAL(150): _____
Standard Deductions (At most 10 points) _____
    No Directory Location/Compilation Instr. (up to 5 points) _____
    Lack of Comments (up to 5 points) _____
    Other (up to 5 points) _____
TOTAL(150): _____
The flex File latex.l
/* THIS IS latex.l */
%{ /* A LEX FOR PART 3 OF THE PROJECT WHERE VERBATIM WORKS */
%}
ws      [ \t\n]+
punc    (\.|\,|\!|\?)
word    ({punc}|[a-zA-Z0-9])*
special (\%|\_|\&|\$|\#)
cols    (r|l|c)*
%%
"\\\\"              {printf(" %s \n", yytext);return(DBLBS);}
{special}           {printf(" %s \n", yytext);return(SPECCHAR);}
"["                 {printf(" %s \n", yytext);return(LSQRB);}
"]"                 {printf(" %s \n", yytext);return(RSQRB);}
"\\alph"            {printf(" %s \n", yytext);return(LALPH1);}
"{alph}"            {printf(" %s \n", yytext);return(LALPH2);}
"\\Alph"            {printf(" %s \n", yytext);return(CALPH1);}
"{Alph}"            {printf(" %s \n", yytext);return(CALPH2);}
"\\arabic"          {printf(" %s \n", yytext);return(ARABIC1);}
"{arabic}"          {printf(" %s \n", yytext);return(ARABIC2);}
"\\baselinestretch" {printf(" %s \n", yytext);return(BASELINES);}
"\\begin"           {printf(" %s \n", yytext);return(LBEGIN);}
"\\caption"         {printf(" %s \n", yytext);return(CAPTION);}
"{center}"          {printf(" %s \n", yytext);return(CENTER);}
The flex File latex.l
"{document}"        {printf(" %s \n", yytext);return(DOCUMENT);}
"\\end"             {printf(" %s \n", yytext);return(END);}
"{enumerate}"       {printf(" %s \n", yytext);return(ENUMERATE);}
"\\hspace"          {printf(" %s \n", yytext);return(HSPACE);}
"{itemize}"         {printf(" %s \n", yytext);return(ITEMIZE);}
"\\item"            {printf(" %s \n", yytext);return(ITEM);}
"\\it"              {printf(" %s \n", yytext);return(IT);}
"\\label"           {printf(" %s \n", yytext);return(LABEL);}
"\\noindent"        {printf(" %s \n", yytext);return(NOINDENT);}
"\\pagenumbering"   {printf(" %s \n", yytext);return(PAGENUM);}
"\\ref"             {printf(" %s \n", yytext);return(REF);}
"\\renewcommand"    {printf(" %s \n", yytext);return(RENEW);}
"\\roman"           {printf(" %s \n", yytext);return(LROMAN1);}
"{roman}"           {printf(" %s \n", yytext);return(LROMAN2);}
"\\Roman"           {printf(" %s \n", yytext);return(CROMAN1);}
"{Roman}"           {printf(" %s \n", yytext);return(CROMAN2);}
"\\rm"              {printf(" %s \n", yytext);return(RM);}
"\\section"         {printf(" %s \n", yytext);return(SECTION);}
"{single}"          {printf(" %s \n", yytext);return(SINGLE);}
"\\subsection"      {printf(" %s \n", yytext);return(SUBSEC);}
"\\tableofcontents" {printf(" %s \n", yytext);return(TABOCON);}
"{table}"           {printf(" %s \n", yytext);return(TABLE);}
"{tabular}"         {printf(" %s \n", yytext);return(TABULAR);}
"{verbatim}"        {printf(" %s \n", yytext);return(VERBATIM);}
"\\vspace"          {printf(" %s \n", yytext);return(VSPACE);}
The flex File latex.l
"b"                 {printf(" %s \n", yytext);return(B);}
"h"                 {printf(" %s \n", yytext);return(H);}
"t"                 {printf(" %s \n", yytext);return(T);}
{cols}              {printf(" %s \n", yytext);return(COLS);}
"{"                 {printf(" %s \n", yytext);return(LCURLYB);}
"}"                 {printf(" %s \n", yytext);return(RCURLYB);}
{word}              {printf(" %s \n", yytext);return(WORD);}
{ws}                {printf("ws--%s--ws\n", yytext);
                     if ((strcmp(yytext, "\n\n") == 0) && (ws_flag == 0))
                         return(WS);
                     else if (ws_flag == 1)
                         return(WS);
                    }
%%
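The {ws} rule is the one place where the scanner's behavior depends on parser state: outside a verbatim block only a blank line (exactly two newlines) is reported, while inside one (ws_flag set by the grammar) every run of whitespace is reported. The decision can be isolated as a small predicate; ws_is_token is a hypothetical name used only for this sketch.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 when a whitespace lexeme should be reported as a WS token,
   0 when the scanner should silently skip it. */
int ws_is_token(const char *lexeme, int ws_flag) {
    assert(lexeme != NULL);
    if (ws_flag == 1)
        return 1;                          /* inside verbatim: all whitespace counts */
    return strcmp(lexeme, "\n\n") == 0;    /* outside: only an exact blank line counts */
}
```

One consequence worth noting: because the comparison is exact, a paragraph break written as three newlines, or as two newlines with a space between them, is skipped rather than reported when ws_flag is 0.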
The Bison File latexp3c.y
/* THIS IS latexp3code.y */
%{ /* A YACC FOR PART 3 OF THE PROJECT WHERE VERBATIM AND NESTING WORKS */
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#define BUF_SIZE 512

int ws_flag = 0;

#include "lex.yy.c"
#include "util.c"
#include "generate.c"
%}

%union {
    char trans[BUF_SIZE+1];
    int val;
}

%start latexstatement
The Bison File latexp3c.y
%token BACKSL LBEGIN LCURLYB DOCUMENT RCURLYB END
%token WORD WS SPECCHAR CENTER VERBATIM SINGLE
%token ITEMIZE ENUMERATE TABULAR TABLE LSQRB RSQRB
%token H T B COLS
%token CAPTION LABEL DBLBS ITEM SECTION SUBSEC
%token TABOCON RENEW BASELINES PAGENUM INTEGER ARABIC1
%token LROMAN1 CROMAN1 LALPH1 CALPH1 VSPACE HSPACE
%token RM IT NOINDENT REF
%token ARABIC2 LROMAN2 CROMAN2 LALPH2 CALPH2

%type <trans> textoption wsorword
%type <val> style2 ARABIC2 LROMAN2 CROMAN2 LALPH2 CALPH2
%%
NOTE: YOU NEED TO ADD %type for ALL NON-TERMINALS and TOKENS for which you wish to use the $$, $1, $2, etc. notation and the redefined parsing stack.
The Bison File latexp3c.y
latexstatement : startdoc mainbody enddoc
               ;
startdoc       : LBEGIN DOCUMENT
               ;
enddoc         : END DOCUMENT
               ;
mainbody       : mainbody mainoption
               | mainoption
               ;
mainoption     : textoption
                   {generate_formatted_text($1);}
               | commentoption
               | latexoptions
               ;
The Bison File latexp3c.y
textoption    : textoption wsorword
                  {strcat($$, " ");
                   strcat($$, $2);}
              | wsorword
                  {strcpy($$, $1);}
              ;
wsorword      : WS
                  {strcpy($$, yytext);}
              | WORD
                  {strcpy($$, yytext);}
              ;
commentoption : SPECCHAR textoption
              ;
The Bison File latexp3c.y
latexoptions  : backsoptions
              | LCURLYB curlyboptions RCURLYB
              ;
curlyboptions : fonts textoption
              ;
backsoptions  : beginendopts
              | sectionoptions
              | tableofcont
              | linespacing
              | pagenumbers
              | pagenuminit
              | spacing
              | fonts
              | specialchar
              | nonewpara
              | reference
              ;
beginendopts  : LBEGIN begcmds beginblock endbegin
              ;
The Bison File latexp3c.y
begcmds  : CENTER
         | VERBATIM {ws_flag=1;}
         | SINGLE
         | ITEMIZE
         | ENUMERATE
         | TABLE begtableopts
         | TABULAR begtabularopts
         ;
endbegin : END endcmds
         | endtableopts TABLE
         ;
endcmds  : CENTER
         | VERBATIM {ws_flag=0;}
         | SINGLE
         | ITEMIZE
         | ENUMERATE
         | TABULAR
         ;
The Bison File latexp3c.y
beginblock : beginendopts
           | textoption /* FOR single or verbatim */
               {printf("single or verb\n");}
           | entrylist /* FOR center and tabular */
               {printf("center or tabular\n");}
           | listblock /* FOR item and enumerate */
               {printf("item or enumerate\n");}
           ;
listblock  : listblock anitem
               {printf("listblockA\n");}
           | anitem
               {printf("listblockB\n");}
           ;
anitem     : ITEM textoption
           | beginendopts
           ;
entrylist  : entrylist anentry
               {printf("entrylistA\n");}
           | anentry
               {printf("entrylistB\n");}
           ;
The Bison File latexp3c.y
anentry        : entry DBLBS
                   {printf("anentryA\n");}
               | beginendopts
                   {printf("anentryB\n");}
               ;
entry          : entry SPECCHAR textoption
                   {printf("entryA\n");}
               | textoption
                   {printf("entryB\n");}
               ;
begtableopts   : LSQRB position RSQRB
               ;
begtabularopts : LCURLYB COLS RCURLYB
               ;
position       : H | T | B
               ;
The Bison File latexp3c.y
endtableopts   : END
               | CAPTION LCURLYB textoption RCURLYB captionrest
               | labelrest
               ;
captionrest    : END
               | labelrest
               ;
labelrest      : LABEL LCURLYB WORD RCURLYB END
               ;
sectionoptions : SECTION LCURLYB textoption RCURLYB
                   {generate_sec_header(get_sec_ctr(), $3);
                    incr_sec_ctr();}
               | SUBSEC LCURLYB textoption RCURLYB
                   {generate_subsec_header(get_sec_ctr(), get_subsec_ctr(), $3);
                    incr_subsec_ctr();}
               ;
The Bison File latexp3c.y
tableofcont : TABOCON
                {set_gen_toc();}
            ;
linespacing : RENEW LCURLYB BASELINES RCURLYB LCURLYB WORD RCURLYB
            ;
pagenumbers : PAGENUM style2
                {set_page_style($2);}
            ;
style2      : ARABIC2
            | LROMAN2
            | CROMAN2
            | LALPH2
            | CALPH2
            ;
The Bison File latexp3c.y
pagenuminit : style1 LCURLYB WORD
                {set_page_no(yytext[0]);}
              RCURLYB
            ;
style1      : ARABIC1
            | LROMAN1
            | CROMAN1
            | LALPH1
            | CALPH1
            ;
spacing     : horvert LCURLYB WORD RCURLYB
            ;
horvert     : VSPACE | HSPACE
            ;
fonts       : RM | IT
            ;
The Bison File latexp3c.y
specialchar : SPECCHAR
            | LCURLYB
            | RCURLYB
            ;
nonewpara   : NOINDENT
            ;
reference   : REF LCURLYB WORD RCURLYB
            ;
%%
yyerror() {}

main()
{
    fpout = fopen("latexout","w");
    fptoc = fopen("latextoc","w");
    init_lines_so_far();
    init_sec_ctr();
    init_output_page();
    yyparse();
}
Latex.input.txt
\begin{document}
\pagenumbering{arabic}
\arabic{5}
\renewcommand{\baselinestretch}{2}
\tableofcontents

\section{Introduction}

This is an example of text that would be transformed into a paragraph in
latex. Blank lines between text in the input cause a new paragraph to be generated.

When the blank line occurs after a section, no indentation of the paragraph
is performed. However, all other blanks, would result in a five space indent of the paragraph.

\subsection{A Text Processor}

A text processor is a very useful tool, since it allows us to
develop formatted documents that are easy to read.
Latex.input.txt

\subsection{Legal Latex Commands}

We have seen that there are many different Latex commands, that can be used
in many different ways. However, sometimes, we wish to use a character to
mean itself, and override its Latex interpretation. For example, to use
curly braces, we employ the backslash a set of integers.

\section{Using Latex}

Finally, there are many other useful commands that involve begin/end blocks,
that establish an environment. These blocks behave in a similar fashion to
begin/end blocks in a programming language, since they set a scope. We
have discussed a number of examples.

It is important to note, even at this early stage, that lists may be created
within lists, allowing the nesting of blocks and environments.
\end{document}
latexout.txt

1 Introduction

This is an example of text that would
be transformed into a paragraph in
latex. Blank lines between text in the
input cause a new paragraph to be
generated. When the blank line occurs
after a section, no indentation of the
paragraph is performed. However, all
other blanks, would result in a five

2.1 A Text Processor

A text processor is a very useful
tool, since it allows us to develop
formatted documents that are easy to
latexout.txt

2.2 Legal Latex Commands

We have seen that there are many
different Latex commands, that can be
used in many different ways. However,
sometimes, we wish to use a character to
mean itself, and override its Latex
interpretation. For example, to use
curly braces, we employ the backslash a

2 Using Latex

Finally, there are many other useful
commands that involve begin end blocks,
that establish an environment. These
blocks behave in a similar fashion to
begin end blocks in a programming
language, since they set a scope. We
have discussed a number of examples.

It is important to note, even at this
early stage, that lists may be created
within lists, allowing the nesting of

WHY DOESN'T IT ALL PRINT OUT?
latextoc.txt
1 Introduction ---------- PAGE 5
2.1 A Text Processor ---------- PAGE 5
2.2 Legal Latex Commands ---------- PAGE 5
2 Using Latex ---------- PAGE 5
The util.c File
FILE *fpout;
FILE *fptoc;

#define OUT_WIDTH 40
#define SPACE_LEFT 5
#define LINES_PER_PAGE 40
#define TOC_ON 1

char line[OUT_WIDTH + 1];
int lines_so_far;

void init_lines_so_far()
{
    lines_so_far = 0;
}

void incr_lines_so_far()
{
    lines_so_far++;
}

int check_done_page()
{
    if (lines_so_far < LINES_PER_PAGE) return 1;
    else return 0;
}
The util.c File
struct doc_symtab {
    int page_no_counter;
    int page_style;
    int line_spacing;
    int current_font;
    int generate_toc;
    int section_counter;
    int subsect_counter;
};

struct doc_symtab DST;

void init_sec_ctr()
{
    DST.section_counter = 1;
    DST.subsect_counter = 1;
}

void incr_sec_ctr()
{
    DST.section_counter++;
    DST.subsect_counter = 1;
}
The util.c File
void incr_subsec_ctr()
{
    DST.subsect_counter++;
}

int get_sec_ctr()
{
    return DST.section_counter;
}

int get_subsec_ctr()
{
    return DST.subsect_counter;
}

int get_gen_toc()
{
    return DST.generate_toc;
}

void set_gen_toc()
{
    DST.generate_toc = 1;
}
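The util.c File
/* p is a digit character, e.g. '5' */
void set_page_no(char p)
{
    DST.page_no_counter = p - '0';
}

/* Takes no argument; callers invoke it as get_page_no() */
int get_page_no(void)
{
    return DST.page_no_counter;
}

/* Advances the page counter and returns the previous page number */
int inc_page_no(void)
{
    DST.page_no_counter++;
    return (DST.page_no_counter - 1);
}

void set_page_style(int s)
{
    DST.page_style = s;
}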
The generate.c File
/* THIS IS THE generate.c FILE */

void init_output_page(void)
{
    fprintf(fpout, "\n\n\n\n\n");
    fflush(fpout);
}

void generate_sec_header(int i, char *s)
{
    fprintf(fpout, "\n\n%d %s\n", i, s);
    fflush(fpout);

    if (get_gen_toc() == TOC_ON)
        fprintf(fptoc, "\n%d %s ---------- PAGE %d\n",
                i, s, get_page_no());
}
The generate.c File
void generate_subsec_header(int i, int j, char *s)
{
    fprintf(fpout, "\n\n%d.%d %s\n", i, j, s);
    fflush(fpout);

    if (get_gen_toc() == TOC_ON)
        fprintf(fptoc, "\n%d.%d %s ---------- PAGE %d\n",
                i, j, s, get_page_no());
}
The generate.c File
void generate_formatted_text(char *s)
{
    int slen = strlen(s);
    int i, j, k, r;
    int llen;

    for (i = 0; i <= slen; )
    {
        /* copy up to OUT_WIDTH characters into line */
        for (j = 0; ((j < OUT_WIDTH) && (i <= slen)); i++, j++)
            line[j] = s[i];

        if (i <= slen)
        {
            /* a word straddles the line end: back up to the previous blank */
            if ((line[j-1] != ' ') && (s[i] != ' '))
            {
                for (k = j-1; line[k] != ' '; k--)
                    ;
                i = i - (j - k - 1);
                j = k;
            }
            /* skip blanks before the next line */
            for ( ; s[i] == ' '; i++)
                ;
        }
The generate.c File
        line[j] = '\0';

        llen = strlen(line);

        if (i <= slen)
        {
            fprintf(fpout, "\n%s", line);
            fflush(fpout);
        }
        else
        {
            for (r = 0; r <= llen; r++)
                s[r] = line[r]; /* includes backslash 0 */
        }
    }
}
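For comparison, greedy line filling can be written as a standalone function. This is a separate sketch, not the course's generate.c: wrap_text and WRAP_WIDTH are hypothetical names, the function writes into a caller-supplied buffer instead of fpout, and, unlike generate_formatted_text above, it also emits the final partial line.

```c
#include <assert.h>
#include <string.h>

#define WRAP_WIDTH 40   /* same value as OUT_WIDTH in util.c */

/* Greedy wrap: copies text into out, one '\n'-terminated line at a time,
   breaking at blanks so that no line exceeds WRAP_WIDTH characters
   (a single word longer than WRAP_WIDTH is split).
   out must hold at least 2 * strlen(text) + 1 bytes.
   Returns the number of lines written. */
int wrap_text(const char *text, char *out) {
    assert(text != NULL && out != NULL);
    size_t n = strlen(text);
    size_t i = 0;                 /* read position in text */
    size_t o = 0;                 /* write position in out */
    int lines = 0;

    while (i < n) {
        while (i < n && text[i] == ' ')
            i++;                                  /* skip blanks between lines */
        if (i >= n)
            break;

        size_t end = i + WRAP_WIDTH;              /* one past the last char that fits */
        if (end >= n) {
            end = n;                              /* the remainder fits on one line */
        } else if (text[end] != ' ') {
            size_t k = end;                       /* a word straddles the limit */
            while (k > i && text[k - 1] != ' ')
                k--;                              /* back up to just after a blank */
            if (k > i)
                end = k - 1;                      /* break at that blank */
        }
        while (end > i && text[end - 1] == ' ')
            end--;                                /* trim trailing blanks */

        memcpy(out + o, text + i, end - i);
        o += end - i;
        out[o++] = '\n';
        lines++;
        i = end;
    }
    out[o] = '\0';
    return lines;
}
```

Keeping the output in a buffer makes the wrapping logic testable on its own; a caller could then write the buffer to fpout and count lines for page breaks.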
Using Structures in %union
%{
#define BUF_SIZE 512

struct symtabtest
{
    int a, b;
    char c[BUF_SIZE];
    char d[BUF_SIZE];
};
%}
%start latexstatement

%union {
    struct symtabtest st;
    int val;
}

%token ETC...
%type <st> entrylist entry DBLBS listblock anitem
%type <st> textoption wsorword WORD WS ITEM
%%
ETC...
Using Structures in %union
mainoption : textoption
               {fprintf(fp, "%d %d %s %s\n",
                        $1.a, $1.b, $1.c, $1.d);}
           | commentoption
           | latexoptions
           ;
textoption : textoption wsorword
               {$$.a = 5;}
           | wsorword
               {$$.b = 10;}
           ;
wsorword   : WS
               {strcpy($$.c, yytext);}
           | WORD
               {strcpy($$.d, yytext);}
           ;
Additional Lex/Yacc Examples
• Consider Ada9X (the Working Name for what Became Ada95, since Revised as Ada2005): a Package Based, OO Programming Language
• Builds Upon the Original Ada Language
  - Extension of Pascal
  - Developed as a Language for DoD
• Named After Ada Lovelace (1815-1852)
  - Worked on Charles Babbage's Early Mechanical General Purpose Computer/Analytical Engine
  - The world's "First Programmer"
  - Wrote the world's "First Computer Program" on Bernoulli Numbers …
Ada9X Lex
%{
/******* A "lex"-style lexer for Ada 9X ****************************/
/* Copyright (C) Intermetrics, Inc. 1994 Cambridge, MA USA         */
/* Copying permitted if accompanied by this statement.             */
/* Derivative works are permitted if accompanied by this statement.*/
/* This lexer is known to be only approximately correct, but it is */
/* more than adequate for most uses (the lexing of apostrophe is   */
/* not as sophisticated as it needs to be to be "perfect").        */
/* As usual there is *no warranty* but we hope it is useful.       */
/*******************************************************************/
int error_count;
%}

DIGIT           [0-9]
EXTENDED_DIGIT  [0-9a-zA-Z]
INTEGER         ({DIGIT}(_?{DIGIT})*)
EXPONENT        ([eE](\+?|-){INTEGER})
DECIMAL_LITERAL {INTEGER}(\.?{INTEGER})?{EXPONENT}?
BASE            {INTEGER}
BASED_INTEGER   {EXTENDED_DIGIT}(_?{EXTENDED_DIGIT})*
BASED_LITERAL   {BASE}#{BASED_INTEGER}(\.{BASED_INTEGER})?#{EXPONENT}?
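The INTEGER definition allows single underscores between digits but not leading, trailing, or doubled ones. A hand-coded matcher for just that pattern makes the rule concrete; match_ada_integer is a hypothetical helper, not part of the lexer, and it mirrors Lex's longest-match behavior by returning the length of the longest matching prefix.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the longest prefix of s matching
   INTEGER = DIGIT (_? DIGIT)*, or 0 if s does not start with a digit. */
int match_ada_integer(const char *s) {
    assert(s != NULL);
    if (!isdigit((unsigned char)s[0]))
        return 0;
    int i = 1;
    for (;;) {
        if (isdigit((unsigned char)s[i])) {
            i++;                                   /* DIGIT */
        } else if (s[i] == '_' && isdigit((unsigned char)s[i + 1])) {
            i += 2;                                /* '_' followed by DIGIT */
        } else {
            break;                                 /* anything else ends the match */
        }
    }
    return i;
}
```

For "1_000" the whole string matches, while for "12__3" only "12" does, since a second underscore is not followed by the digit the pattern requires.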
Ada9X Lex
%%
"."   return('.');
"<"   return('<');
"("   return('(');
"+"   return('+');
"|"   return('|');
"&"   return('&');
"*"   return('*');
")"   return(')');
";"   return(';');
"-"   return('-');
"/"   return('/');
","   return(',');
">"   return('>');
":"   return(':');
"="   return('=');
"'"   return(TIC);
".."  return(DOT_DOT);
"<<"  return(LT_LT);
"<>"  return(BOX);
"<="  return(LT_EQ);
"**"  return(EXPON);
"/="  return(NE);
">>"  return(GT_GT);
">="  return(GE);
":="  return(IS_ASSIGNED);
"=>"  return(RIGHT_SHAFT);
Ada9X Lex
[a-zA-Z](_?[a-zA-Z0-9])*  {return(lk_keyword(yytext));}
"'"."'"                   return(char_lit);
\"(\"\"|[^\n\"])*\"       return(char_string);
{DECIMAL_LITERAL}         return(numeric_lit);
{BASED_LITERAL}           return(numeric_lit);
--.*\n                    ;
[ \t\n\f]                 ;
.                         {fprintf(stderr, " Illegal character:%c: on line %d\n",
                                   *yytext, yylineno);
                           error_count++;}
%%
/*
 * Keywords stored in alpha order
 */
typedef struct
{
    char * kw;
    int kwv;
} KEY_TABLE;

/* Reserved keyword list and Token values
 * as defined in y.tab.h
 */
# define NUM_KEYWORDS 69
Ada9X Lex
/* 69 entries, matching NUM_KEYWORDS: ABORT and ABS are declared as tokens
   and are indexed by yyerror (key_tab[yychar-ABORT]), so they lead the table */
KEY_TABLE key_tab[NUM_KEYWORDS] = {
    {"ABORT", ABORT}, {"ABS", ABS}, {"ABSTRACT", ABSTRACT},
    {"ACCEPT", ACCEPT}, {"ACCESS", ACCESS}, {"ALIASED", ALIASED},
    {"ALL", ALL}, {"AND", AND}, {"ARRAY", ARRAY},
    {"AT", AT}, {"BEGIN", BEGiN}, {"BODY", BODY},
    {"CASE", CASE}, {"CONSTANT", CONSTANT}, {"DECLARE", DECLARE},
    {"DELAY", DELAY}, {"DELTA", DELTA}, {"DIGITS", DIGITS},
    {"DO", DO}, {"ELSE", ELSE}, {"ELSIF", ELSIF},
    {"END", END}, {"ENTRY", ENTRY}, {"EXCEPTION", EXCEPTION},
    {"EXIT", EXIT}, {"FOR", FOR}, {"FUNCTION", FUNCTION},
    {"GENERIC", GENERIC}, {"GOTO", GOTO}, {"IF", IF},
    {"IN", IN}, {"IS", IS}, {"LIMITED", LIMITED},
    {"LOOP", LOOP}, {"MOD", MOD}, {"NEW", NEW},
    {"NOT", NOT}, {"NULL", NuLL}, {"OF", OF},
    {"OR", OR}, {"OTHERS", OTHERS}, {"OUT", OUT},
    {"PACKAGE", PACKAGE}, {"PRAGMA", PRAGMA}, {"PRIVATE", PRIVATE},
    {"PROCEDURE", PROCEDURE}, {"PROTECTED", PROTECTED}, {"RAISE", RAISE},
    {"RANGE", RANGE}, {"RECORD", RECORD}, {"REM", REM},
    {"RENAMES", RENAMES}, {"REQUEUE", REQUEUE}, {"RETURN", RETURN},
    {"REVERSE", REVERSE}, {"SELECT", SELECT}, {"SEPARATE", SEPARATE},
    {"SUBTYPE", SUBTYPE}, {"TAGGED", TAGGED}, {"TASK", TASK},
    {"TERMINATE", TERMINATE}, {"THEN", THEN}, {"TYPE", TYPE},
    {"UNTIL", UNTIL}, {"USE", USE}, {"WHEN", WHEN},
    {"WHILE", WHILE}, {"WITH", WITH}, {"XOR", XOR}
};
Ada9X Lex
to_upper(str)
char *str;
{
    char * cp;
    for (cp=str; *cp; cp++) {
        if (islower(*cp)) *cp -= ('a' - 'A') ;
    }
}

lk_keyword(str)
char *str;
{
    int min; int max;
    int guess, compare;
    min = 0;
    max = NUM_KEYWORDS-1;
    guess = (min + max) / 2;
    to_upper(str);
    for (guess=(min+max)/2; min<=max; guess=(min+max)/2) {
        if ((compare = strcmp(key_tab[guess].kw, str)) < 0) {
            min = guess + 1;
        } else if (compare > 0) {
            max = guess - 1;
        } else {
            return key_tab[guess].kwv;
        }
    }
    return identifier;
}
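The lookup works because the keyword table is kept in alphabetical order, which is what lets a binary search decide between "keyword" and "identifier" in a handful of comparisons. A self-contained sketch with a five-entry table follows; the TOK_ codes, demo_keywords and lookup_keyword are hypothetical, and unlike to_upper above the sketch upper-cases a private copy rather than modifying the lexeme in place.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes for the sketch. */
enum { TOK_IDENTIFIER = 1, TOK_BEGIN, TOK_ELSE, TOK_END, TOK_IF, TOK_THEN };

struct keyword { const char *kw; int kwv; };

/* Must stay in alphabetical order: the binary search depends on it. */
static const struct keyword demo_keywords[] = {
    {"BEGIN", TOK_BEGIN}, {"ELSE", TOK_ELSE}, {"END", TOK_END},
    {"IF", TOK_IF}, {"THEN", TOK_THEN}
};
#define DEMO_NUM_KEYWORDS ((int)(sizeof demo_keywords / sizeof demo_keywords[0]))

/* Case-insensitive lookup: returns the keyword's code, or TOK_IDENTIFIER. */
int lookup_keyword(const char *word) {
    assert(word != NULL);
    char upper[32];
    size_t n = strlen(word);
    if (n >= sizeof upper)
        return TOK_IDENTIFIER;                 /* too long to be a keyword */
    for (size_t i = 0; i <= n; i++)            /* copy including the '\0' */
        upper[i] = (char)toupper((unsigned char)word[i]);

    int min = 0, max = DEMO_NUM_KEYWORDS - 1;
    while (min <= max) {
        int guess = min + (max - min) / 2;
        int cmp = strcmp(demo_keywords[guess].kw, upper);
        if (cmp < 0)
            min = guess + 1;                   /* table entry sorts before word */
        else if (cmp > 0)
            max = guess - 1;                   /* table entry sorts after word */
        else
            return demo_keywords[guess].kwv;
    }
    return TOK_IDENTIFIER;
}
```

With 69 keywords the search needs at most 7 string comparisons per identifier-shaped lexeme, which is the point of storing the table sorted.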
Ada9X Lex
yyerror(s)
char *s;
{
    extern int yychar;

    error_count++;

    fprintf(stderr," %s", s);
    if (yylineno)
        fprintf(stderr,", on line %d,", yylineno);
    fprintf(stderr," on input: ");
    if (yychar >= 0400) {
        if ((yychar >= ABORT) && (yychar <= XOR)) {
            fprintf(stderr, "(token) %s #%d\n",
                    key_tab[yychar-ABORT].kw, yychar);
        } else switch (yychar) {
            case char_lit : fprintf(stderr, "character literal\n");
                break;
            case identifier : fprintf(stderr, "identifier\n");
                break;
            case char_string : fprintf(stderr, "string\n");
                break;
            case numeric_lit : fprintf(stderr, "numeric literal\n");
                break;
            case TIC : fprintf(stderr, "single-quote\n");
                break;
            case DOT_DOT : fprintf(stderr, "..\n");
                break;
Ada9X Lex
            case LT_LT : fprintf(stderr, "<<\n");
                break;
            case BOX : fprintf(stderr, "<>\n");
                break;
            case LT_EQ : fprintf(stderr, "<=\n");
                break;
            case EXPON : fprintf(stderr, "**\n");
                break;
            case NE : fprintf(stderr, "/=\n");
                break;
            case GT_GT : fprintf(stderr, ">>\n");
                break;
            case GE : fprintf(stderr, ">=\n");
                break;
            case IS_ASSIGNED : fprintf(stderr, ":=\n");
                break;
            case RIGHT_SHAFT : fprintf(stderr, "=>\n");
                break;
            default :
                fprintf(stderr, "(token) %d\n", yychar);
        }
    } else {
        switch (yychar) {
            case '\t': fprintf(stderr,"horizontal-tab\n"); return;
            case '\n': fprintf(stderr,"newline\n"); return;
            case '\0': fprintf(stderr,"\$end\n"); return;
            case ' ': fprintf(stderr, "(blank)"); return;
            default : fprintf(stderr,"(char) %c\n", yychar); return;
        }
    }
}   /* end of yyerror */
Ada9X Yacc
/******* A YACC grammar for Ada 9X *********************************/
/* Copyright (C) Intermetrics, Inc. 1994 Cambridge, MA USA         */
/* Copying permitted if accompanied by this statement.             */
/* Derivative works are permitted if accompanied by this statement.*/
/* This grammar is thought to be correct as of May 1, 1994         */
/* but as usual there is *no warranty* to that effect.             */
/*******************************************************************/
%{
#include <stdio.h>
#include <ctype.h>
#include <strings.h>
#define BUF_SIZE 512
%}

%union {
    char trans[BUF_SIZE+1];
    int val;
}

%token TIC DOT_DOT LT_LT BOX LT_EQ EXPON NE GT_GT
%token GE IS_ASSIGNED RIGHT_SHAFT ABORT ABS ABSTRACT ACCEPT
%token ACCESS ALIASED ALL AND ARRAY AT BEGiN BODY
%token CASE CONSTANT DECLARE DELAY DELTA DIGITS DO
%token ELSE ELSIF END ENTRY EXCEPTION EXIT FOR FUNCTION
%token GENERIC GOTO IF IN IS LIMITED LOOP MOD
%token NEW NOT NuLL OF OR OTHERS OUT PACKAGE
%token PRAGMA PRIVATE PROCEDURE PROTECTED RAISE RANGE RECORD REM
%token RENAMES REQUEUE RETURN REVERSE SELECT SEPARATE SUBTYPE
%token TAGGED TASK TERMINATE THEN TYPE UNTIL
%token USE WHEN WHILE WITH XOR char_lit identifier char_string numeric_lit
Ada9X Yacc
%type <trans> access_opt access_type adding address_spec %type <trans> aliased_opt align_opt allocator alternative %type <trans> alternative_s array_type assign_stmt attrib_def %type <trans> attribute_id basic_loop block block_body %type <trans> block_decl body body_opt body_stub %type <trans> c_id_opt c_name_list case_hdr case_stmt %type <trans> choice choice_s code_stmt comp_assoc %type <trans> comp_decl comp_decl_s comp_list comp_loc_s %type <trans> comp_unit compilation component_subtype_def compound_name %type <trans> compound_stmt cond_clause cond_clause_s cond_part %type <trans> condition constr_array_type context_spec decl %type <trans> decl_item decl_item_or_body decl_item_or_body_s1 decl_item_s %type <trans> decl_item_s1 decl_part def_id def_id_s %type <trans> derived_type designator discrete_range discrete_with_range %type <trans> discrim_part discrim_part_opt discrim_spec discrim_spec_s %type <trans> else_opt exit_stmt expression factor %type <trans> fixed_type float_type formal_part formal_part_opt generic_decl%type <trans> generic_derived_type generic_discrim_part_opt generic_formal %type <trans> generic_formal_part generic_inst generic_pkg_inst generic_subp_inst %type <trans> generic_type_def goal_symbol goto_stmt id_opt %type <trans> if_stmt index index_s init_opt %type <trans> integer_type iter_discrete_range_s iter_index_constraint iter_part %type <trans> iteration label label_opt limited_opt %type <trans> literal logical loop_stmt mark %type <trans> mode multiplying name name_opt %type <trans> name_s null_stmt number_decl object_decl
LandY.163
CSE
4100
Ada9X Yacc
%type <trans> object_qualifier_opt object_subtype_def param param_s
%type <trans> paren_expression pkg_body pkg_decl pkg_spec
%type <trans> primary private_opt private_part private_type
%type <trans> procedure_call prot_body prot_decl prot_def
%type <trans> prot_elem_decl prot_elem_decl_s prot_op_body prot_op_body_s
%type <trans> prot_op_decl prot_op_decl_s prot_opt prot_private_opt
%type <trans> prot_spec qualified range range_constr_opt
%type <trans> range_constraint range_spec range_spec_opt real_type
%type <trans> record_def record_type record_type_spec relation
%type <trans> relational rep_spec return_stmt reverse_opt
%type <trans> short_circuit simple_expression simple_stmt statement
%type <trans> statement_s subp_default subprog_body subprog_decl
%type <trans> subprog_spec subprog_spec_is_push subunit subunit_body
%type <trans> tagged_opt term type_completion type_decl
%type <trans> type_def unary unconstr_array_type unit
%type <trans> unlabeled use_clause use_clause_opt value
%type <trans> value_s value_s_2 variant variant_part
%type <trans> variant_s when_opt with_clause my_identifier
%type <trans> error epsilon ALIASED CONSTANT IS_ASSIGNED
%type <trans> TYPE IS '(' NEW ABSTRACT RANGE MOD DIGITS DELTA NOT
%type <trans> ARRAY ACCESS CASE WHEN OTHERS NuLL TAGGED RECORD
%type <trans> PROTECTED AND OR my_char_lit
%type <trans> '=' NE '<' LT_EQ '>' GE '+' '-' '*' '/' ':'
%type <trans> LT_LT IF ELSE CASE WHEN WHILE FOR REVERSE LOOP
%type <trans> DECLARE BEGiN EXIT RETURN GOTO PROCEDURE FUNCTION
%type <trans> IN OUT PACKAGE PRIVATE LIMITED USE WITH SEPARATE
%type <trans> GENERIC FOR AT my_char_string my_numeric_lit
LandY.164
CSE
4100
Ada9X Yacc
%%
goal_symbol : compilation
    ;
decl : object_decl
    | number_decl
    | type_decl
    | subprog_decl
    | pkg_decl
    | prot_decl
    | generic_decl
    | body_stub
    | error ';'
    ;
object_decl : def_id_s ':' object_qualifier_opt object_subtype_def init_opt ';'
    ;
def_id_s : def_id
    | def_id_s ',' def_id
    ;
def_id : my_identifier
        {strcpy($$, $1);}
    ;
object_qualifier_opt : epsilon
    | ALIASED
    | CONSTANT
    | ALIASED CONSTANT
    ;
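The action {strcpy($$, $1);} on def_id only makes sense if the <trans> member named in the %type lines is a character buffer. The slides do not show the %union declaration, so the sketch below is an assumption about its shape, not the course's actual definition. `semantic_value`, `TRANS_MAX` and `reduce_def_id` are hypothetical names. In the generated parser, $$ and $1 refer to the trans member of the rule's result slot and of its first right-hand-side symbol.

```c
#include <string.h>

#define TRANS_MAX 256   /* assumed buffer size; not given in the slides */

/* Assumed shape of the semantic value: <trans> selects a char-array member. */
typedef union {
    char trans[TRANS_MAX];
} semantic_value;

/* What the action { strcpy($$, $1); } amounts to for the rule
   def_id : my_identifier
   result plays the role of $$ and rhs1 the role of $1. */
void reduce_def_id(semantic_value *result, const semantic_value *rhs1)
{
    strcpy(result->trans, rhs1->trans);
}
```

With a buffer-valued member, every reduction copies text from one slot to another. This keeps the actions simple, at the cost of a fixed maximum length for any translated fragment.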
LandY.165
CSE
4100
Ada9X Yacc
object_subtype_def : name
    | array_type
    ;
init_opt : epsilon
    | IS_ASSIGNED expression
    ;
number_decl : def_id_s ':' CONSTANT IS_ASSIGNED expression ';'
    ;
type_decl : TYPE my_identifier discrim_part_opt type_completion ';'
    ;
discrim_part_opt : epsilon
    | discrim_part
    | '(' BOX ')'
    ;
type_completion : epsilon
    | IS type_def
    ;
type_def : integer_type
    | real_type
    | array_type
    | record_type
    | access_type
    | derived_type
    | private_type
    ;
ETC – See Full Yacc on web page…
LandY.166
CSE
4100
Ada9X Yacc
REMAINING NON GRAMMAR CODE AT END OF YACC FILE
%%
mystrcat(s, t)
char s[], t[];
{
    int i, j;
    i = j = 0;
    while (s[i] != '\0') i++;
    s[i] = ' ';
    i++;
    while ((s[i++] = t[j++]) != '\0');
}
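mystrcat differs from the library strcat in one respect: it puts a single space between the two strings. The sketch below restates the same logic with an ANSI C prototype so that the behavior can be checked in isolation. The name `join_with_space` is hypothetical. The caller must supply a destination with room for both strings, the space and the terminator; the routine performs no bounds check.

```c
#include <stddef.h>

/* Same logic as mystrcat, written with an ANSI C prototype.
   Appends one space and then t to s. The buffer s must have room for
   strlen(s) + 1 + strlen(t) + 1 bytes. */
void join_with_space(char s[], const char t[])
{
    size_t i = 0, j = 0;

    while (s[i] != '\0')        /* find the end of s */
        i++;
    s[i++] = ' ';               /* overwrite the terminator with a space */
    while ((s[i++] = t[j++]) != '\0')
        ;                       /* copy t, including its terminator */
}
```

Starting from a buffer holding "X" and appending ":" and then "INTEGER" leaves "X : INTEGER" in the buffer. An empty destination still receives the leading space.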
LandY.167
CSE
4100
Ada9X Yacc
/* To build this, run it through lex, compile it, and link it with */
/* the result of yacc'ing and cc'ing grammar9x.y, plus "-ly" */
FILE *fp;
#include "lex.yy.c"
main(argc, argv)
int argc;
char *argv[];
{
    /* Simple Ada 9X syntax checker */
    /* Checks standard input if no arguments */
    /* Checks files if one or more arguments */
    extern int error_count;
    extern int yyparse();
    extern int yylineno;
    FILE *flptr;
    int i;
    fp = fopen("output","w");
LandY.168
CSE
4100
Ada9X Yacc
    if (argc == 1) {
        yyparse();
    } else {
        for (i = 1; i < argc; i++) {
            if ((flptr = freopen(argv[i], "r",stdin)) == NULL) {
                fprintf(stderr, "%s: Can't open %s\n", argv[0], argv[i]);
            } else {
                if (argc > 2) fprintf(stderr, "%s:\n", argv[i]);
                yylineno = 1;
                yyparse();
            }
        }
    }
    if (error_count) {
        fprintf(stderr, "%d syntax error%s detected\n", error_count,
                error_count == 1? "": "s");
        exit(-1);
    } else {
        fprintf(stderr, "No syntax errors detected\n");
    }
}
yywrap() {return 1;}
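The driver only reads error_count; the slides do not show where it is defined or incremented. A yacc-generated parser reports each syntax error by calling a user-supplied yyerror, so that routine is a plausible place for the counting. The sketch below assumes exactly that. The body of `yyerror` and the helper `format_summary` are hypothetical; the helper reproduces the driver's singular/plural message so the logic can be checked without running a parser.

```c
#include <stdio.h>

int error_count = 0;    /* assumed to be defined once, outside the driver */

/* Hypothetical error hook: count the error and echo the parser's message. */
void yyerror(const char *msg)
{
    error_count++;
    fprintf(stderr, "%s\n", msg);
}

/* Builds the driver's summary line (without the newline) in buf.
   Returns the number of characters the full message requires. */
int format_summary(char *buf, size_t size, int count)
{
    if (count)
        return snprintf(buf, size, "%d syntax error%s detected",
                        count, count == 1 ? "" : "s");
    return snprintf(buf, size, "No syntax errors detected");
}
```

Counting inside yyerror means the total covers every file processed by the loop, which matches the driver printing one summary after all arguments have been parsed.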