Parsing in Perl

Alberto Simões
[email protected]

YAPC::EU::2006

Alberto Simoes Parsing in Perl

What we will talk about

Parsing...

what it is...

the tools to make it!

But not how to do it!

show some examples...

and compare their efficiency.

The Definitions

Parsing

In computer science, parsing is the process of analyzing an input sequence (read from a file or a keyboard, for example) in order to determine its grammatical structure with respect to a given formal grammar. It is formally named syntax analysis. A parser is a computer program that carries out this task. The name is analogous with the usage in grammar and linguistics.

Parsing transforms input text into a data structure, usually a tree, which is suitable for later processing and which captures the implied hierarchy of the input. Generally, parsers operate in two stages, first identifying the meaningful tokens in the input, and then building a parse tree from those tokens.

Wikipedia (August 2006)

The Process

Lexical analysis is the processing of an input sequence of characters (such as the source code of a computer program) to produce, as output, a sequence of symbols called “lexical tokens”, or just “tokens”. For example, lexers for many programming languages convert the character sequence 123 abc into two tokens: 123 and abc (whitespace is not a token in most languages). The purpose of producing these tokens is usually to forward them as input to another program, such as a parser.

Syntax analysis is a process in compilers that recognizes the structure of programming languages. It is also known as parsing.

Wikipedia (August 2006)

Approaches

Top-down parsing - A parser can start with the start symbol and try to transform it to the input. Intuitively, the parser starts from the largest elements and breaks them down into incrementally smaller parts. LL parsers are examples of top-down parsers.

Bottom-up parsing - A parser can start with the input and attempt to rewrite it to the start symbol. Intuitively, the parser attempts to locate the most basic elements, then the elements containing these, and so on. LR parsers are examples of bottom-up parsers. Another term used for this type of parser is Shift-Reduce parsing.

Wikipedia (August 2006)
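As a toy illustration of the bottom-up idea, the Perl sub below shifts tokens onto a stack and, after each shift, reduces "Expression operator Number" to a single value when the top of the stack allows it. This is only a sketch of the technique restricted to sums and subtractions of integers; it is not how Parse::Yapp works internally (Yapp drives its parser from generated LALR tables), and the name shift_reduce is hypothetical.

```perl
use strict;
use warnings;

# Toy shift-reduce evaluator for:  E <- E '+' n | E '-' n | n
# A lone number already counts as E. After each shift, if the top three
# stack symbols are E, an operator and a number, they are reduced to a
# single E (represented here directly by its numeric value).
sub shift_reduce {
    my @input = @_;                          # e.g. (3, '-', 2, '+', 1)
    my @stack;
    while (@input) {
        push @stack, shift @input;           # shift
        if (@stack >= 3
            && $stack[-3] =~ /^-?\d+$/       # E (a value, maybe negative)
            && $stack[-2] =~ /^[-+]$/        # operator
            && $stack[-1] =~ /^\d+$/) {      # n
            my ($left, $op, $right) = splice @stack, -3;            # reduce
            push @stack, $op eq '+' ? $left + $right : $left - $right;
        }
    }
    die "parse error\n" unless @stack == 1;
    return $stack[0];
}

print shift_reduce(3, '-', 2, '+', 1), "\n";   # prints 2
```

Because reductions happen as soon as the leftmost operand and operator are complete, this evaluator is naturally left-associative.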

...boring...

Forget Wikipedia!

What is Parsing?

to recognize portions of text:

detect tokens;

integers, reals, strings, variables, reserved words, etc.

analyze a specific token sequence:

detect syntax;

define the order in which tokens make sense;

interpret the sequence and perform an action:

perform semantic actions;

execute the code defined; generate code;
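For the first step, detecting tokens, a single Perl regular expression is often enough. The sketch below assumes the token classes of the calculator used later in this talk (integers, lowercase variables, the reserved word print and the operators = + - ;); the sub name tokenize is hypothetical.

```perl
use strict;
use warnings;

# Detect tokens: cut the input into lexemes with one regular expression,
# then classify each lexeme. The reserved word is checked before the
# generic variable class. Characters matching none of the alternatives
# are silently skipped, which is acceptable only for a sketch.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    for my $lexeme ($text =~ /(\d+|[a-z]+|[-+=;])/g) {
        my $type = $lexeme =~ /^\d+$/    ? 'Number'
                 : $lexeme eq 'print'    ? 'Print'     # reserved word
                 : $lexeme =~ /^[a-z]+$/ ? 'Var'
                 :                         $lexeme;    # operators stand for themselves
        push @tokens, [ $type, $lexeme ];
    }
    return @tokens;
}

print join(' ', map { $_->[0] } tokenize('a = 150 - a + 350;')), "\n";
# prints: Var = Number - Var + Number ;
```

Checking syntax and performing the semantic actions then work on this token list instead of on raw characters.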

So, Regular Expressions?

yes!

RegExps are good for tokens;
RegExps are good for regular expressions :-)

no!

most real grammars can’t be parsed with RegExps;

Then?

Typically:

flex for lexical analysis (re2c for thread safety and reentrancy);
bison for syntactic analysis (lemon for thread safety and reentrancy);

but that is for C;

Perl 5 has lexical analysis (RegExps);

Perl 5 doesn’t have Grammar Support;

but we have CPAN!
Parse::RecDescent;
Parse::Yapp;
Parse::YALALR;

Perl 6 will have Grammar Support (Hurray!)

PGE — Parrot Grammar Engine;

What I’ve tested

flex + bison;

re2c + lemon;

Parse::RecDescent;

Parse::YAPP;

flex + Parse::YAPP;

Parrot Grammar Engine

flex + bison and re2c + lemon will only appear at the end, as a baseline of efficiency.

My Test Case (1/2)

a simple calculator: sums, subtractions, variables, prints;
BNF:

Program    ← Statement Program
           | Statement
Statement  ← Variable '=' Expression ';'
           | 'print' Expression ';'
Expression ← Expression '-' Expression
           | Expression '+' Expression
           | Variable
           | Number
Number     ← /\d+/
Variable   ← /[a-z]+/
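Because the Expression rule above is left-recursive, a naive top-down parser would recurse forever on it. A common rewrite is Expression ← Value (('+' | '-') Value)*, evaluated with a loop. The hand-written Perl sketch below parses and runs a whole program of this grammar and returns what the print statements would output; it is not one of the benchmarked implementations, and the name run_program is hypothetical.

```perl
use strict;
use warnings;

# Hand-written top-down interpreter for the test grammar (sketch).
# The left-recursive Expression rule is rewritten as
#   Expression <- Value (('+' | '-') Value)*
# and the loop folds from the left, so 3 - 2 + 1 gives 2.
# Returns the lines that the print statements would output.
sub run_program {
    my ($text) = @_;
    my (%var, @out, @tok);

    # Lexing: [type, value] pairs, reserved word "print" before variables.
    for my $lexeme ($text =~ /(\d+|[a-z]+|[-+=;])/g) {
        push @tok, $lexeme =~ /^\d/    ? [ Number  => $lexeme ]
                 : $lexeme eq 'print'  ? [ Print   => $lexeme ]
                 : $lexeme =~ /^[a-z]/ ? [ Var     => $lexeme ]
                 :                       [ $lexeme => $lexeme ];
    }

    # Consume one token of the given type or die.
    my $expect = sub {
        my $t = shift @tok;
        die "expected '$_[0]'\n" unless $t && $t->[0] eq $_[0];
        return $t->[1];
    };
    # Value <- Number | Variable   (undefined variables count as 0)
    my $value = sub {
        my $t = shift @tok or die "unexpected end of input\n";
        return $t->[1]              if $t->[0] eq 'Number';
        return $var{ $t->[1] } || 0 if $t->[0] eq 'Var';
        die "unexpected '$t->[1]'\n";
    };
    # Expression <- Value (('+' | '-') Value)*
    my $expression = sub {
        my $acc = $value->();
        while (@tok && $tok[0][0] =~ /^[-+]$/) {
            my $op = (shift @tok)->[0];
            my $v  = $value->();
            $acc = $op eq '+' ? $acc + $v : $acc - $v;
        }
        return $acc;
    };

    # Program <- Statement+
    while (@tok) {
        if ($tok[0][0] eq 'Print') {              # 'print' Expression ';'
            shift @tok;
            push @out, '> ' . $expression->();
            $expect->(';');
        }
        else {                                    # Variable '=' Expression ';'
            my $name = $expect->('Var');
            $expect->('=');
            $var{$name} = $expression->();
            $expect->(';');
        }
    }
    return @out;
}

print "$_\n" for run_program('a = 10; a = 150 - a + 350; print a;');   # prints "> 490"
```

Turning the recursion into a loop is what makes the left association come out right, which is exactly the issue the Parse::RecDescent version runs into.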

My Test Case (2/2)

automatic test generation;

randomly add, subtract and define variables;

randomly print variables;

example:
a = 10;
a = 150 - a + 350;
print a;

different test sizes (lines): 10; 100; 1 000; 10 000; 100 000; 1 000 000; 2 000 000; 4 000 000; 6 000 000;
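The generator itself is not shown in the slides. A possible Perl sketch follows; the sub name generate_test and the probabilities (roughly 40% definitions, 40% sums and subtractions, 20% prints) are assumptions. It only uses variables that have already been defined, so every generated program is valid for the grammar above.

```perl
use strict;
use warnings;

# Hypothetical generator of calculator test programs: random definitions,
# sums/subtractions and prints, one statement per line. Only variables that
# were already assigned appear on a right-hand side or in a print.
sub generate_test {
    my ($lines) = @_;
    my @names = ('a' .. 'z');
    my (@defined, %is_defined, @out);
    for (1 .. $lines) {
        my $r = rand;
        if (!@defined || $r < 0.4) {                  # define a variable
            my $v = $names[ rand @names ];
            push @out, "$v = " . int(rand 1000) . ";";
            push @defined, $v unless $is_defined{$v}++;
        }
        elsif ($r < 0.8) {                            # add / subtract
            my $v    = $defined[ rand @defined ];
            my $expr = int(rand 1000);
            for (1 .. 1 + int(rand 3)) {              # 1 to 3 extra operands
                my $operand = rand() < 0.5 ? $defined[ rand @defined ] : int(rand 1000);
                $expr .= (rand() < 0.5 ? ' + ' : ' - ') . $operand;
            }
            push @out, "$v = $expr;";
        }
        else {                                        # print
            my $v = $defined[ rand @defined ];
            push @out, "print $v;";
        }
    }
    return @out;
}

# Number of lines from the command line, 10 by default.
print "$_\n" for generate_test(@ARGV ? $ARGV[0] : 10);
```

Saved as, say, gen.pl (a hypothetical name), it would be run as perl gen.pl 10000 > test.txt to produce one of the test files.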

Now, the results

Parse::RecDescent ID

Author: Damian Conway

Latest Release: 1.94 (April 9, 2003)

Available from: CPAN

Parse::RecDescent rationale

⇑ full Perl implementation;
⇑ mixed lexical and syntactic analyzer in the same code;
⇓ slow;
⇓ only supports LL(1) grammars;

Parse::RecDescent

use strict;
use warnings;
use Parse::RecDescent;

our %VAR;

# Note: the Expression rules are right-recursive, so 3 - 2 + 1
# is read as 3 - (2 + 1) (see "Problems").
my $grammar = q{
    Program:    Statement(s) /\Z/          { 1 }

    Statement:  Var '=' Expression ';'     { $main::VAR{$item[1]} = $item[3]; }
             |  /print\b/ Expression ';'   { print "> $item[2]\n"; }

    Expression: Number '+' Expression      { $item[1] + $item[3] }
             |  Number '-' Expression      { $item[1] - $item[3] }
             |  Var '+' Expression         { ($main::VAR{$item[1]} || 0) + $item[3] }
             |  Var '-' Expression         { ($main::VAR{$item[1]} || 0) - $item[3] }
             |  Var                        { $main::VAR{$item[1]} || 0 }
             |  Number                     { $item[1] }

    Number:     /\d+/
    Var:        /[a-z]+/
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

my $text = do { local $/; <STDIN> };    # slurp all of STDIN

$parser->Program($text) or die "** Parse Error **\n";

Problems

Unfortunately, the program does not respect the left associativity of the operators. I couldn't manage to solve that (didn't try hard).

3 - 2 + 1 is evaluated as Number(3) - Expression(2 + 1), thus evaluating to 0 instead of the correct answer, 2.

Well, I had a cheat version, but it made the test program a lot slower than it is at the moment.
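The difference between the two readings is easy to see on a flat token list. In the Perl sketch below (the sub names are hypothetical), eval_right mirrors the right-recursive RecDescent rule Expression ← Number '-' Expression, while eval_left folds from the left, which is what the %left declaration gives the YAPP grammar.

```perl
use strict;
use warnings;

# Both subs take a flat list such as (3, '-', 2, '+', 1).

# Mirrors the right-recursive rule  Expression <- Number op Expression:
# each operator applies to everything on its right, i.e. 3 - (2 + 1).
sub eval_right {
    my ($n, $op, @rest) = @_;
    return $n unless defined $op;
    my $rhs = eval_right(@rest);
    return $op eq '+' ? $n + $rhs : $n - $rhs;
}

# Left-associative version: Value (op Value)*, folded from the left,
# i.e. (3 - 2) + 1.
sub eval_left {
    my ($acc, @rest) = @_;
    while (my ($op, $n) = splice @rest, 0, 2) {
        $acc = $op eq '+' ? $acc + $n : $acc - $n;
    }
    return $acc;
}

my @expr = (3, '-', 2, '+', 1);
print "right: ", eval_right(@expr), "\n";   # right: 0
print "left:  ", eval_left(@expr), "\n";    # left:  2
```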

Parse::RecDescent timings

test size    spent time
10           0.104 s
100          0.203 s
1 000        1.520 s
10 000       87.310 s

Parse::RecDescent Memory Usage

[Figure: memory usage over time of "perl recdes.pl", test file with 10 000 lines (1,778,617,585,999 bytes x ms). x axis: time in ms (0 to 240 000); y axis: bytes (0M to 6M); series: heap-admin, x809F54B:Perl_safesysrea, x809F49D:Perl_safesysmal]

Parse::YAPP ID

Author: Francois Desarmenien

Latest Release: 1.05 (Nov 4, 2001)

Available from: CPAN

Parse::YAPP rationale

⇑ full Perl implementation;
⇑ supports bison-like LR grammars;
⇓ you need to specify your own lexical analyzer;
⇓ slow for big input files...
if you do not prepare a good lexical analyzer;

Parse::Yapp

%left '+' '-'

%%

Program    : Statement
           | Program Statement
           ;

Statement  : Var '=' Expression ';'        { $main::VAR{$_[1]} = $_[3] }
           | Print Expression ';'          { print "> $_[2]\n" }
           ;

Expression : Expression '-' Expression     { $_[1] - $_[3] }
           | Expression '+' Expression     { $_[1] + $_[3] }
           | Var                           { $main::VAR{$_[1]} || 0 }
           | Number                        { $_[1] }
           ;

%%

our %VAR;

my $p = Calc->new();

my $File = do { local $/; <STDIN> };   # slurp all of STDIN; yylex consumes it

$p->YYParse(yylex => \&yylex, yyerror => \&yyerror);

Parse::Yapp

sub yyerror {
    my $parser = shift;
    if ($parser->YYCurtok) {
        printf STDERR "Error: a \"%s\" (%s) was found where %s was expected\n",
            $parser->YYCurtok, $parser->YYCurval, join(', ', $parser->YYExpect);
    } else {
        print STDERR "Expecting one of ", join(', ', $parser->YYExpect), "\n";
    }
}

sub yylex {
    for ($File) {
        s/^\s+//;                               # skip blanks and newlines
        return ('', undef) if $_ eq '';         # end of input

        # Tokens
        s/^(\d+)//     and return ('Number', $1);
        s/^print\b//   and return ('Print', 'print');
        s/^([a-z]+)//  and return ('Var', $1);

        # Operators (anchored; '-' first so it is not read as a range)
        s/^([-+=;])//  and return ($1, $1);

        die "Unexpected symbols: '$_'\n";
    }
}
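The yylex above consumes $File by rewriting the front of the string with s/^...// on every token. The idiom usually recommended for lexing a large string in Perl is to leave the string untouched and only advance pos() with the \G assertion and the /gc modifiers. A minimal sketch, assuming the same token names as the grammar; make_lexer is a hypothetical name, and whether it narrows the gap with flex in the timings that follow has not been measured.

```perl
use strict;
use warnings;

# Builds a yylex-style closure over the whole input. Each call returns a
# (type, value) pair; ('', undef) signals end of input, the convention
# Parse::Yapp expects from its lexer. Matching with \G and /gc only moves
# pos($text) forward; the string itself is never modified.
sub make_lexer {
    my ($text) = @_;
    pos($text) = 0;
    return sub {
        $text =~ /\G\s+/gc;                                 # skip blanks
        return ('', undef)        if pos($text) >= length $text;
        return ('Number', $1)     if $text =~ /\G(\d+)/gc;
        return ('Print', 'print') if $text =~ /\Gprint\b/gc;
        return ('Var', $1)        if $text =~ /\G([a-z]+)/gc;
        return ($1, $1)           if $text =~ /\G([-+=;])/gc;
        die "Unexpected symbols at offset " . pos($text) . "\n";
    };
}

my $lex = make_lexer("a = 10;\nprint a;");
my @seen;
while (1) {
    my ($type, $value) = $lex->();
    last if $type eq '';
    push @seen, "$type($value)";
}
print "@seen\n";
# prints: Var(a) =(=) Number(10) ;(;) Print(print) Var(a) ;(;)
```

It would be handed to the parser as $p->YYParse(yylex => make_lexer($File), yyerror => \&yyerror);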

Parse::YAPP timings

test size    Parse::RecDescent    Parse::YAPP
10           0.104 s              0.016 s
100          0.203 s              0.034 s
1 000        1.520 s              0.272 s
10 000       87.310 s             4.972 s
100 000      n/a                  2 253.657 s

Parse::Yapp Memory Usage

[Figure: memory usage over time of "perl Calc.pl", test file with 10 000 lines (74,532,562,124 bytes x ms). x axis: time in ms (0 to 60 000); y axis: bytes (0k to 1,200k); series: x809F54B:Perl_safesysrea, heap-admin, x809F49D:Perl_safesysmal]

Parse::YAPP + flex ID

Idea by: Alberto Simoes

Latest Release: n/a

Available from: The Perl Review v0i3, 2002

Parse::YAPP+flex rationale

⇑ fast and robust for big input files;
⇑ supports bison-like LR grammars;
⇓ gluing Perl and C takes some work;
⇓ you need a C compiler;
⇓ you need to know a little C and flex;

Parse::Yapp + flex: the lexical analyzer

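%{
/* Presumably generated with the "perl_" prefix (flex -Pperl_), as the
   perl_yywrap / perl_yytext names below suggest. */
#include <string.h>
#define YY_DECL char *yylex(void)
static char buffer[15];        /* token type handed back to Perl */
%}

%%

"print"     { return strcpy(buffer, "Print"); }
[0-9]+      { return strcpy(buffer, "Number"); }
[a-z]+      { return strcpy(buffer, "Var"); }
[ \t\n]+    { /* skip blanks and newlines */ }
.           { return strcpy(buffer, yytext); }

%%

int perl_yywrap(void) { return 1; }

char *perl_yylextext(void) { return perl_yytext; }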

Parse::Yapp + flex: the syntactic analyzer

%left '+' '-'

%%

Program    : Statement
           | Program Statement
           ;

Statement  : Var '=' Expression ';'        { $main::VAR{$_[1]} = $_[3] }
           | Print Expression ';'          { print "> $_[2]\n"; }
           ;

Expression : Expression '-' Expression     { $_[1] - $_[3] }
           | Expression '+' Expression     { $_[1] + $_[3] }
           | Var                           { $main::VAR{$_[1]} || 0 }
           | Number                        { $_[1] }
           ;

%%

our %VAR;

Parse::Yapp + flex: just that?

NO!

you need XS glue code;
you need some Perl glue code;
you need a decent makefile;

Can you give details?

Check my article “Cooking Perl with Flex” in TPR v0i3, 2002:
http://alfarrabio.di.uminho.pt/~albie/publications/perlflex.pdf

Parse::YAPP + flex timings

test size    RecDescent    YAPP           YAPP + flex
10           0.104 s       0.016 s        0.034 s
100          0.203 s       0.034 s        0.049 s
1 000        1.520 s       0.272 s        0.174 s
10 000       87.310 s      4.972 s        1.168 s
100 000      n/a           2 253.657 s    12.145 s
1 000 000    n/a           n/a            122.377 s
2 000 000    n/a           n/a            264.219 s
4 000 000    n/a           n/a            530.527 s
6 000 000    n/a           n/a            800.705 s

Parse::Yapp + flex Memory Usage

[Figure: memory usage over time of "perl parse.pl", test file with 10 000 lines (20,106,601,308 bytes x ms). x axis: time in ms (0 to 22 000); y axis: bytes (0k to 600k); series: x809F54B:Perl_safesysrea, x4032CAF:perl_yyalloc, heap-admin, x809F49D:Perl_safesysmal]

Parrot Grammar Engine ID

Author: mostly Patrick Michaud

Latest release: not released yet

Available from: Parrot releases or the Parrot SVN tree

PGE rationale

⇑ built into Perl 6;
⇑ includes constructs that ease the LL(1) constraint;
⇕ not fast yet... but we are working on it;
⇓ mainly a top-down parser (although bottom-up parsing should also be supported);
⇓ at the moment, semantic actions must be written in PIR.

PGE implementation

grammar Benchmark;

token program { <?statement>+ }

rule statement {
    | print <expression> ; {{ $I0 = match['expression'];
                              print $I0; print "\n" }}
    | <var> = <expression> ; {{ $P0 = match['expression'];
                                $S0 = match['var']; set_global $S0, $P0 }}
}

rule expression { <value> [ <add> | <sub> ]* {{ $I0 = match['value']
    # 25 lines removed...
    .return($I0) }}
}

rule add { \+ <value> }

rule sub { \- <value> }

rule value { <number> {{ $I0 = match['number']; .return($I0) }}
           | <var> {{ $S0 = match['var'];
                      $P0 = get_global $S0; $I0 = $P0; .return($I0) }}
}

token number { \d+ }

token var { <[a..z]>+ }
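The following minimal pure-Perl sketch shows what the benchmark grammar above accepts and computes. It is not one of the benchmarked implementations. It is a hand-written, regex-based evaluator that makes two assumptions: the elided part of `expression` adds and subtracts values from left to right, and reading an undefined variable is an error. The name `run_program` and the sample program are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical reference evaluator for the toy benchmark language:
#   statement  := 'print' expression ';' | var '=' expression ';'
#   expression := value ( ('+' | '-') value )*
#   value      := number | var        (number: \d+, var: [a-z]+)
# Returns everything the program prints, instead of writing to STDOUT.
sub run_program {
    my ($src) = @_;
    my %vars;        # plays the role of set_global/get_global in the PIR actions
    my $out = '';

    # value: a literal number, or the current value of a variable.
    my $value = sub {
        my ($tok) = @_;
        return $tok if $tok =~ /^\d+$/;
        die "undefined variable '$tok'\n" unless exists $vars{$tok};
        return $vars{$tok};
    };

    # expression: fold + and - from left to right (assumed semantics).
    my $expr = sub {
        my ($text) = @_;
        $text =~ /^\s*([a-z]+|\d+)\s*((?:[-+]\s*(?:[a-z]+|\d+)\s*)*)$/
            or die "syntax error in expression '$text'\n";
        my ($first, $rest) = ($1, $2);
        my $acc = $value->($first);
        while ($rest =~ /([-+])\s*([a-z]+|\d+)/g) {
            my ($op, $tok) = ($1, $2);
            $acc = $op eq '+' ? $acc + $value->($tok) : $acc - $value->($tok);
        }
        return $acc;
    };

    # Statements are terminated by ';'.
    for my $stmt (split /;/, $src) {
        next unless $stmt =~ /\S/;
        if ($stmt =~ /^\s*print\s+(.+?)\s*$/s) {
            $out .= $expr->($1) . "\n";
        }
        elsif ($stmt =~ /^\s*([a-z]+)\s*=\s*(.+?)\s*$/s) {
            my ($name, $e) = ($1, $2);
            $vars{$name} = $expr->($e);
        }
        else {
            die "syntax error in statement '$stmt'\n";
        }
    }
    return $out;
}

print run_program("a = 10; b = a - 3 + 1; print a + b; print b;");
# prints 18 and then 8, each on its own line
```

Each generated parser in the benchmark does the same work as this loop. The timings therefore mostly measure how expensive each tool makes the tokenizing and rule matching.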

PGE timings

test size  RecDescent         YAPP  YAPP + flex        PGE
       10     0.104 s      0.016 s      0.034 s    0.124 s
      100     0.203 s      0.034 s      0.049 s    0.253 s
    1 000     1.520 s      0.272 s      0.174 s    1.463 s
   10 000    87.310 s      4.972 s      1.168 s   16.189 s
  100 000           —  2 253.657 s     12.145 s  665.746 s
1 000 000           —            —    122.377 s          —
2 000 000           —            —    264.219 s          —
4 000 000           —            —    530.527 s          —
6 000 000           —            —    800.705 s          —

PGE Memory Usage

[Figure: heap usage over time for "../../../../parrot -j main.pir" on a test file with 10 000 lines; 92,090,753,626 bytes x ms in total; time axis 0 to 12 000 ms, memory axis 0 to 8 MB; legend: mem__sys_reallo, heap-admin, mem__internal_a and two mem_sys_allocat entries.]

Remember I had C implementations?

Let’s look into their memory usage.

Timings for C implementations

test size  Parse::RecDescent  Parse::Yapp  YAPP + flex        PGE  re2c + lemon  flex + bison
       10            0.104 s      0.016 s      0.034 s    0.124 s       0.001 s       0.001 s
      100            0.203 s      0.034 s      0.049 s    0.253 s       0.001 s       0.001 s
    1 000            1.520 s      0.272 s      0.174 s    1.463 s       0.002 s       0.002 s
   10 000           87.310 s      4.972 s      1.168 s   16.189 s       0.009 s       0.009 s
  100 000                  —  2 253.657 s     12.145 s  665.746 s       0.089 s       0.103 s
1 000 000                  —            —    122.377 s          —       0.850 s       0.862 s
2 000 000                  —            —    264.219 s          —       1.896 s       1.891 s
4 000 000                  —            —    530.527 s          —       4.327 s       3.604 s
6 000 000                  —            —    800.705 s          —       5.681 s       5.665 s

flex+bison Memory Usage

[Figure: heap usage over time for the C "parser" binary on a test file with 10 000 lines; 16,427,193 bytes x ms in total; time axis 0 to 350 ms, memory axis 0 to 60 kB; legend: posix_memalign, g_malloc0 and yyalloc.]

re2c+lemon Memory Usage

[Figure: heap usage over time for the C "parser" binary on a test file with 10 000 lines; 1,418,530 bytes x ms in total; time axis 0 to 300 ms, memory axis 0 to 6 kB; legend: heap-admin, ParseAlloc, posix_memalign and g_malloc0.]

Comparing them all

Performance Comparison

[Figure: log-log plot of time (seconds, 0.001 to 10 000) against test size (lines, 10 to 1e+07) for re2c+lemon, bison+flex, Parse::Yapp + flex, PGE, Parse::Yapp and Parse::RecDescent.]

Thanks!!

Luciano Rocha for the flex + bison and re2c + lemon implementations;

Ruben Fonseca for the PGE idea;

Patrick Michaud and Kevin Tew for the PGE implementation;

and, of course, Larry Wall, Gloria Wall, Leopold Toetsch, Chip Salzenberg, Allison Randal, Damian Conway, Anna Kournikova, Francois Desarmenien, Jerry Gay, Will Coleda, Simon Cozens, Vern Paxson, Jef Poskanzer, Kevin Gong, brian d foy, Santa Claus, Audrey Tang, Jose Joao Almeida, Batman, Jonathan Scott Duff, Nuno Carvalho, Marty Pauley, Leon Brocard, Josette Garcia, James Tisdall, Jose Castro, Michael Schwern, Pamela Anderson, Andy Lester, Abigail, Nicholas Clark, Magda Joana Silva, Matt Diephouse, Ilya Martynov, Wikipedia, Randal Schwartz, Dan Sugalski, Jon Orwant, Tom Christiansen, Johan Vromans, ........................
