Copyright © 2004-2011 Curt Hill
Other Trees
Applications of the Tree Structure
Parse or Expression Trees
• An expression tree contains:
  – Operators as interior nodes
  – Values as leaves
• The shape of the expression tree captures the precedence
• Consider the following expression: 2 + 3 * 4
Expression Trees

2 + 3 * 4        3 * 4 + 2

    +                +
   / \              / \
  2   *            *   2
     / \          / \
    3   4        3   4
Notes
• The plus is always higher in the tree because the precedence of multiplication is higher
• The expression tree is based upon the grammar rules
  – Syntax diagrams will be considered later
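The expression tree for 2 + 3 * 4 can be sketched in Python; this is an illustrative tuple representation (not from the slides), with operators as interior nodes, values as leaves, and bottom-up evaluation:

```python
import operator

# Map operator symbols to functions.
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

# A node is either a number (leaf) or an (op, left, right) tuple.
tree = ('+', 2, ('*', 3, 4))   # 2 + 3 * 4

def evaluate(node):
    """Evaluate an expression tree bottom-up."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left), evaluate(right))
    return node

# evaluate(tree) → 14
```

Because the * node sits below the + node, multiplication is evaluated first, exactly as precedence requires.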
Traversal
• The names come from the above expression tree
• There are six (3!) ways to traverse a tree, depending on the order of processing:
  – The node
  – The left subtree
  – The right subtree
• The common three visit the left subtree before the right:
  – Inorder
  – Preorder
  – Postorder
Inorder
• Processes nodes according to the sorted order of the tree
• Visit lower (left) subtree
• Process node
• Visit upper (right) subtree
• The reverse order produces higher to lower
• Left to right: 2 + 3 * 4
• This gives standard algebraic notation
Preorder
• Node first, then subtrees
• Process node
• Visit lower (left) subtree
• Visit upper (right) subtree
• Expression: + 2 * 3 4
• Remember this?
Postorder
• Subtrees first, then node
• Visit lower (left) subtree
• Visit upper (right) subtree
• Process node
• Expression: 2 3 4 * +
• Reverse Polish
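The three traversals can be sketched over the same tuple-based tree for 2 + 3 * 4 (an illustrative representation, not from the slides); the resulting sequences match the inorder, preorder, and postorder expressions above:

```python
# A node is either a number (leaf) or an (op, left, right) tuple.
tree = ('+', 2, ('*', 3, 4))   # 2 + 3 * 4

def inorder(node):
    """Left subtree, node, right subtree: algebraic notation."""
    if not isinstance(node, tuple):
        return [node]
    op, left, right = node
    return inorder(left) + [op] + inorder(right)

def preorder(node):
    """Node, left subtree, right subtree: Polish notation."""
    if not isinstance(node, tuple):
        return [node]
    op, left, right = node
    return [op] + preorder(left) + preorder(right)

def postorder(node):
    """Left subtree, right subtree, node: reverse Polish."""
    if not isinstance(node, tuple):
        return [node]
    op, left, right = node
    return postorder(left) + postorder(right) + [op]

# inorder(tree)   → [2, '+', 3, '*', 4]
# preorder(tree)  → ['+', 2, '*', 3, 4]
# postorder(tree) → [2, 3, 4, '*', '+']
```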
Parse Trees
• A parse tree is a construction that represents the parse of a sentence
• Parse trees are often built by compilers and other programs that scan source code
• A parse tree is one acceptable parse of the source code based on the grammar
• First, some definitions and background
A Grammar Consists Of
• Terminals
  – Keywords
  – Constants
  – Words
  – Symbols or operators
• Non-terminals
  – Named constructs whose names never actually appear in the source
  – Such as statements and expressions
• Distinguished symbol
  – The starting point
  – Usually represents the whole program
• Productions
  – Rules for rewriting the distinguished symbol and other non-terminals into terminals
Example
• Simplified grammar for expressions
  – Terminals:
    • Constant
    • Ident
    • + - ( ) * /
  – Productions in EBNF:
    • Expression ::= Term + Term
    • Expression ::= Term - Term
    • Expression ::= Term
    • Term ::= Factor { ( * | / ) Factor }
    • Factor ::= Constant
    • Factor ::= Ident
    • Factor ::= ( Expression )
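A sketch of a recursive-descent parser for roughly this grammar: one method per non-terminal, producing a tuple-based parse tree. The `while` loops generalize the two-operand productions to chains of operators, and the class and method names (`Parser`, `tokenize`, etc.) are illustrative, not from the slides:

```python
import re

def tokenize(src):
    """Split source into numbers, identifiers, and operator symbols."""
    return re.findall(r'\d+|[A-Za-z_]\w*|[-+*/()]', src)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expression(self):
        # Expression ::= Term { ( + | - ) Term }
        node = self.term()
        while self.peek() in ('+', '-'):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        # Term ::= Factor { ( * | / ) Factor }
        node = self.factor()
        while self.peek() in ('*', '/'):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # Factor ::= Constant | Ident | ( Expression )
        tok = self.advance()
        if tok == '(':
            node = self.expression()
            assert self.advance() == ')', "missing closing parenthesis"
            return node
        return int(tok) if tok.isdigit() else tok

tree = Parser(tokenize("2 + 5 * (3 - 4)")).expression()
# tree == ('+', 2, ('*', 5, ('-', 3, 4)))
```

Note how the grammar's precedence falls out of the call structure: `expression` calls `term`, which calls `factor`, so * and / bind more tightly than + and -.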
Productions Again
• The productions are all replacement rules
• They may also be denoted visually in terms of syntax graphs
• These are usually just as easy to understand, and represent the same information
Syntax Graphs
• A circle represents a terminal
  – A reserved word or operator
  – No further definition
• A rectangle represents a non-terminal
  – For a statement or expression
  – Must be defined elsewhere
• An arrow represents the path between one item and another
  – The arrows may branch, indicating alternatives
• Recursion is also allowed
Simple Expressions
[Syntax graphs: an expression is a term, optionally followed by + or - and another term; a term is a factor, optionally followed by * or / and another factor; a factor is a constant, an ident, or an expression in parentheses]
Parse Tree Example
• In a parse tree
  – Leaves are terminals
  – Interior nodes are non-terminals
• Usually evaluated from bottom to top
• Consider the parse of: 2 + 5 * ( 3 - 4 )
Expression: 2 + 5 * ( 3 - 4 )
[Parse tree: the whole expression is term + term; the left term is the factor 2; the right term is factor * factor, with 5 as the first factor; the second factor is ( expression ), where the inner expression is term - term with factors 3 and 4]
Notes
• Parse trees are partially built by the parser of the compiler
• They are then used by the code generator
• Once used, the sub-trees are discarded
• The whole tree never exists at one time
Balance and Search Times
• The time it takes to search a tree is based upon the path length to the desired node
• Assuming equal distributions:
  – The average search length is the sum of the path lengths divided by the number of tree nodes
Unbalanced Tree
[Binary search tree: root 12; left child 6, whose left child 2 has children 0 and 4; right child 19 with children 15 and 36; 36 has left child 24, 24 has right child 30, 30 has left child 29]
Average Search Length
• 12 – 1
• 6 – 2
• 19 – 2
• 2 – 3
• 15 – 3
• 36 – 3
• 24 – 4
• 0 – 4
• 4 – 4
• 30 – 5
• 29 – 6
• A sum of 37 for 11 nodes gives an average search length of about 3.4
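The average search length for the unbalanced tree can be computed directly from the per-node path lengths listed above (root counted as 1):

```python
# Path length (depth, root = 1) of each key in the unbalanced tree.
depth = {12: 1, 6: 2, 19: 2, 2: 3, 15: 3, 36: 3,
         24: 4, 0: 4, 4: 4, 30: 5, 29: 6}

total = sum(depth.values())    # sum of path lengths: 37
average = total / len(depth)   # 37 / 11, about 3.4
```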
Perfectly Balanced Tree
[Binary search tree over the same 11 keys: root 15; left child 4 with children 2 (left child 0) and 12 (left child 6); right child 29 with children 24 (left child 19) and 36 (left child 30)]
Average Search Length
• 15 – 1
• 4 – 2
• 29 – 2
• 2 – 3
• 12 – 3
• 24 – 3
• 36 – 3
• 0 – 4
• 6 – 4
• 19 – 4
• 30 – 4
• A sum of 33 for 11 nodes gives an average search length of 3.0
• Balanced does perform better
AVL Balanced Tree
[Binary search tree over the same 11 keys: root 12; left child 4 with children 2 (left child 0) and 6; right child 29 with children 19 (children 15 and 24) and 30 (right child 36)]
Average Search Length
• 12 – 1
• 4 – 2
• 29 – 2
• 2 – 3
• 6 – 3
• 19 – 3
• 30 – 3
• 0 – 4
• 15 – 4
• 24 – 4
• 36 – 4
• A sum of 33 for 11 nodes gives an average search length of 3.0
• AVL balanced has the same performance here as perfectly balanced
Balanced is Best?
• The idea of balancing a tree is predicated on equal frequencies of keys
  – A reasonable assumption if there is no contrary information
  – However, if we have frequency information we can do better
• C++ keywords, for example, are not evenly distributed
Path Lengths
• The idea of balance is nice in general, but…
• If we have a reasonable idea of the frequency of entries, we can do better than perfectly balanced
• What we want to do is minimize the average path length
• With our previous knowledge we could make no assumptions concerning frequency
• Now we can generate a more precise formula
Average Path Length

Weighted path length = Σ (i = 1 to n) pᵢ fᵢ

• Where
  – n is the number of words
  – pᵢ is the path length of word i
  – fᵢ is the frequency of word i
• Dividing by the total frequency Σ fᵢ gives the average path length
Optimal Search Trees
• What we want are high-frequency words close to the root and low-frequency words at the leaves
• You might think that the most common word should be the root and the next two most common words its children
• It does not work that way, since we must maintain the key order as well
Example
• For example, the word "the" is the most common word in English text
• The top n are:
  – the (20)
  – and (15)
  – of (13)
  – to (12)
  – you (7)
  – in (7)
  – a (6)
• Because the top two are such extremes, it may be better to have "of" as the root
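The weighted path length formula can be applied to these seven words. As a sketch, assume a balanced alphabetical BST with "of" at the root (a hypothetical layout, not given on the slides); the depths below follow from that tree, with the root at depth 1:

```python
# Frequencies from the slide.
freq = {'the': 20, 'and': 15, 'of': 13, 'to': 12,
        'you': 7, 'in': 7, 'a': 6}

# Depths in a balanced alphabetical BST rooted at 'of':
# 'and' and 'to' are its children; the rest are leaves.
depth = {'of': 1, 'and': 2, 'to': 2,
         'a': 3, 'in': 3, 'the': 3, 'you': 3}

# Weighted path length: sum of p_i * f_i over all words.
weighted = sum(depth[w] * freq[w] for w in freq)
# weighted == 187
```

Moving a more frequent word such as "the" closer to the root lowers its term pᵢ fᵢ, but only if the BST key order still holds; minimizing the total over all valid trees is the optimal-search-tree problem.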
LISP Lists
• LISP is very old
  – Second only to FORTRAN
• Usually encountered in Programming Languages or Artificial Intelligence classes
• It has a ubiquitous data structure called a list
• However, it is not a list in the sense of being purely linear
• Instead it is a tree, but a tree without a key
Variables in LISP
• A variable may be:
  – An atom
  – A list
• An atom is any word or number
• A list may be:
  – Empty
  – A variable followed by a list
Lists
• A list could be a simple list within parentheses
  – (Three element list)
• It could also have sub-lists
  – (Atom (A sub list) another (list))
  – This is clearly not a linear list such as an STL list
• LISP programs were also lists
  – Programs and data had the same form
Implementation
• The LISP language was influenced by the machine on which it was developed
• It had a 36-bit word that was partitioned into two pointers
  – Contents of the Address part of the Register (CAR)
  – Contents of the Decrement part of the Register (CDR)
• An atom used the word for data
• A list cell used the two pointers
• A list always ended in nil, a special pointer
Example
(Three element list)
[Cons-cell diagram: three cells chained through their cdr pointers; the cars point to the atoms Three, element, and list; the final cdr is nil]
Second Example
(Atom (sub list) last)
[Cons-cell diagram: the top-level list is three cells whose cars point to the atom Atom, to a sub-list, and to the atom last; the sub-list is two cells whose cars point to sub and list; both chains end in nil]
List Processing
• There were two functions continually used in LISP to process a list
• Car gave the first item of the list
  – Which could itself be a list
• Cdr gave the rest of the list
• With a heavy dose of recursion, LISP could do it all
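A minimal sketch of cons cells and car/cdr in Python (an illustrative model, not how any real LISP is implemented): each cell is a pair, and a list is a chain of cells ending in None, standing in for nil.

```python
def cons(head, tail):
    """Build one cell: a (car, cdr) pair."""
    return (head, tail)

def car(cell):
    """First item of the list (may itself be a list)."""
    return cell[0]

def cdr(cell):
    """The rest of the list."""
    return cell[1]

# (Three element list), ending in nil (None).
lst = cons('Three', cons('element', cons('list', None)))

def length(cell):
    """Recursive list processing in the LISP style."""
    return 0 if cell is None else 1 + length(cdr(cell))
```

Because car can return either an atom or another cell, the same pair structure represents both flat lists and nested sub-lists, which is why a LISP list is really a tree.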