Copyright © 2004-2011 Curt Hill
Other Trees
Applications of the Tree Structure
Parse or Expression Trees
• An expression tree contains:
  – Operators as interior nodes
  – Values as leaves
• The shape of the expression tree captures the precedence
• Consider the following expression: 2 + 3 * 4
Expression Trees

2 + 3 * 4        3 * 4 + 2

    +                +
   / \              / \
  2   *            *   2
     / \          / \
    3   4        3   4
Notes
• The plus is always higher in the tree because the precedence of multiplication is higher
• The expression tree is based upon the grammar rules
  – Syntax diagrams will be considered later
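The expression tree for 2 + 3 * 4 can be sketched in Python; this is an illustrative tuple representation (not from the slides), with operators as interior nodes, values as leaves, and bottom-up evaluation:

```python
import operator

# Map operator symbols to functions.
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

# A node is either a number (leaf) or an (op, left, right) tuple.
tree = ('+', 2, ('*', 3, 4))   # 2 + 3 * 4

def evaluate(node):
    """Evaluate an expression tree bottom-up."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left), evaluate(right))
    return node

# evaluate(tree) → 14
```

Because the * node sits below the + node, multiplication is evaluated first, exactly as precedence requires.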
Traversal
• The names come from the above expression tree
• There are six (3!) ways to traverse a tree, depending on the order of processing:
  – The node
  – The left subtree
  – The right subtree
• The common three visit the left subtree before the right:
  – Inorder
  – Preorder
  – Postorder
Inorder
• Processes nodes according to the sorted order of the tree
• Visit lower (left) subtree
• Process node
• Visit upper (right) subtree
• The reverse order produces higher to lower
• Left to right: 2 + 3 * 4
• This gives standard algebraic notation
Preorder
• Node first, then subtrees
• Process node
• Visit lower (left) subtree
• Visit upper (right) subtree
• Expression: + 2 * 3 4
• Remember this?
Postorder
• Subtrees first, then node
• Visit lower (left) subtree
• Visit upper (right) subtree
• Process node
• Expression: 2 3 4 * +
• Reverse Polish
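The three traversals can be sketched over the same tuple-based tree for 2 + 3 * 4 (an illustrative representation, not from the slides); the resulting sequences match the inorder, preorder, and postorder expressions above:

```python
# A node is either a number (leaf) or an (op, left, right) tuple.
tree = ('+', 2, ('*', 3, 4))   # 2 + 3 * 4

def inorder(node):
    """Left subtree, node, right subtree: algebraic notation."""
    if not isinstance(node, tuple):
        return [node]
    op, left, right = node
    return inorder(left) + [op] + inorder(right)

def preorder(node):
    """Node, left subtree, right subtree: Polish notation."""
    if not isinstance(node, tuple):
        return [node]
    op, left, right = node
    return [op] + preorder(left) + preorder(right)

def postorder(node):
    """Left subtree, right subtree, node: reverse Polish."""
    if not isinstance(node, tuple):
        return [node]
    op, left, right = node
    return postorder(left) + postorder(right) + [op]

# inorder(tree)   → [2, '+', 3, '*', 4]
# preorder(tree)  → ['+', 2, '*', 3, 4]
# postorder(tree) → [2, 3, 4, '*', '+']
```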
Parse Trees
• A parse tree is a construction that represents the parse of a sentence
• Parse trees are often built by compilers and other programs that scan source code
• A parse tree is one acceptable parse of the source code based on the grammar
• First, some definitions and background
A Grammar Consists Of
• Terminals
  – Keywords
  – Constants
  – Words
  – Symbols or operators
• Non-terminals
  – Named constructs whose names never actually appear in the source
  – Such as statements and expressions
• Distinguished symbol
  – The starting point
  – Usually represents the whole program
• Productions
  – Rules for rewriting the distinguished symbol and other non-terminals into terminals
Example
• Simplified grammar for expressions
  – Terminals:
    • Constant
    • Ident
    • + - ( ) * /
  – Productions in EBNF:
    • Expression ::= Term + Term
    • Expression ::= Term - Term
    • Expression ::= Term
    • Term ::= Factor { ( * | / ) Factor }
    • Factor ::= Constant
    • Factor ::= Ident
    • Factor ::= ( Expression )
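A sketch of a recursive-descent parser for roughly this grammar: one method per non-terminal, producing a tuple-based parse tree. The `while` loops generalize the two-operand productions to chains of operators, and the class and method names (`Parser`, `tokenize`, etc.) are illustrative, not from the slides:

```python
import re

def tokenize(src):
    """Split source into numbers, identifiers, and operator symbols."""
    return re.findall(r'\d+|[A-Za-z_]\w*|[-+*/()]', src)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expression(self):
        # Expression ::= Term { ( + | - ) Term }
        node = self.term()
        while self.peek() in ('+', '-'):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        # Term ::= Factor { ( * | / ) Factor }
        node = self.factor()
        while self.peek() in ('*', '/'):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # Factor ::= Constant | Ident | ( Expression )
        tok = self.advance()
        if tok == '(':
            node = self.expression()
            assert self.advance() == ')', "missing closing parenthesis"
            return node
        return int(tok) if tok.isdigit() else tok

tree = Parser(tokenize("2 + 5 * (3 - 4)")).expression()
# tree == ('+', 2, ('*', 5, ('-', 3, 4)))
```

Note how the grammar's precedence falls out of the call structure: `expression` calls `term`, which calls `factor`, so * and / bind more tightly than + and -.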
Productions Again
• The productions are all replacement rules
• They may also be denoted visually in terms of syntax graphs
• These are usually just as easy to understand, and represent the same information
Syntax Graphs
• A circle represents a terminal
  – A reserved word or operator
  – No further definition
• A rectangle represents a non-terminal
  – For a statement or expression
  – Must be defined elsewhere
• An arrow represents the path between one item and another
  – The arrows may branch, indicating alternatives
• Recursion is also allowed
Simple Expressions
[Syntax graphs: an expression is a term, optionally followed by + or - and another term; a term is a factor, optionally followed by * or / and another factor; a factor is a constant, an ident, or an expression in parentheses]
Parse Tree Example
• In a parse tree
  – Leaves are terminals
  – Interior nodes are non-terminals
• Usually evaluated from bottom to top
• Consider the parse of: 2 + 5 * ( 3 - 4 )
Expression: 2 + 5 * ( 3 - 4 )
[Parse tree: the whole expression is term + term; the left term is the factor 2; the right term is factor * factor, with 5 as the first factor; the second factor is ( expression ), where the inner expression is term - term with factors 3 and 4]
Notes
• Parse trees are partially built by the parser of the compiler
• They are then used by the code generator
• Once used, the sub-trees are discarded
• The whole tree never exists at one time
Balance and Search Times
• The time it takes to search a tree is based upon the path length to the desired node
• Assuming equal distributions:
  – The average search length is the sum of the path lengths divided by the number of tree nodes
Unbalanced Tree
[Binary search tree: root 12; left child 6, whose left child 2 has children 0 and 4; right child 19 with children 15 and 36; 36 has left child 24, 24 has right child 30, 30 has left child 29]
Average Search Length
• 12 – 1
• 6 – 2
• 19 – 2
• 2 – 3
• 15 – 3
• 36 – 3
• 24 – 4
• 0 – 4
• 4 – 4
• 30 – 5
• 29 – 6
• A sum of 37 for 11 nodes gives an average search length of about 3.4
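The average search length for the unbalanced tree can be computed directly from the per-node path lengths listed above (root counted as 1):

```python
# Path length (depth, root = 1) of each key in the unbalanced tree.
depth = {12: 1, 6: 2, 19: 2, 2: 3, 15: 3, 36: 3,
         24: 4, 0: 4, 4: 4, 30: 5, 29: 6}

total = sum(depth.values())    # sum of path lengths: 37
average = total / len(depth)   # 37 / 11, about 3.4
```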
Perfectly Balanced Tree
[Binary search tree over the same 11 keys: root 15; left child 4 with children 2 (left child 0) and 12 (left child 6); right child 29 with children 24 (left child 19) and 36 (left child 30)]
Average Search Length
• 15 – 1
• 4 – 2
• 29 – 2
• 2 – 3
• 12 – 3
• 24 – 3
• 36 – 3
• 0 – 4
• 6 – 4
• 19 – 4
• 30 – 4
• A sum of 33 for 11 nodes gives an average search length of 3.0
• Balanced does perform better
AVL Balanced Tree
[Binary search tree over the same 11 keys: root 12; left child 4 with children 2 (left child 0) and 6; right child 29 with children 19 (children 15 and 24) and 30 (right child 36)]
Average Search Length
• 12 – 1
• 4 – 2
• 29 – 2
• 2 – 3
• 6 – 3
• 19 – 3
• 30 – 3
• 0 – 4
• 15 – 4
• 24 – 4
• 36 – 4
• A sum of 33 for 11 nodes gives an average search length of 3.0
• AVL balanced has the same performance here as perfectly balanced
Balanced is Best?
• The idea of balancing a tree is predicated on equal frequencies of keys
  – A reasonable assumption if there is no contrary information
  – However, if we have frequency information we can do better
• C++ keywords, for example, are not evenly distributed
Path Lengths
• The idea of balance is nice in general, but…
• If we have a reasonable idea of the frequency of entries, we can do better than perfectly balanced
• What we want to do is minimize the average path length
• With our previous knowledge we could make no assumptions concerning frequency
• Now we can generate a more precise formula
Average Path Length

Weighted path length = Σ (i = 1 to n) pᵢ fᵢ

• Where
  – n is the number of words
  – pᵢ is the path length of word i
  – fᵢ is the frequency of word i
• Dividing by the total frequency Σ fᵢ gives the average path length
Optimal Search Trees
• What we want are high-frequency words close to the root and low-frequency words at the leaves
• You might think that the most common word should be the root and the next two most common words its children
• It does not work that way, since we must maintain the key order as well
Example
• For example, the word "the" is the most common word in English text
• The top n are:
  – the (20)
  – and (15)
  – of (13)
  – to (12)
  – you (7)
  – in (7)
  – a (6)
• Because the top two are such extremes, it may be better to have "of" as the root
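The weighted path length formula can be applied to these seven words. As a sketch, assume a balanced alphabetical BST with "of" at the root (a hypothetical layout, not given on the slides); the depths below follow from that tree, with the root at depth 1:

```python
# Frequencies from the slide.
freq = {'the': 20, 'and': 15, 'of': 13, 'to': 12,
        'you': 7, 'in': 7, 'a': 6}

# Depths in a balanced alphabetical BST rooted at 'of':
# 'and' and 'to' are its children; the rest are leaves.
depth = {'of': 1, 'and': 2, 'to': 2,
         'a': 3, 'in': 3, 'the': 3, 'you': 3}

# Weighted path length: sum of p_i * f_i over all words.
weighted = sum(depth[w] * freq[w] for w in freq)
# weighted == 187
```

Moving a more frequent word such as "the" closer to the root lowers its term pᵢ fᵢ, but only if the BST key order still holds; minimizing the total over all valid trees is the optimal-search-tree problem.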
LISP Lists
• LISP is very old
  – Second only to FORTRAN
• Usually encountered in Programming Languages or Artificial Intelligence classes
• It has a ubiquitous data structure called a list
• However, it is not a list in the sense of being purely linear
• Instead it is a tree, but a tree without a key
Variables in LISP
• A variable may be:
  – An atom
  – A list
• An atom is any word or number
• A list may be:
  – Empty
  – A variable followed by a list
Lists
• A list could be a simple list within parentheses
  – (Three element list)
• It could also have sub-lists
  – (Atom (A sub list) another (list))
  – This is clearly not a linear list such as an STL list
• LISP programs were also lists
  – Programs and data had the same form
Implementation
• The LISP language was influenced by the machine on which it was developed
• It had a 36-bit word that was partitioned into two pointers
  – Contents of the Address part of the Register (CAR)
  – Contents of the Decrement part of the Register (CDR)
• An atom used the word for data
• A list cell used the two pointers
• A list always ended in nil, a special pointer
Example
(Three element list)
[Cons-cell diagram: three cells chained through their cdr pointers; the cars point to the atoms Three, element, and list; the final cdr is nil]
Second Example
(Atom (sub list) last)
[Cons-cell diagram: the top-level list is three cells whose cars point to the atom Atom, to a sub-list, and to the atom last; the sub-list is two cells whose cars point to sub and list; both chains end in nil]
List Processing
• There were two functions continually used in LISP to process a list
• Car gave the first item of the list
  – Which could itself be a list
• Cdr gave the rest of the list
• With a heavy dose of recursion, LISP could do it all
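A minimal sketch of cons cells and car/cdr in Python (an illustrative model, not how any real LISP is implemented): each cell is a pair, and a list is a chain of cells ending in None, standing in for nil.

```python
def cons(head, tail):
    """Build one cell: a (car, cdr) pair."""
    return (head, tail)

def car(cell):
    """First item of the list (may itself be a list)."""
    return cell[0]

def cdr(cell):
    """The rest of the list."""
    return cell[1]

# (Three element list), ending in nil (None).
lst = cons('Three', cons('element', cons('list', None)))

def length(cell):
    """Recursive list processing in the LISP style."""
    return 0 if cell is None else 1 + length(cdr(cell))
```

Because car can return either an atom or another cell, the same pair structure represents both flat lists and nested sub-lists, which is why a LISP list is really a tree.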