Mayank Gupta and Rajpal Singh
Wildcard Match
0in FE Noida
March, 2012
© 2011 Mentor Graphics Corp. Company Confidential www.mentor.com
Agenda
Introduction
Motivation
New Flow
Class Hierarchy
Mayank, Wildcard Match, March 2012
Motivation
Efficiently match a regular expression (wildcard) in an RTL design.
Use NELT to do the matching.
- The previous flow creates a separate data structure altogether to do the matching.
- Using the NELT hierarchy would reduce memory usage.
Enhance functionality.
New Flow
Tokenize
- Tokenize the pattern.
- Store it in an appropriate data structure.
Match on NELT
- Start matching on NELT.
Match on UTG
- Do matching on UTG.
- For records/arrays.
New Flow
STEP 1 : Tokenize wildcard
Eg : Wildcard is “a*.b*.*.*c*”
Tokens: a*, b*, *, *c*
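The tokenizing step can be sketched as a split on the hierarchy separator. This is a minimal illustration, not the tool's code: the function name `splitWildcard` is hypothetical, and it assumes '.' is the only separator and that names contain no escaped dots.

```cpp
#include <string>
#include <vector>

// Split a hierarchical wildcard such as "a*.b*.*.*c*" into per-level tokens.
// Assumption: '.' always separates hierarchy levels (no escaped names).
std::vector<std::string> splitWildcard(const std::string& pattern) {
    std::vector<std::string> tokens;
    std::string current;
    for (char ch : pattern) {
        if (ch == '.') {
            tokens.push_back(current);  // close the token for this level
            current.clear();
        } else {
            current.push_back(ch);
        }
    }
    tokens.push_back(current);  // the last token has no trailing '.'
    return tokens;
}
```

For the slide's example, `splitWildcard("a*.b*.*.*c*")` yields the four tokens a*, b*, * and *c*.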
New Flow
STEP 2 : Start matching nodes in NELT
- Match current token with top’s children
[Figure: NELT tree (top, a1, a, aa, b2, b, c1, c, b1) shown with the token list a*, b*, *, *c*]
New Flow
- Match “b*” with children of a1
[Figure: NELT tree (top, a1, a, aa, b2, b, c1, c, b1) shown with the token list a*, b*, *, *c*]
New Flow
- Match “*” with children of b2
[Figure: NELT tree (top, a1, a, aa, b2, b, c1, c, b1) shown with the token list a*, b*, *, *c*]
New Flow
- Match “*c*” and “*” with children of c1
[Figure: NELT tree (top, a1, a, aa, b2, b, c1, c, b1) shown with the token list a*, b*, *, *c*; the last panel is labeled “Final Match”]
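The level-by-level walk in Step 2 can be sketched as a recursive descent that matches one token per hierarchy level. Everything here is illustrative: `Node`, `matchToken`, `matchLevel` and `sampleTree` are hypothetical names, the parent/child shape in `sampleTree` is a guess at the slide's figure, and every '*' is treated as a local star (the hierarchical star is not modeled).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a NELT node: a name plus child nodes.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Glob-style match of one token against one name; '*' matches any run of characters.
bool matchToken(const std::string& tok, const std::string& name) {
    std::size_t t = 0, n = 0, star = std::string::npos, mark = 0;
    while (n < name.size()) {
        if (t < tok.size() && tok[t] == '*') { star = t++; mark = n; }
        else if (t < tok.size() && tok[t] == name[n]) { ++t; ++n; }
        else if (star != std::string::npos) { t = star + 1; n = ++mark; }  // backtrack
        else return false;
    }
    while (t < tok.size() && tok[t] == '*') ++t;  // trailing stars match nothing
    return t == tok.size();
}

// Match tokens[depth] against the children of 'parent'.
// A full path is recorded when the last token matches.
void matchLevel(const Node& parent, const std::vector<std::string>& tokens,
                std::size_t depth, const std::string& prefix,
                std::vector<std::string>& out) {
    if (depth >= tokens.size()) return;
    for (const Node& child : parent.children) {
        if (!matchToken(tokens[depth], child.name)) continue;
        std::string path = prefix.empty() ? child.name : prefix + "." + child.name;
        if (depth + 1 == tokens.size()) out.push_back(path);
        else matchLevel(child, tokens, depth + 1, path, out);
    }
}

// Assumed tree shape: top -> {a1 -> {a, aa, b2 -> {b, c1 -> {c}}}, b1 -> {b, c1 -> {c}}}.
Node sampleTree() {
    Node c{"c", {}};
    Node c1{"c1", {c}};
    Node b{"b", {}};
    Node b2{"b2", {b, c1}};
    Node a1{"a1", {Node{"a", {}}, Node{"aa", {}}, b2}};
    Node b1{"b1", {b, c1}};
    return Node{"top", {a1, b1}};
}
```

Under the assumed tree shape, calling `matchLevel(sampleTree(), {"a*", "b*", "*", "*c*"}, 0, "", out)` leaves the single path a1.b2.c1.c in `out`.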
New Flow
STEP 3 : Match on UTG hierarchy
- If we hit a record, array, or subtype, we match using the UTG hierarchy.
Why match using UTG?
Because we do not create NELT for record symbols.
Hence we use UTG for matching inside records.
[Figure: NELT tree (top, a1, b1, a, b2, b) with records Record1 and Record2 and fields f1, f2, f21, f22; the record portion is marked “No NELT for this portion”]
Tokenizing a wildcard
Class Hierarchy
TokenBase
- StarToken
- StringToken
A token can be of two types:
- String Token
- Star Token
A star token is simply a ‘*’. A string token is anything other than ‘*’.
Eg : “a*.b*.*.*c*”
- String tokens are a*, b*, *c*
- There is only one star token here: *
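The two token types can be sketched as a small class hierarchy. The class names come from the slide; the member functions `isStar` and `text` and the factory `makeToken` are assumptions made for illustration and may not match the real interfaces.

```cpp
#include <memory>
#include <string>
#include <utility>

// Base class for all tokens (name from the slide; members are hypothetical).
class TokenBase {
public:
    virtual ~TokenBase() = default;
    virtual bool isStar() const = 0;
    virtual std::string text() const = 0;
};

// A token that is exactly '*'.
class StarToken : public TokenBase {
public:
    bool isStar() const override { return true; }
    std::string text() const override { return "*"; }
};

// Any other token, including ones that contain '*' such as "a*" or "*c*".
class StringToken : public TokenBase {
public:
    explicit StringToken(std::string s) : text_(std::move(s)) {}
    bool isStar() const override { return false; }
    std::string text() const override { return text_; }
private:
    std::string text_;
};

// Hypothetical factory: a lone "*" becomes a StarToken, anything else a StringToken.
std::unique_ptr<TokenBase> makeToken(const std::string& piece) {
    if (piece == "*") return std::make_unique<StarToken>();
    return std::make_unique<StringToken>(piece);
}
```

With this sketch, the pieces a*, b* and *c* from the example become string tokens and the lone * becomes the single star token.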
How to do Hierarchical Match?
How do we ensure that we match the hierarchy in the case of ‘*’?
There are two types of ‘*’ (star token) in a regex:
- Local Match Star
- Hierarchical Match Star
A local star matches only the nodes at the current level.
A hierarchical star matches all the nodes at the current and lower levels.
How to do Hierarchical Match?
Example: if the pattern is “a*.b*.*.*c*”, it will be converted to the token sequence
a*  H*  b*  H*  L*  H*  *c*  H*
where H* is a hierarchical match star and L* is a local match star.
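The conversion in this example can be sketched as follows. The rule is inferred only from the one example on the slide (a lone '*' becomes L*, and an H* follows every token), so it is an assumption; the function name `expandStars` is hypothetical.

```cpp
#include <string>
#include <vector>

// Expand per-level tokens into the sequence shown in the example.
// Assumed rule: a lone "*" becomes the local star "L*", and a hierarchical
// star "H*" is appended after every token.
std::vector<std::string> expandStars(const std::vector<std::string>& tokens) {
    std::vector<std::string> out;
    for (const std::string& tok : tokens) {
        out.push_back(tok == "*" ? "L*" : tok);
        out.push_back("H*");
    }
    return out;
}
```

Applied to the tokens a*, b*, *, *c*, this produces a*, H*, b*, H*, L*, H*, *c*, H*, the same sequence as in the example.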
Organizing Tokens
Class NeltTokenArray contains
- vector<TokenBase*>
Class NeltTokenIndex contains
- NeltTokenArray*
- Index (current token)
Class NeltRegexExpr contains
- vector<NeltTokenIndex*>
[Figure: several copies of the token array a*, b*, *, *c*, each with an Index marking its current token]
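The containment relationships listed above can be sketched as plain structs. The three class names and their contents come from the slide; the member names, the helper functions `atEnd`, `current` and `advanced`, and the simplified `TokenBase` are assumptions for illustration only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for the real token classes.
struct TokenBase {
    std::string text;
};

// Holds the token sequence of one pattern (tokens are not owned in this sketch).
struct NeltTokenArray {
    std::vector<TokenBase*> tokens;
};

// A position inside a shared token array: the array plus the current token index.
struct NeltTokenIndex {
    const NeltTokenArray* array;
    std::size_t index;

    bool atEnd() const { return index >= array->tokens.size(); }
    const TokenBase* current() const { return atEnd() ? nullptr : array->tokens[index]; }
    NeltTokenIndex advanced() const { return NeltTokenIndex{array, index + 1}; }
};

// A set of positions, each pointing into a token array.
struct NeltRegexExpr {
    std::vector<NeltTokenIndex*> cursors;
};
```

One plausible reading of the design is that several NeltTokenIndex objects can point into the same NeltTokenArray at different positions, so the tokens themselves are stored once.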
Manager Classes
Class NeltRegexMgr is used to match on NELT.
Class NeltUtgRegexMgr is used to match on UTG.
It is the responsibility of NeltRegexMgr to invoke NeltUtgRegexMgr.
C++ classes
NeltRegexMgr
- NeltTraverse
NeltUtgRegexMgr
- NeltTypeTraverse
NeltRegexExpr
NeltTokenIndex
- NeltUtgTokenIndex
NeltTokenArray
TokenBase
- StringToken
- StarToken
[Figure: class diagrams pairing NeltTokenIndex with NeltUtgTokenIndex, NeltRegexMgr with NeltTraverse, NeltUtgRegexMgr with NeltUtgTypeTraverse, and TokenBase with StarToken and StringToken]
Source Code
Source files
- src/commonpp/nelt
  - neltRegexMgr.cxx
  - neltRegexMgr.hxx
  - neltUtgRegexMgr.cxx
  - neltUtgRegexMgr.hxx
  - neltRegexUtils.cxx
  - neltRegexUtils.hxx
Performance data
S.No  Test Case  Old Flow Time(s)  New Flow Time(s)
1     Parme      161               484
2     Oracle     1814              1658
Future Work
Add a new class SliceToken deriving from TokenBase to store tokens of the form tok[slice].
Avoid duplicate matching
- Eg : “a*.a*b” is expanded into two patterns:
  1) a*.H*.a*b
  2) a*.H*.a*.H*.*b
- Both patterns have “a*.H*” at the beginning, and hence it gets matched twice.