Taxonomy-Based Software Construction for Algorithmic Families
Bruce W. Watson
with Loek Cleophas & Derrick Kourie
VaMoS 2016 2016.01.28
Aim & Motivation
RuZA Workshop, CSIR 2014/08/14
Aim: Give an overview and examples of taxonomies and of the TABASCO approach (TAxonomy-BAsed Software COnstruction)
Motivation: Understand and bring order to an (algorithmic) domain, and construct reusable (algorithmic) software for it
Random Quotes
Bjarne Stroustrup: “infrastructure software” has stronger quality and elegance requirements
C.A.R. (Tony) Hoare “…[your] taxonomies are to the field of algorithmics what the Standard Model is to Particle Physics…”
TABASCO Exercises
• Keyword pattern matching
• Finite automata construction
• Deterministic finite automata minimization
• Minimal acyclic deterministic finite automata construction
• Lempel-Ziv-style compression
• Tree automata (pattern matching & acceptance)
• Graph representations
• Approximate & 2D pattern matching
Case Study: Generalised Stringology
• Regular Grammar and Regular Expression
  – Different types, transformations between them
• Problems
  – Membership/Acceptance
  – Keyword Pattern Matching (KPM)
• Finite Automaton
  – Nondeterministic with/without epsilon-transitions, deterministic
• Theoretical Results (1950s)
  – Equivalence of NFA and DFA (subset construction)
  – Equivalence of RG, RE, and FA
  – Solve by constructing and using FA based on RG/RE
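The NFA/DFA equivalence mentioned above rests on the subset construction. A minimal sketch follows, assuming an NFA without epsilon-transitions whose states are integers; the class name `SubsetConstructionSketch`, the example automaton and all method names are illustrative assumptions, not part of any toolkit discussed in this talk.

```java
import java.util.*;

final class SubsetConstructionSketch {
    /** Example NFA over {a, b} accepting strings that end in "ab"; start state 0, final state 2. */
    static Map<Integer, Map<Character, Set<Integer>>> endsInAb() {
        Map<Integer, Map<Character, Set<Integer>>> delta = new HashMap<>();
        delta.put(0, Map.of('a', Set.of(0, 1), 'b', Set.of(0)));
        delta.put(1, Map.of('b', Set.of(2)));
        return delta;
    }

    /** Subset construction: every reachable set of NFA states becomes one DFA state. */
    static Map<Set<Integer>, Map<Character, Set<Integer>>> determinize(
            Map<Integer, Map<Character, Set<Integer>>> delta, int start) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = new HashMap<>();
        Deque<Set<Integer>> work = new ArrayDeque<>();
        Set<Integer> init = new HashSet<>(List.of(start));
        dfa.put(init, new HashMap<>());
        work.push(init);
        while (!work.isEmpty()) {
            Set<Integer> s = work.pop();
            // Per symbol, take the union of the NFA successors of all states in s.
            Map<Character, Set<Integer>> out = new HashMap<>();
            for (int q : s) {
                for (Map.Entry<Character, Set<Integer>> e
                        : delta.getOrDefault(q, Map.of()).entrySet()) {
                    out.computeIfAbsent(e.getKey(), k -> new HashSet<>()).addAll(e.getValue());
                }
            }
            dfa.put(s, out);
            for (Set<Integer> t : out.values()) {
                if (!dfa.containsKey(t)) {      // newly discovered DFA state
                    dfa.put(t, new HashMap<>());
                    work.push(t);
                }
            }
        }
        return dfa;
    }

    /** Runs the DFA on w; a missing transition acts as an implicit dead state. */
    static boolean accepts(Map<Set<Integer>, Map<Character, Set<Integer>>> dfa,
                           int start, Set<Integer> nfaFinals, String w) {
        Set<Integer> s = new HashSet<>(List.of(start));
        for (char c : w.toCharArray()) {
            s = dfa.get(s).get(c);
            if (s == null) return false;
        }
        for (int q : s) {
            if (nfaFinals.contains(q)) return true;   // a DFA state is final if it contains an NFA final state
        }
        return false;
    }

    static boolean demoAccepts(String w) {
        return accepts(determinize(endsInAb(), 0), 0, Set.of(2), w);
    }

    static int demoDfaSize() {
        return determinize(endsInAb(), 0).size();
    }
}
```

Only reachable subsets are built, which is why the example yields three DFA states rather than all eight subsets of {0, 1, 2}.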
Case Study: Generalised Stringology (cont.)
• In practice (1960s - now):
  – Many applications
    • Natural language text search
    • DNA processing
    • Network intrusion and virus detection
  – Many FA constructions, acceptance/KPM algorithms: O(10^2)
    • More efficient; for specific situations
  – Difficult to find, understand, compare
  – Separation between theory and practice
  – Hard to compare and choose implementations
Motivation Arbology (Tree Formal Languages)
• Regular Tree Grammar (and Regular Tree Expression)
  – Different types, transformations between them
• Problems
  – Membership/Tree (Grammar) Acceptance (TGA), Tree Parsing
  – Tree Pattern Matching (TPM)
• Finite Tree Automaton (TA)
  – Nondeterministic with/without epsilon-transitions, deterministic
  – Undirected, root-to-frontier (RF), frontier-to-root (FR)
• Theoretical Results (1960s)
  – Equivalence of TAs (except DRFTA) (subset construction)
  – Equivalence of RTG, RTE, and TA (except DRFTA)
  – Solve by constructing and using TA based on RTG
Motivation Tree Formal Languages/Algorithmics
• In practice (ca. 1975 - now):
  – Quite a few application domains as well
    • Code generation
    • Term rewriting
    • Model transformation
  – Many TA constructions, TGA/TPM algorithms
    • More efficient; for specific situations
  – Difficult to find, understand, compare
  – Separation between theory and practice
  – Hard to compare and choose implementations
Generic Domain “Attractions”
• Well-established theory
• Algorithmic problems: related, with related solutions
• Many algorithms
• Many applications
Generic Domain Deficiencies
• Inaccessibility of theory and algorithms
• Difficulty of understanding and comparing algorithms
  – Difference in style
  – Difference in formality
• Separation between theory and practice
• Lack of large collection of implementations
• Difficulty of choosing between algorithms
TABASCO— Domain Deficiencies & Taxonomies
• Inaccessibility of theory and algorithms
• Difficulty of understanding and comparing algorithms
  – Difference in style
  – Difference in formality
• Separation between theory and practice
• Lack of large collection of implementations
• Difficulty of choosing between algorithms
• Classification (in particular Taxonomy)
  – Show commonality & variation in algorithm & data representation
  – Show correctness
  – Easily find and compare algorithms
TABASCO— Domain Deficiencies & Toolkits
• Inaccessibility of theory and algorithms
• Difficulty of understanding and comparing algorithms
  – Difference in style
  – Difference in formality
• Separation between theory and practice
• Lack of large collection of implementations
• Difficulty of choosing between algorithms
• Toolkit, GUI, DSL
  – Give insight into algorithm properties, performance
  – Understand and compare algorithms in practice
  – Allow easy choice and use
TABASCO—Steps
Process consists of multiple steps:
1. Selection of domain
2. Literature survey
3. Classification construction
4. Toolkit design
5. Toolkit implementation
6. Benchmarking
7. DSL/GUI design
8. DSL/GUI implementation
Classifications Biological Taxonomies
• Classify organisms
• From abstract, general to concrete, specific
• Properties (details) explicit
• Allow comparison
[Figure: biological taxonomy tree descending from Eukaryotes via Animalia (next to Plantae) and Mammalia to Proboscidea, Elephantidae, Loxodonta africana on one branch and Primates, Homo sapiens on another; intermediate levels elided]
Classifications Algorithm Taxonomies
• Similar to biological taxonomies
• Algorithm taxonomies classify algorithms based on essential details
• Depicted as tree/DAG: nodes refer to algorithms, branches to details
• Algorithms solving one algorithmic problem
  – From abstract, general to concrete, specific
  – Root represents high-level algorithm
[Figure: two example taxonomy graphs. Tree acceptance: root t-acceptor, with details including rf, fr, det, match-set, rec, tabulate, filter, s-path, sp-matcher, aca-spm and drfta-spm. Keyword pattern matching: details including P, S, E, AC, AC-OPT, AC-FAIL, KMP-FAIL, LS, OKW, INDICES, CW, BM and BMH, with regions marked Aho-Corasick, Commentz-Walter, Boyer-Moore and Knuth-Morris-Pratt.]
Taxonomies Presentation & Correctness: Top-down
• Root represents high-level algorithm
  – With pre-/postcondition, invariants, ...
  – Correctness easily shown
• Adding detail
  – Obtains refinement/variation (from literature or new)
  – Branch connecting algorithm node to child node
  – Associated correctness arguments: correctness-preserving
• Correctness of root and of details on root path imply correctness of node: correctness-by-construction approach (Dijkstra et al., Eindhoven; Kourie & Watson, 2012)
Taxonomies Presentation & Correctness: Top-down
• Allow comparison
  – Commonalities lead to common path from root
• Multiple paths to same solution possible
• Main goal: improve understanding of algorithms and their relations, i.e. commonalities and variabilities
• Taxonomy forms domain model, classification
  – So do feature model, formal concept lattice, topic map, ...
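The comparison idea above (commonalities show up as a common path from the root) can be sketched in a few lines, assuming each algorithm is identified by its sequence of details in the style "(t-acceptor, fr, det)". The class and method names are hypothetical, and the second path used in the usage note is chosen for illustration only.

```java
import java.util.*;

final class TaxonomyPathSketch {
    /** Parses an identifier such as "(t-acceptor, fr, det)" into its sequence of details. */
    static List<String> parse(String id) {
        String body = id.trim();
        if (body.startsWith("(") && body.endsWith(")")) {
            body = body.substring(1, body.length() - 1);
        }
        List<String> details = new ArrayList<>();
        for (String d : body.split(",")) {
            if (!d.trim().isEmpty()) details.add(d.trim());
        }
        return details;
    }

    /** Longest common prefix of two root paths: the details both algorithms share. */
    static List<String> commonPath(List<String> a, List<String> b) {
        int i = 0;
        while (i < a.size() && i < b.size() && a.get(i).equals(b.get(i))) i++;
        return new ArrayList<>(a.subList(0, i));
    }

    /** Convenience wrapper returning the shared details as a comma-separated string. */
    static String commonPathOf(String idA, String idB) {
        return String.join(", ", commonPath(parse(idA), parse(idB)));
    }
}
```

For instance, `commonPathOf("(t-acceptor, fr, det)", "(t-acceptor, rf, det)")` yields `t-acceptor`: the two algorithms diverge at the choice of direction. A prefix comparison only covers tree-shaped taxonomies; where multiple paths lead to the same algorithm, a set-based comparison of details would be needed instead.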
Dijkstra refinement
IF statement
Refine {P} S {Q} to
  { P }
  if G0 → { P ∧ G0 } S0 { Q }
  [] G1 → { P ∧ G1 } S1 { Q }
  fi
  { Q }
if P ⇒ G0 ∨ G1
For example
  { pre m and n are integers }
  if m ≥ n → x := m; y := n
  [] m ≤ n → x := n; y := m
  fi
  { post x = m max n ∧ y = m min n }
Note nondeterminism
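The max/min example can be mimicked in Java as a rough sketch, assuming the two guards are m ≥ n and m ≤ n, so that both are enabled when m = n (this overlap is where the nondeterminism comes from). Java has no guarded commands, so the arbitrary choice between enabled guards is simulated here with a `Random`; the class and method names are illustrative only.

```java
import java.util.Random;

final class GuardedIfSketch {
    /** Returns {x, y} with x = max(m, n) and y = min(m, n). */
    static int[] sort2(int m, int n, Random choice) {
        boolean g0 = m >= n;   // guard G0
        boolean g1 = m <= n;   // guard G1
        // P => G0 or G1 holds for all integers, so at least one guard is enabled.
        // When both are enabled (m == n), either branch may be taken.
        boolean takeFirst = g0 && (!g1 || choice.nextBoolean());
        int x, y;
        if (takeFirst) {       // here P and G0 hold
            x = m; y = n;
        } else {               // here P and G1 hold
            x = n; y = m;
        }
        // Runtime check of the postcondition Q.
        if (x != Math.max(m, n) || y != Math.min(m, n)) {
            throw new IllegalStateException("postcondition violated");
        }
        return new int[] {x, y};
    }
}
```

Whichever branch is chosen for m = n, the postcondition holds, which is exactly what the proof obligations { P ∧ G0 } S0 { Q } and { P ∧ G1 } S1 { Q } guarantee.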
Dijkstra refinement
DO loops
For invariant I and variant expression V we get
  { P }
  { I }
  do G → { I ∧ G }
         S0
         { I ∧ (V decreased) }
  od
  { I ∧ ¬G }
  { Q }
Remember to check P ⇒ I and I ∧ ¬G ⇒ Q
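As a small illustration of the loop rule, the sketch below computes quotient and remainder by repeated subtraction and checks the annotations at run time. The example problem, the invariant I: a = q·b + r ∧ r ≥ 0, the variant V = r and all names are assumptions chosen for illustration; they do not come from the talk.

```java
final class DoLoopSketch {
    /** Returns {q, r} with a = q*b + r and 0 <= r < b, for a >= 0 and b > 0. */
    static int[] divMod(int a, int b) {
        if (a < 0 || b <= 0) {
            throw new IllegalArgumentException("precondition P: a >= 0 and b > 0");
        }
        int q = 0, r = a;                      // establishes I, i.e. P => I
        while (r >= b) {                       // guard G
            check(a == q * b + r && r >= 0, "I and G");
            int before = r;                    // value of the variant V = r before S0
            q = q + 1;                         // S0
            r = r - b;
            check(a == q * b + r && r >= 0 && r < before, "I and V decreased");
        }
        // On exit I and not G hold, which implies the postcondition Q.
        check(a == q * b + r && 0 <= r && r < b, "Q");
        return new int[] {q, r};
    }

    private static void check(boolean cond, String what) {
        if (!cond) throw new IllegalStateException("violated: " + what);
    }
}
```

Because V = r is bounded below by 0 and strictly decreases in every iteration, the loop terminates; the invariant then carries the result to the exit.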
Taxonomies Example: Keyword Pattern Matching
• Detail choice and order depend on personal preference & domain understanding
• Inclusion of different orders for single algorithm leads to directed acyclic graph
• Initial version by Watson & Zwaan (1992-1996)
• Revised & extended
  – Cleophas (2003)
  – Cleophas, Watson & Zwaan (2004; 2010)
[Figure: keyword pattern matching taxonomy graph with annotated regions: forward (prefix-based) algorithms; backward (suffix, factor, factor oracle-based) algorithms; shift functions (leading to sublinear algorithms); choice of f(P) & d_{R,f} (automaton recognizing f(P)^R)]
Algorithm (and Problem) Details (e.g.)
okw (Problem detail 4.77) The set of keywords contains one keyword.
indices (Algorithm detail 4.82) Represent substrings by indices into the complete strings, converting a string-based algorithm into an indexing-based algorithm.
cw (Algorithm detail 4.90) Consider any shift distance that does not lead to the missing of any matches. Such shift distances are called safe.
nla (Algorithm detail 4.103) The left and right lookahead symbols are not taken into account when computing a safe shift distance. The computation of a shift distance is done by using two precomputed shift functions applied to the current longest partial match.
lla (Algorithm detail 4.104) The left lookahead symbol is taken into account when computing a safe shift distance.
cw-opt (Algorithm detail 4.108) Compute a shift distance using a single precomputed shift function applied to the current longest partial match and the left lookahead symbol.
bmcw (Algorithm detail 4.116) Compute a shift distance using a single precomputed shift function which is applied to the current longest partial match and the left lookahead symbol. The function yields shifts that are no greater than the function in detail (cw-opt).
near-opt (Algorithm detail 4.121) Compute a shift distance using a single precomputed shift function applied to the current longest partial match and the left lookahead symbol. The function is derived from the one in detail (bmcw), and it yields shifts which are no greater.
norm (Algorithm detail 4.127) Compute a shift distance as in (nla) but additionally use a third shift function applied to the lookahead symbol. The shift distance obtained is that of the normal Commentz-Walter algorithm.
bm (Algorithm detail 4.135) Compute a shift distance using one shift function applied to the lookahead symbol, and another shift function applied to the current longest partial match. The shift distance obtained is that of the Boyer-Moore algorithm.
rla (Algorithm detail 4.137) The right lookahead symbol is taken into account when computing a safe shift distance.
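To make the notion of a safe shift concrete, here is a minimal single-keyword matcher (problem detail okw) that works on indices into the text (detail indices) and uses one precomputed shift function. The shift used is the well-known Horspool-style shift, swapped in because it is short and easy to check; it is not claimed to be the shift function of any specific detail listed above, and the class and method names are hypothetical.

```java
import java.util.*;

final class ShiftMatcherSketch {
    /** Returns the start indices of all occurrences of keyword p in text s. */
    static List<Integer> match(String s, String p) {
        List<Integer> hits = new ArrayList<>();
        int m = p.length();
        if (m == 0 || m > s.length()) return hits;

        // Precomputed shift function: for each symbol in p[0..m-2], the distance
        // from its last occurrence there to the end of the keyword.
        Map<Character, Integer> shift = new HashMap<>();
        for (int i = 0; i < m - 1; i++) {
            shift.put(p.charAt(i), m - 1 - i);
        }

        int pos = 0;                                   // start index of the current window
        while (pos + m <= s.length()) {
            int j = m - 1;
            while (j >= 0 && s.charAt(pos + j) == p.charAt(j)) {
                j--;                                   // compare right to left
            }
            if (j < 0) hits.add(pos);
            // Safe shift: determined by the text symbol under the last window
            // position; symbols that do not occur in p[0..m-2] allow a shift of m.
            pos += shift.getOrDefault(s.charAt(pos + m - 1), m);
        }
        return hits;
    }
}
```

The shift is safe in the sense of detail (cw): any occurrence starting between the old and the new window position would need the symbol under the old window's last position to appear in the keyword closer to its end than the shift function records.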
Taxonomies Example: Tree Acceptance
[Figure: tree acceptance taxonomy graph. Root t-acceptor; details rf, fr, det, match-set, rec, tabulate, filter (tfilt, sfilt, ifilt, cfilt), s-path, sp-matcher, aca-spm and drfta-spm. Branches are annotated with the literature groups listed below.]
• van Dinther, 1987
• Brainerd, 1967 & 1969; Turner, 1986; van Dinther, 1987; Weisgerber & Wilhelm, 1989; Hemerik & Katoen, 1989; Ferdinand, Seidl & Wilhelm, 1994; Wilhelm & Maurer, 1995
• Chase, 1987; Hemerik & Katoen, 1989; Ferdinand, Seidl & Wilhelm, 1994; Cleophas, 2008
• Aho, Ganapathi & Tjiang, 1985, 1988; van de Meerakker, 1988; Weisgerber & Wilhelm, 1989; Ferdinand, Seidl & Wilhelm, 1994; Wilhelm & Maurer, 1995; Cleophas, Hemerik & Zwaan, 2005 & 2006
Tree Acceptance Taxonomy One Algorithm Path
()
  |[ const G = . . .; t : . . .;
     var b : B
   | b := t ∈ L(G)
  ]|
(t-acceptor)
  |[ const G = . . .; t : . . .;
     var b : B
   | let M = . . . be a ta such that L(M) = L(G);
     b := t ∈ L(M)
  ]|
Tree Acceptance Taxonomy One Algorithm Path
(t-acceptor, fr, det)
  |[ const G = . . .; t : . . .;
     var b : B
   | let M = . . . be a dfrta such that L(M) = L(G);
     b := Traverse(ε) ∈ Qra
     func Traverse(↓ n : D) : Q =
       |[ var q1, . . . , qn : Q
        | let a = t(n);
          if n > 0 → Traverse := Ra(Traverse(n · 1), . . . , Traverse(n · n))
          [] n = 0 → Traverse := Ra()
          fi
       ]|
  ]|
Tree Acceptance Taxonomy One Algorithm Path
Construction of the automaton is a separate issue
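Taxonomies Tree Acceptance and Tree Pattern Matching
[Figure: the tree acceptance taxonomy (root t-acceptor) shown next to the tree pattern matching taxonomy (root t-matcher). Both graphs show the details rf, fr, det, match-set, rec, tabulate, filter (tfilt, sfilt, ifilt, cfilt), s-path, sp-matcher, aca-spm and drfta-spm; the t-matcher graph additionally shows ra-loops.]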
Taxonomies Tree Automata Constructions
• About 50 different constructions in tree acceptance and tree pattern matching taxonomies
  – differ in e.g. direction, epsilons, determinism, advanced techniques
• Construction presentation
  – uniform style
  – defines state set, transition relation, ...
  – gives example
  – correctness arguments
  – related constructions and literature
  – identified by sequence of labels indicating details, e.g. (TPM-TA:ALL-SUB:REM-Epsilon:FR:SUBSET)
Automata Construction Taxonomy
[Figure 6.1 graph: vertices carry construction numbers or page references, among them BS (6.39), MYG (6.44), ASU (6.86), Ant. (6.55) and Brz. (6.57); edges are labeled with details such as rem-ε, rem-ε-dual, pd, filt (instantiated as Wfilt and Xfilt), a-s, e-mark, b-mark, sym, use-s and subset.]
Figure 6.1: A taxonomy of finite automata construction algorithms. The larger graph represents the main part of the taxonomy, while the smaller graph represents the two instantiations of the filt detail that are discussed in this dissertation. The numbers appearing at some of the vertices correspond to the algorithm or construction numbers in the text of this chapter. In some cases, the algorithm is not presented explicitly, and the page number is given instead. The use of duality is clearly shown by the symmetry in the graph. The algorithms in the dashed-line subtree (on the right of the graph) are not treated in this dissertation, since they are the duals of algorithms in the left half and it is not clear that the duals would be more efficient or enlightening.
DFA Minimization
[Figure 7.1 graph: Brzozowski (§ 7.2) as a separate single-vertex tree; a second tree rooted at equivalence of states (§ 7.3), with refinements labeled equivalence relation, approx. from above, approx. from below, pointwise, memoization, layerwise, unordered state pairs, eq. classes, lists, optimized list update and imperative program, and with variants marked Naive and Improved. Vertices include ASU (7.21), Hopcroft-Ullman (7.24) and Hopcroft (7.26), plus further construction numbers, sections and page references.]
Figure 7.1: The family trees of finite automata minimization algorithms. Brzozowski’s minimization algorithm is unrelated to the others, and appears as a separate (single vertex) tree. Each algorithm presented in this chapter appears as a vertex in this tree. For each algorithm that appears explicitly in this chapter, the construction number appears in parentheses (indicating where it appears in this chapter). For algorithms that do not appear explicitly, a reference to the section or page number is given. Edges denote a refinement of the solution (and therefore explicit relationships between algorithms). They are labeled with the name of the refinement.
Taxonomies Advantages and Disadvantages
+ Algorithm comparison easier
+ Clear and correct algorithm presentation
+ Orders field, usable as teaching aid
+ Well suited for exploratory algorithmics
+ Formal specifications
+ Aids in construction of toolkit
- Takes much time and effort (abstraction (bottom-up!), sequential addition of details)
- Overkill for some domains?
Toolkit vs Taxonomy
func Traverse(↓ n : D) : Q =
  |[ var q1, . . . , qn : Q
   | let a = t(n);
     if n > 0 → Traverse := Ra(Traverse(n · 1), . . . , Traverse(n · n))
     [] n = 0 → Traverse := Ra()
     fi
  ]|
private static AbstractAutomatonState Traverse(AbstractDFRTA M, Node n) {
    // Frontier-to-root: first compute the states reached at all children of n.
    AbstractTAState[] childStates = new AbstractTAState[n.children().size()];
    for (int i = 0; i < n.children().size(); i++) {
        childStates[i] = Traverse(M, n.children().get(i));
    }
    AbstractAutomatonState state;
    if (n.children().size() > 0) {
        // n > 0: Ra(q1, ..., qn)
        state = M.nextState(childStates, (RankedSymbol) n.symbol());
    } else {
        // n = 0: Ra(), called with an empty array of child states
        state = M.nextState(childStates, (RankedSymbol) n.symbol());
    }
    return state;
}
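The toolkit method above depends on toolkit classes that are not shown on the slides. A self-contained sketch of the same frontier-to-root traversal follows; the `Node` class, the string-keyed transition table and the example automaton (terms over and/true/false that evaluate to true) are all assumptions made for illustration and do not reflect the toolkit's actual API.

```java
import java.util.*;

final class DfrtaSketch {
    /** Hypothetical ranked tree node: a symbol with an ordered list of children. */
    static final class Node {
        final String symbol;
        final List<Node> children;
        Node(String symbol, Node... children) {
            this.symbol = symbol;
            this.children = Arrays.asList(children);
        }
    }

    static Node node(String symbol, Node... children) {
        return new Node(symbol, children);
    }

    // Transition relation R: key is the symbol followed by the list of child states.
    private final Map<String, Integer> rules = new HashMap<>();
    private final Set<Integer> rootAccepting = new HashSet<>();

    void rule(String symbol, List<Integer> childStates, int target) {
        rules.put(symbol + childStates, target);
    }

    /** Frontier-to-root traversal; returns null when no transition applies. */
    Integer traverse(Node n) {
        List<Integer> childStates = new ArrayList<>();
        for (Node child : n.children) {
            Integer q = traverse(child);
            if (q == null) return null;
            childStates.add(q);
        }
        // Covers both cases of the abstract algorithm: rank 0 uses the empty list.
        return rules.get(n.symbol + childStates);
    }

    /** Accepts when the state reached at the root is a root accepting state. */
    boolean accepts(Node t) {
        Integer q = traverse(t);
        return q != null && rootAccepting.contains(q);
    }

    /** Example automaton: state 1 means "evaluates to true", state 0 "evaluates to false". */
    static DfrtaSketch booleanAnd() {
        DfrtaSketch m = new DfrtaSketch();
        m.rule("true", Arrays.asList(), 1);
        m.rule("false", Arrays.asList(), 0);
        for (int l = 0; l <= 1; l++) {
            for (int r = 0; r <= 1; r++) {
                m.rule("and", Arrays.asList(l, r), l & r);
            }
        }
        m.rootAccepting.add(1);
        return m;
    }
}
```

As in the taxonomy version, construction of the automaton is kept separate from the traversal: `booleanAnd()` builds the transition table, while `traverse` only consults it.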
Algorithm Performance
[Plot: throughput in MB/s (y-axis, 0 to 30) versus shortest keyword length (x-axis, 0 to 14) for the CW-WBM, CW-NORM, AC-OPT and AC-FAIL algorithms]
Figure 13.8: Algorithm performance (in megabytes/second) versus the length of the shortest keyword in a given set. The performance of the CW-WBM and CW-NORM algorithms are almost coincidental (shown as the ascending solid line).
The performance of the CW algorithms, which declined with increasing keyword set size, was consistently better than the AC-OPT algorithm. In some cases, the CW-NORM algorithm displayed a five to ten-fold improvement over the AC-OPT algorithm.
13.4.2 Performance versus minimum keyword length
For each algorithm, the average number of megabytes processed per second was graphed against the length of the shortest keyword in a set. For the multiple-keyword tests the graphs are superimposed in Figure 13.8.
Predictably, the AC-OPT algorithm has performance that is independent of the keyword set. The AC-FAIL algorithm has slightly lower performance, improving with longer minimum keywords. The average performance of the CW algorithms improves almost linearly with increasing minimum keyword lengths. The low performance of the CW algorithms for short minimum keyword lengths is explained by the fact that the CW-WBM and CW-NORM shift functions are bounded above by the length of the minimum keyword (see Chapter 4). For sets with minimum keywords no less than four characters, the CW algorithms outperform the AC algorithms.
As predicted, the CW-NORM algorithm outperforms the CW-WBM algorithm. The performance ratio of the CW-WBM algorithm to the CW-NORM algorithm is shown in Figure 13.9. The figure indicates that the performance gap is wide with small minimum keyword lengths, and diminishes with increasing minimum keyword lengths. (This effect
Ongoing and Future Work
• Existing taxonomies and toolkits developed over 20 years
  – update and integrate
    • e.g. >50 new keyword pattern matching algorithms in 2001-2010
  – need to be selective...
  – multiple DSLs and GUIs on top
    • bioinformatics, computational linguistics, network intrusion detection
    • student view?
• Application to other algorithmic or data structure fields
Concluding Remarks
Overview of taxonomy construction and TAxonomy-BAsed Software COnstruction
  – algorithmic domains
  – bringing order and improving understanding
    • bonus: exploratory algorithmics
  – also aimed at development of large-scale toolkit
    • allows comparison in practice
      – benchmarking data
      – algorithm selection
    • with DSLs/GUIs to simplify usage
  – TABASCO is the only such method that takes correctness-by-construction into account
References
• L. Cleophas, B.W. Watson, D.G. Kourie, A. Boake & S. Obiedkov, TABASCO: Using Concept-Based Taxonomies in Domain Engineering. SACJ, 37:30–40, December 2006.
• L. Cleophas & B.W. Watson, Taxonomy-based software construction of SPARE Time: a case study. In IEE Proceedings – Software, 152(1), February 2005.
• L. Cleophas & B.W. Watson, Applying and spicing up TABASCO: taxonomy-based software and how to increase its usability. In Formal Aspects of Computing: Essays dedicated to Derrick Kourie on the occasion of his 65th Birthday, 173–183, Shaker Verlag, 2013.
• D.G. Kourie & B.W. Watson, The Correctness-by-Construction Approach to Programming, Springer, 2012.
• B.W. Watson, D.G. Kourie & L. Cleophas, Experience with Correctness-by-Construction. To appear in Science of Computer Programming, special issue on New Ideas and Emerging Results in Understanding Software, 2013.