7/28/2019 11 Graph Pattern Mining
Data Mining:
Concepts and Techniques Chapter 9
Graph mining: Part I
Graph Pattern Mining
Jiawei Han and Micheline Kamber
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
2006 Jiawei Han and Micheline Kamber. All rights reserved.
Graph Mining
Graph Pattern Mining
Mining Frequent Subgraph Patterns
Impact on Graph Search I: Graph Indexing
Impact on Graph Search II: Graph Similarity Search
Constrained Graph Pattern Mining
Graph Classification
Graph Clustering
Summary
Why Graph Mining?
Graphs are ubiquitous
Chemical compounds (Cheminformatics)
Protein structures, biological pathways/networks (Bioinformatics)
Program control flow, traffic flow, and workflow analysis
XML databases, Web, and social network analysis
Graph is a general model
Trees, lattices, sequences, and items are degenerate graphs
Diversity of graphs
Directed vs. undirected, labeled vs. unlabeled (edges & vertices),
weighted, with angles & geometry (topological vs. 2-D/3-D)
Complexity of algorithms: many problems are of high
complexity
Graph, Graph, Everywhere
[Figure panels: Aspirin; Yeast protein interaction network (from H. Jeong et al., Nature 411, 41 (2001)); Internet; Co-author network]
Graph Pattern Mining
Frequent subgraphs
A (sub)graph is frequent if its support (occurrence
frequency) in a given dataset is no less than a
minimum support threshold
Applications of graph pattern mining
Mining biochemical structures
Program control flow analysis
Mining XML structures or Web communities
Building blocks for graph classification, clustering,
compression, comparison, and correlation analysis
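The support definition on this slide can be made concrete with a small sketch. It assumes a hypothetical representation (a dict of vertex labels plus a set of undirected edges) and a brute-force containment test that is only practical for tiny graphs; `make_graph`, `contains`, and `support` are illustrative names, not taken from any algorithm in these slides.

```python
from itertools import permutations

def make_graph(labels, edge_list):
    """Hypothetical representation: (vertex -> label, set of undirected edges)."""
    return labels, {frozenset(e) for e in edge_list}

def contains(graph, pattern):
    """Brute-force subgraph isomorphism test; exponential, for tiny graphs only."""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    for image in permutations(g_labels, len(p_nodes)):
        m = dict(zip(p_nodes, image))  # injective vertex mapping
        labels_ok = all(p_labels[v] == g_labels[m[v]] for v in p_nodes)
        edges_ok = all(frozenset(m[v] for v in e) in g_edges for e in p_edges)
        if labels_ok and edges_ok:
            return True
    return False

def support(pattern, dataset):
    """Number of graphs in the dataset that contain the pattern."""
    return sum(contains(g, pattern) for g in dataset)

# Toy dataset (made up for illustration)
dataset = [
    make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),  # C-C-O
    make_graph({0: "C", 1: "O"}, [(0, 1)]),                  # C-O
    make_graph({0: "C", 1: "N"}, [(0, 1)]),                  # C-N
]
c_o = make_graph({0: "C", 1: "O"}, [(0, 1)])
c_c = make_graph({0: "C", 1: "C"}, [(0, 1)])

min_support = 2
frequent = [name for name, p in [("C-O", c_o), ("C-C", c_c)]
            if support(p, dataset) >= min_support]
```

With a minimum support of 2, only the C-O pattern is frequent in this toy dataset, since C-C occurs in a single graph.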
Example: Frequent Subgraphs
[Figure: a graph dataset with three graphs (A), (B), (C), and two frequent patterns (1), (2); minimum support is 2]
Graph Mining Algorithms
Incomplete beam search: Greedy (Subdue)
Inductive logic programming (WARMR)
Graph theory-based approaches
Apriori-based approach
Pattern-growth approach
SUBDUE (Holder et al. KDD'94)
Start with single vertices
Expand best substructures with a new edge
Limit the number of best substructures
Substructures are evaluated based on their ability to compress input graphs
Using minimum description length (DL)
Best substructure S in graph G minimizes:
DL(S) + DL(G\S)
Terminate when no new substructure is discovered
WARMR (Dehaspe et al. KDD'98)
Graphs are represented by Datalog facts
atomel(C, A1, c), bond(C, A1, A2, BT), atomel(C, A2, c):
a carbon atom bound to a carbon atom with bond type BT
WARMR: the first general purpose ILP system
Level-wise search
Simulate Apriori for frequent pattern discovery
Frequent Subgraph Mining Approaches
Apriori-based approach
AGM/AcGM: Inokuchi, et al. (PKDD'00)
FSG: Kuramochi and Karypis (ICDM'01)
PATH#: Vanetik and Gudes (ICDM'02, ICDM'04)
FFSM: Huan, et al. (ICDM'03)
Pattern growth approach
MoFa: Borgelt and Berthold (ICDM'02)
gSpan: Yan and Han (ICDM'02)
Gaston: Nijssen and Kok (KDD'04)
Properties of Graph Mining Algorithms
Search order
breadth vs. depth
Generation of candidate subgraphs
apriori vs. pattern growth
Elimination of duplicate subgraphs
passive vs. active
Support calculation
embedding store or not
Discovery order of patterns
path -> tree -> graph
Apriori-Based Approach
[Figure: two k-edge graphs G' and G'' are combined by JOIN into (k+1)-edge candidate graphs G1, G2, ..., Gn]
Apriori-Based, Breadth-First Search
Methodology: breadth-first search, joining two graphs
AGM (Inokuchi, et al. PKDD'00)
generates new graphs with one more node
FSG (Kuramochi and Karypis ICDM'01)
generates new graphs with one more edge
PATH (Vanetik and Gudes ICDM'02, '04)
Apriori-based approach
Building blocks: edge-disjoint paths
[Figure: a graph with 3 edge-disjoint paths]
construct frequent paths
construct frequent graphs with 2 edge-disjoint paths
construct graphs with k+1 edge-disjoint paths from graphs with k edge-disjoint paths
repeat
FFSM (Huan, et al. ICDM'03)
Represent graphs using canonical adjacency matrix (CAM)
Join two CAMs or extend a CAM to generate a new graph
Store the embeddings of CAMs
All of the embeddings of a pattern in the database
Can derive the embeddings of newly generated CAMs
Pattern Growth Method
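[Figure: a k-edge graph G is extended into (k+1)-edge graphs G1, G2, ..., Gn and then into (k+2)-edge graphs; the same graph can be reached more than once (duplicate graph)]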
MoFa (Borgelt and Berthold ICDM'02)
Extend graphs by adding a new edge
Store embeddings of discovered frequent graphs
Fast support calculation
Also used in later algorithms such as FFSM and GASTON
Expensive memory usage
Local structural pruning
gSpan (Yan and Han ICDM'02)
Right-Most Extension
Theorem: Completeness
The enumeration of graphs using right-most extension is COMPLETE
DFS Code
Flatten a graph into a sequence using depth-first search
[Figure: example graph with vertices numbered 0 to 4]
e0: (0,1)
e1: (1,2)
e2: (2,0)
e3: (2,3)
e4: (3,1)
e5: (2,4)
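The edge sequence above can be reproduced with a short sketch. It is a simplification: gSpan's DFS codes also carry vertex and edge labels and use the minimum code as a canonical form, neither of which is modeled here. The adjacency lists are an assumed reading of the example graph, and `dfs_code` is a hypothetical helper name.

```python
def dfs_code(adj, start):
    """Edge sequence of one depth-first traversal as (from_index, to_index) pairs.

    Vertices are renamed by discovery order. When a vertex is reached, its
    backward edges (to already discovered vertices) are listed before any
    forward edge that grows from it.
    """
    index = {start: 0}   # vertex -> discovery index
    used = set()         # undirected edges already emitted
    code = []

    def visit(v):
        back = [w for w in adj[v]
                if w in index and frozenset((v, w)) not in used]
        for w in sorted(back, key=index.get):
            used.add(frozenset((v, w)))
            code.append((index[v], index[w]))
        for w in adj[v]:
            if w not in index:           # forward edge discovers w
                index[w] = len(index)
                used.add(frozenset((v, w)))
                code.append((index[v], index[w]))
                visit(w)

    visit(start)
    return code

# Assumed adjacency for the example graph (edges 0-1, 1-2, 2-0, 2-3, 3-1, 2-4)
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3, 4], 3: [1, 2], 4: [2]}
code = dfs_code(adj, 0)
```

Starting from vertex 0 with neighbors visited in ascending order, the traversal yields the six edges e0 to e5 in the order listed on the slide. A different start vertex or neighbor order gives a different DFS code of the same graph, which is why a canonical (minimum) code is needed.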
DFS Code Extension
Let a be the minimum DFS code of a graph G and b be
a non-minimum DFS code of G. For any DFS code d
generated from b by one right-most extension,
(i) d is not a minimum DFS code,
(ii) min_dfs(d) cannot be extended from b, and
(iii) min_dfs(d) is either less than a or can be extended from a.
THEOREM [RIGHT-EXTENSION]
The DFS code of a graph extended from a non-minimum DFS code is NOT MINIMUM
GASTON (Nijssen and Kok KDD'04)
Extend graphs directly
Store embeddings
Separate the discovery of different types of graphs
path -> tree -> graph
Simple structures are easier to mine and
duplication detection is much simpler
Graph Pattern Explosion Problem
If a graph is frequent, all of its subgraphs are
frequent: the Apriori property
An n-edge frequent graph may have 2^n
subgraphs
Among 422 chemical compounds which are
confirmed to be active in an AIDS antiviral
screen dataset, there are 1,000,000 frequent
graph patterns if the minimum support is 5%
Closed Frequent Graphs
Motivation: Handling the graph pattern explosion problem
Closed frequent graph
A frequent graph G is closed if there exists no
supergraph of G that carries the same support as G
If some of G's subgraphs have the same
support, it is unnecessary to output these
subgraphs (nonclosed graphs)
Lossless compression: still ensures that the
mining result is complete
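The closedness test can be sketched as a post-filter over a complete set of frequent patterns. This is not how CloseGraph works (it prunes during the search); it only illustrates the definition. As a loud simplification, patterns here are frozensets of named edges and the subset relation stands in for the subgraph relation; `closed_patterns` and the toy supports are hypothetical.

```python
def closed_patterns(support, is_proper_subgraph):
    """Keep patterns that have no proper supergraph with the same support.

    Checking only the given patterns is enough when the input is the complete
    set of frequent patterns: a supergraph with equal support is itself frequent.
    """
    closed = []
    for g, s in support.items():
        absorbed = any(support[h] == s and is_proper_subgraph(g, h)
                       for h in support)
        if not absorbed:
            closed.append(g)
    return closed

# Toy supports (made up); subset of edge names stands in for "is a subgraph of"
ab = frozenset({"a-b"})
bc = frozenset({"b-c"})
abc = frozenset({"a-b", "b-c"})
support = {ab: 3, bc: 2, abc: 2}

closed = closed_patterns(support, lambda g, h: g < h)
```

Here `bc` is dropped because its supergraph `abc` has the same support, while `ab` stays because its support is strictly higher than that of `abc`. The support of `bc` can still be recovered from `abc`, which is what makes the compression lossless.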
CLOSEGRAPH (Yan & Han, KDD'03)
A Pattern-Growth Approach
[Figure: a k-edge graph G is grown into (k+1)-edge graphs G1, G2, ..., Gn]
Under what condition can we stop searching their children,
i.e., early termination?
If G and G' are frequent and G is a subgraph of G': if, in any part
of the graphs in the dataset where G occurs, G' also
occurs, then we need not grow G, since none of G's children will
be closed except those of G'.
Handling Tricky Exception Cases
[Figure: (graph 1) and (graph 2), each over vertices labeled a, b, c, d, together with (pattern 1) and (pattern 2)]
Experimental Result
The AIDS antiviral screen compound dataset
from NCI/NIH
The dataset contains 43,905 chemical compounds
Among these 43,905 compounds, 423 belong to class CA,
1,081 to class CM, and the remainder to class CI
Discovered Patterns
[Figure: example patterns discovered at minimum support 20%, 10%, and 5%]
Performance (1): Run Time
[Figure: run time per pattern (msec) vs. minimum support (in %)]
Performance (2): Memory Usage
[Figure: memory usage (GB) vs. minimum support (in %)]
Number of Patterns: Frequent vs. Closed
[Figure: number of patterns (log scale, 1.0E+02 to 1.0E+06) vs. minimum support (0.05 to 0.1) on class CA; series: frequent graphs, closed frequent graphs]
Runtime: Frequent vs. Closed
[Figure: runtime in seconds (log scale, 1 to 10000) vs. minimum support (0.05 to 0.1) on class CA; series: FSG, gSpan, CloseGraph]
Do the Odds Beat the Curse of Complexity?
Potentially exponential number of frequent patterns
The worst-case complexity vs. the expected probability
Ex.: Suppose Walmart has 10^4 kinds of products
The chance to pick up one product: 10^-4
The chance to pick up a particular set of 10 products: 10^-40
What is the chance for this particular set of 10 products to be
frequent, occurring 10^3 times in 10^9 transactions?
Have we solved the NP-hard problem of subgraph isomorphism
testing?
No. But the real graphs in bio/chemistry are not so bad
A carbon has only 4 bonds and most proteins in a network
have distinct labels
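The Walmart arithmetic can be checked with exact fractions. The sketch assumes, as the slide's numbers imply, that products are picked independently and uniformly at random; the variable names are illustrative.

```python
from fractions import Fraction

n_products = 10**4
n_transactions = 10**9

p_one = Fraction(1, n_products)   # chance of one particular product: 10^-4
p_set = p_one ** 10               # chance of one particular 10-product set: 10^-40

# Expected number of transactions containing that set, under the
# independence assumption stated above
expected_occurrences = n_transactions * p_set
```

Under these assumptions the expected count is 10^-31, vastly below the 10^3 occurrences needed for the set to be frequent. This is the sense in which the odds can beat worst-case complexity: most of the exponential pattern space never reaches the support threshold by chance.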
Graph Mining
Graph Pattern Mining
Mining Frequent Subgraph Patterns
Impact on Graph Search I: Graph Indexing
Impact on Graph Search II: Graph Similarity Search
Constrained Graph Pattern Mining
Graph Classification
Graph Clustering
Summary
Graph Search
Querying graph databases:
Given a graph database and a query graph,
find all the graphs containing this query graph
[Figure: a query graph and a graph database]
Scalability Issue
Sequential scan
Disk I/Os
Subgraph isomorphism testing
An indexing mechanism is needed
DayLight: Daylight.com (commercial)
GraphGrep: Dennis Shasha, et al. PODS'02
Grace: Srinath Srinivasa, et al. ICDE'03
Indexing Strategy
[Figure: graph (G), query graph (Q), and a substructure of Q]
If graph G contains query
graph Q, G should contain
any substructure of Q
Remarks
Index substructures of a query graph to prune
graphs that do not contain these substructures
Indexing Framework
Two steps in processing graph queries
Step 1. Index Construction
Enumerate structures in the graph database,
build an inverted index between structures
and graphs
Step 2. Query Processing
Enumerate structures in the query graph
Calculate the candidate graphs containing
these structures
Prune the false positive answers by
performing subgraph isomorphism test
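The two steps can be sketched end to end on toy data. The sketch assumes the simplest possible structures (single edges, identified by their pair of vertex labels) as index features and a brute-force subgraph isomorphism test; real systems index richer structures. All names (`make_graph`, `edge_features`, `build_index`, `answer_query`) are hypothetical.

```python
from itertools import permutations

def make_graph(labels, edge_list):
    """Hypothetical representation: (vertex -> label, set of undirected edges)."""
    return labels, {frozenset(e) for e in edge_list}

def contains(graph, query):
    """Brute-force subgraph isomorphism test; for tiny graphs only."""
    g_labels, g_edges = graph
    q_labels, q_edges = query
    q_nodes = list(q_labels)
    for image in permutations(g_labels, len(q_nodes)):
        m = dict(zip(q_nodes, image))
        if all(q_labels[v] == g_labels[m[v]] for v in q_nodes) and \
           all(frozenset(m[v] for v in e) in g_edges for e in q_edges):
            return True
    return False

def edge_features(graph):
    """Structures used as index keys here: sorted label pairs of edges."""
    labels, edges = graph
    return {tuple(sorted(labels[v] for v in e)) for e in edges}

def build_index(database):
    """Step 1: inverted index from structures to the ids of graphs containing them."""
    index = {}
    for gid, g in database.items():
        for f in edge_features(g):
            index.setdefault(f, set()).add(gid)
    return index

def answer_query(database, index, query):
    """Step 2: intersect posting lists, then verify candidates."""
    candidates = set(database)
    for f in edge_features(query):
        candidates &= index.get(f, set())
    answers = {gid for gid in candidates if contains(database[gid], query)}
    return candidates, answers

database = {
    "a": make_graph({0: "C", 1: "C", 2: "O", 3: "N"}, [(0, 1), (1, 2), (2, 3)]),
    "b": make_graph({0: "C", 1: "C", 2: "C", 3: "O"}, [(0, 1), (2, 3)]),
    "c": make_graph({0: "C", 1: "N"}, [(0, 1)]),
}
query = make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])  # path C-C-O

index = build_index(database)
candidates, answers = answer_query(database, index, query)
```

Graph "b" has both a C-C edge and a C-O edge but not the connected path C-C-O, so it survives the index filter and is removed only by the isomorphism test. Graph "c" is pruned by the index without any isomorphism test.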
Cost Analysis
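QUERY RESPONSE TIME
$T_{\text{index}} + |C_q| \times (T_{\text{io}} + T_{\text{isomorphism\_testing}})$
where $T_{\text{index}}$ is the time to fetch the index and $|C_q|$ is the number of candidates
REMARK: make $|C_q|$ as small as possible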
Path-based Approach
GRAPH DATABASE
[Figure: three graphs (a), (b), (c)]
PATHS
0-length: C, O, N, S
1-length: C-C, C-O, C-N, C-S, N-N, S-O
2-length: C-C-C, C-O-C, C-N-C, ...
3-length: ...
Build an inverted index between paths and graphs
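A minimal sketch of path enumeration and the resulting inverted index follows. It assumes simple paths (no repeated vertex) and treats a path and its reverse as the same feature; the toy graphs and the names `label_paths` and `path_index` are made up for illustration and are not taken from GraphGrep.

```python
def label_paths(labels, adj, max_len):
    """Label sequences of all simple paths with at most max_len edges."""
    found = set()

    def extend(path):
        seq = tuple(labels[v] for v in path)
        found.add(min(seq, seq[::-1]))   # a path and its reverse are one feature
        if len(path) - 1 < max_len:
            for w in adj[path[-1]]:
                if w not in path:
                    extend(path + [w])

    for v in labels:
        extend([v])
    return found

# Toy database: graph id -> (vertex labels, adjacency lists)
database = {
    "a": ({0: "C", 1: "O", 2: "C"}, {0: [1], 1: [0, 2], 2: [1]}),  # C-O-C
    "b": ({0: "C", 1: "N"}, {0: [1], 1: [0]}),                     # C-N
}

# Inverted index between paths and graphs
path_index = {}
for gid, (labels, adj) in database.items():
    for p in label_paths(labels, adj, max_len=2):
        path_index.setdefault(p, set()).add(gid)

paths_a = label_paths(*database["a"], max_len=2)
```

For graph "a" this produces the 0-length paths C and O, the 1-length path C-O, and the 2-length path C-O-C. The path C is shared by both graphs, so it has no pruning power on its own.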
Problems: Path-based Approach
GRAPH DATABASE
[Figure: three graphs (a), (b), (c)]
QUERY GRAPH
[Figure: the query graph]
Only graph (c) contains this query
graph. However, if we only index paths: C, C-C, C-C-C, C-C-C-C, we
cannot prune graphs (a) and (b).
gIndex: Indexing Graphs by Data Mining
Our methodology on graph index:
Identify frequent structures in the database; the
frequent structures are subgraphs that appear
quite often in the graph database
Prune redundant frequent structures to
maintain a small set of discriminative structures
Create an inverted index between discriminative
frequent structures and graphs in the database
IDEAS: Indexing with Two Constraints
structure (>10^6)
frequent (~10^5)
discriminative (~10^3)
Why Discriminative Subgraphs?
All graphs contain structures: C, C-C, C-C-C
Why bother indexing these redundant frequent structures?
Only index structures that provide more
information than existing structures
[Figure: sample database with graphs (a), (b), (c)]
Discriminative Structures
Pinpoint the most useful frequent structures
Given a set of structures $f_1, f_2, \ldots, f_n$ and a new
structure $x$, we measure the extra indexing
power provided by $x$,
$P(x \mid f_1, f_2, \ldots, f_n), \quad f_i \subset x$
When $P$ is small enough, $x$ is a discriminative
structure and should be included in the index
Index discriminative frequent structures only
Reduce the index size by an order of
magnitude
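One way to make the extra indexing power concrete is to estimate P(x | f1, ..., fn) from posting lists: the fraction of graphs containing every indexed substructure of x that also contain x. That estimator is an assumption of this sketch, not something the slide states, and `extra_indexing_power` and the toy posting lists are hypothetical.

```python
def extra_indexing_power(graphs_with_x, graphs_with_each_f, all_graphs):
    """Estimate P(x | f1..fn) as |D_x| / |D_f1 & ... & D_fn| (assumed estimator).

    A value near 1 means x is implied by its indexed substructures (redundant);
    a small value means x prunes many graphs the substructures cannot.
    """
    common = set(all_graphs)
    for d in graphs_with_each_f:
        common &= d
    if not common:
        return 1.0   # no graph holds all substructures, so x adds nothing
    return len(graphs_with_x) / len(common)

all_graphs = set(range(1, 11))           # toy database of 10 graph ids
d_f1 = set(range(1, 11))                 # f1 occurs everywhere
d_f2 = set(range(1, 9))                  # f2 occurs in graphs 1..8

p_redundant = extra_indexing_power(set(range(1, 9)), [d_f1, d_f2], all_graphs)
p_discriminative = extra_indexing_power({1, 2}, [d_f1, d_f2], all_graphs)
```

The first structure occurs in exactly the graphs that already contain f1 and f2, so its estimate is 1.0 and indexing it would add nothing. The second occurs in only 2 of those 8 graphs, giving 0.25, so it would be kept under any threshold above that value.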
Why Frequent Structures?
We cannot index (or even search) all of the substructures
Large structures will likely be indexed well by their substructures
Size-increasing support threshold
[Figure: minimum support threshold (support) plotted against structure size]
Experimental Setting
The AIDS antiviral screen compound dataset from
NCI/NIH, containing 43,905 chemical compounds
Query graphs are randomly extracted from the
dataset
GraphGrep: maximum length (edges) of paths is
set at 10
gIndex: maximum size (edges) of structures is set
at 10
Experiments: Answer Set Size
[Figure: number of candidates (0 to 140) vs. query size (4 to 24); series: GraphGrep, gIndex, Actual Match]
Experiments: Incremental Maintenance
[Figure: plot over database sizes 2K to 10K (y-axis ticks 20 to 80); series: From scratch, Incremental]
Frequent structures are stable under database updates
The index can be built from a small portion of a graph
database, yet be used for the whole database
Alternative Graph Indexing Methods
Graph-structure-based indexing and similarity search
Structure-based index methods, e.g., gIndex, S-path index
Use index to search for similar graph/network structures
Substructure indexing
Key problem: What substructures as indexing features?
gIndex [Yan, Yu & Han, SIGMOD'04]: Find frequent and discriminative subgraphs (by graph-pattern mining)
S-path [Zhao & Han, VLDB'10]: Use decomposed shortest paths as basic indexing features
Why S-Path as Indexing Features?
Neighborhood signatures of vertices are built to maintain
indexing features: effective search space pruning ability
Processing (Query Decomposition): Decompose the query
graph into a set of indexed shortest paths in S-Path
[Figure panels: Network; A global lookup table; Neighborhood signature of v3; Query]
Graph Mining
Graph Pattern Mining
Mining Frequent Subgraph Patterns
Impact on Graph Search I: Graph Indexing
Impact on Graph Search II: Graph Similarity Search
Constrained Graph Pattern Mining
Graph Classification
Graph Clustering
Summary
Structure Similarity Search
[Figure: chemical compounds (a) caffeine, (b) diurobromine, (c) viagra, and a query graph]
Some Straightforward Methods
Method 1: Directly compute the similarity between the
graphs in the DB and the query graph
Sequential scan
Subgraph similarity computation
Method 2: Form a set of subgraph queries from the
original query graph and use the exact subgraph
search
Costly: If we allow 3 edges to be missed in a 20-edge
query graph, it may generate 1,140 subgraphs
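The figure of 1,140 is the number of ways to choose which 3 of the 20 edges are missed. The check below assumes each choice yields one relaxed query and ignores that some choices may give isomorphic or disconnected subgraphs.

```python
from math import comb

n_edges, n_missed = 20, 3
n_subgraph_queries = comb(n_edges, n_missed)  # choose the 3 edges to drop
```

The count grows quickly with the relaxation: dropping 4 edges instead of 3 from the same query gives comb(20, 4) = 4,845 choices.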
Index: Precise vs. Approximate Search
Precise Search
Use frequent patterns as indexing features
Select features in the database space based on their
selectivity
Build the index
Approximate Search
Hard to build indices covering similar subgraphs:
explosive number of subgraphs in databases
Idea: (1) keep the index structure
(2) select features in the query space
Substructure Similarity Measure
Query relaxation measure
The number of edges that can be relabeled or missed; the positions of these edges are not fixed
[Figure: query graph]
Substructure Similarity Measure
Feature-based similarity measure
Each graph is represented as a feature vector
X = {x1, x2, ..., xn}
Similarity is defined by the distance of their
corresponding vectors
Advantages
Easy to index
Fast
Rough measure
Intuition: Feature-Based Similarity Search
[Figure: query (Q) with its substructures, and graphs (G1) and (G2); at least one of the substructures should be contained]
If graph G contains the major part of a query
graph Q, G should share
a number of common
features with Q
Given a relaxation ratio,
calculate the maximal
number of features that can be missed!
Feature-Graph Matrix
(rows: features; columns: graphs in database)
      G1  G2  G3  G4  G5
f1     0   1   0   1   1
f2     0   1   0   0   1
f3     1   0   1   1   1
f4     1   0   0   0   1
f5     0   0   1   1   0
Assume a query graph has 5 features and at most
2 features may be missed under the relaxation threshold
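The filtering implied by this matrix can be sketched directly. The sketch assumes the query's 5 features are exactly f1 to f5 and that each entry is a 0/1 containment flag, as in the table; the variable names are illustrative.

```python
graphs = ["G1", "G2", "G3", "G4", "G5"]
matrix = {                      # feature -> containment flags, one per graph
    "f1": [0, 1, 0, 1, 1],
    "f2": [0, 1, 0, 0, 1],
    "f3": [1, 0, 1, 1, 1],
    "f4": [1, 0, 0, 0, 1],
    "f5": [0, 0, 1, 1, 0],
}
n_query_features = 5
max_misses = 2                  # allowed by the relaxation threshold

# Number of query features each graph contains (column sums)
hits = [sum(row[j] for row in matrix.values()) for j in range(len(graphs))]

# Keep a graph only if it misses no more than max_misses features
candidates = [g for g, h in zip(graphs, hits)
              if n_query_features - h <= max_misses]
```

G1, G2, and G3 each contain only 2 of the 5 features, so they miss 3 and are pruned. G4 (3 hits) and G5 (4 hits) remain as candidates and would still need to be verified against the query.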
Edge Relaxation: Feature Misses
If we allow k edges to be relaxed, J is the
maximum number of features to be hit by k
edges; it becomes the maximum coverage
problem
NP-complete
A greedy algorithm exists:
$J_{\text{greedy}} \geq \left[1 - \left(1 - \frac{1}{k}\right)^{k}\right] \cdot J$
We design a heuristic to refine the bound of
feature misses
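The greedy algorithm for maximum coverage can be sketched as follows. This is the textbook greedy procedure, not the refined heuristic mentioned on the slide; the mapping from query edges to the features they hit is made up, and `greedy_feature_hits` is a hypothetical name.

```python
def greedy_feature_hits(edge_hits, k):
    """Pick k edges one at a time, each time taking the edge that hits the
    most features not yet covered. Returns (chosen edges, features covered)."""
    covered = set()
    chosen = []
    for _ in range(k):
        remaining = [e for e in edge_hits if e not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda e: len(edge_hits[e] - covered))
        chosen.append(best)
        covered |= edge_hits[best]
    return chosen, len(covered)

# Toy query: which features each edge belongs to (relaxing the edge misses them)
edge_hits = {
    "e1": {"f1", "f2"},
    "e2": {"f2", "f3"},
    "e3": {"f3", "f4", "f5"},
}
chosen, j_greedy = greedy_feature_hits(edge_hits, k=2)
```

With k = 2 the procedure first takes e3 (3 features) and then e1 (2 new features), covering all 5 features. In general the greedy value never exceeds the true maximum J, which is why an approximation bound is needed before it can be used for filtering.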
Query Processing Framework
Three steps in processing approximate graph queries
Step 1. Index Construction
Select small structures as features in a
graph database, and build the feature-
graph matrix between the features and
the graphs in the database
Framework (cont.)
Step 2. Feature Miss Estimation
Determine the indexed features
belonging to the query graph
Calculate the upper bound of the number
of features that can be missed for an
approximate matching, denoted by J
On the query graph, not the graph database
Framework (cont.)
Step 3. Query Processing
Use the feature-graph matrix to
calculate the difference in the number
of features between graph G and query
Q, F_G - F_Q
If F_G - F_Q > J, discard G. The remaining
graphs constitute a candidate answer
set
Performance Study
Database
Chemical compounds of Anti-Aids Drug from NCI/NIH, randomly select 10,000 compounds
Query
Randomly select 30 graphs with 16 and 20 edges as query graphs
Competitive algorithms
Grafil: Graph Filter (our algorithm)
Edge: use edges only
All: use all the features
Comparison of the Three Algorithms
[Figure: number of candidates (log scale, 10 to 10000) vs. edge relaxation (1 to 4); series: Grafil, Edge, All]
Summary: Graph Pattern Mining
Graph mining has wide applications
Frequent and closed subgraph mining methods
gSpan and CloseGraph: pattern-growth depth-first
search approach
Graph indexing techniques
Frequent and discriminative subgraphs are high-quality
indexing features
Similarity search in graph databases
Indexing and feature-based matching
Constraint-based graph pattern mining
References (1)
T. Asai, et al. Efficient substructure discovery from large semi-structured data, SDM'02
C. Borgelt and M. R. Berthold. Mining molecular fragments: Finding relevant substructures of molecules, ICDM'02
M. Deshpande, M. Kuramochi, and G. Karypis. Frequent Sub-structure Based Approaches for Classifying Chemical Compounds, ICDM'03
M. Deshpande, M. Kuramochi, and G. Karypis. Automated approaches for classifying structures, BIOKDD'02
L. Dehaspe, H. Toivonen, and R. King. Finding frequent substructures in chemical compounds, KDD'98
C. Faloutsos, K. McCurley, and A. Tomkins. Fast Discovery of 'Connection Subgraphs', KDD'04
L. Holder, D. Cook, and S. Djoko. Substructure discovery in the subdue system, KDD'94
J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, and A. Tropsha. Mining spatial motifs from protein structure graphs, RECOMB'04
J. Huan, W. Wang, and J. Prins. Efficient mining of frequent subgraph in the presence of isomorphism, ICDM'03
H. Hu, X. Yan, Yu, J. Han and X. J. Zhou. Mining Coherent Dense Subgraphs across Massive Biological Networks for Functional Discovery, ISMB'05
A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data, PKDD'00
C. James, D. Weininger, and J. Delany. Daylight Theory Manual Daylight Version 4.82. Daylight Chemical Information Systems, Inc., 2003.
G. Jeh and J. Widom. Mining the Space of Graph Properties, KDD'04
M. Koyuturk, A. Grama, and W. Szpankowski. An efficient algorithm for detecting frequent subgraphs in biological networks, Bioinformatics, 20:I200--I207, 2004.
References (2)
M. Kuramochi and G. Karypis. Frequent subgraph discovery, ICDM'01
M. Kuramochi and G. Karypis. GREW: A Scalable Frequent Subgraph Discovery Algorithm, ICDM'04
B. McKay. Practical graph isomorphism. Congressus Numerantium, 30:45--87, 1981.
S. Nijssen and J. Kok. A quickstart in frequent structure mining can make a difference, KDD'04
J. Prins, J. Yang, J. Huan, and W. Wang. Spin: Mining maximal frequent subgraphs from graph databases, KDD'04
D. Shasha, J. T.-L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching, PODS'02
J. R. Ullmann. An algorithm for subgraph isomorphism, J. ACM, 23:31--42, 1976.
N. Vanetik, E. Gudes, and S. E. Shimony. Computing frequent graph patterns from semistructured data, ICDM'02
C. Wang, W. Wang, J. Pei, Y. Zhu, and B. Shi. Scalable mining of large disk-based graph databases, KDD'04
T. Washio and H. Motoda. State of the art of graph-based data mining, SIGKDD Explorations, 5:59-68, 2003
X. Yan and J. Han. gSpan: Graph-Based Substructure Pattern Mining, ICDM'02
X. Yan and J. Han. CloseGraph: Mining Closed Frequent Graph Patterns, KDD'03
X. Yan, P. S. Yu, and J. Han. Graph Indexing: A Frequent Structure-based Approach, SIGMOD'04
X. Yan, X. J. Zhou, and J. Han. Mining Closed Relational Graphs with Connectivity Constraints, KDD'05
X. Yan, P. S. Yu, and J. Han. Substructure Similarity Search in Graph Databases, SIGMOD'05
X. Yan, F. Zhu, J. Han, and P. S. Yu. Searching Substructures with Superimposed Distance, ICDE'06
M. J. Zaki. Efficiently mining frequent trees in a forest, KDD'02
P. Zhao and J. Han. On Graph Query Optimization in Large Networks, VLDB'10