Post on 22-Jul-2018
Chapter 1
Graph Matching: An Introduction
Graph theory is a branch of mathematics that deals with graphs, which consist of a set of
vertices (or nodes) V = {v1, v2, …, vn} and an associated set of edges E = {e1, e2, …, ek},
where each edge ei = <vi, vj> joins a pair of vertices. Graphs are flexible structures that can be used to model
real world entities as well as processes in different domains. The problems in various
domains can be modeled as analogous problems on graphs and solutions to these graph
problems give the solutions to the original problems. This has resulted in widespread use of
graphs in obtaining solutions to problems in various domains. Hence, graph theory has
grown into a significant area of research in mathematics with applications in chemistry,
physics, operations research, social science, biology, computer science etc.
The advances in computer technology and increased applications of graphs have spawned a
renewed interest amongst mathematicians and computer scientists in graph theory. Graphs
have of late gained significance in solving problems in diverse areas because of the ease of
their representation and manipulation in computers [Narasingh Deo, 2004]. The problems
of graph matching, graph similarity and graph isomorphism have attracted researchers in
mathematics and computer science ever since graph theory began to gain importance as a
discipline. The importance of graph theory and graph matching in the current
context is brought out in the ensuing section, which also emphasizes the need for
addressing the problem.
1.1 Introduction to Graph Theory and Graph Matching
The paper written by Leonhard Euler on the “Seven Bridges of Konigsberg” and published in
1736 is regarded as the first paper in the history of graph theory [Narasingh Deo, 2004]. In
1878, Sylvester introduced the term “graph” in a paper published in the scientific
journal “Nature”, where he drew an analogy between the "quantic invariants" and
"co-variants" of algebra and molecular diagrams, and this led to different applications of graph
theory [Sylvester John Joseph, 1878]. The first textbook on graph theory was written
by Dénes Kőnig and published in 1936 [Tutte W T, 2001]. A later textbook by Frank Harary,
published in 1969, was popular, and enabled mathematicians, chemists, electrical engineers
and social scientists to talk to each other [Harary Frank, 1969]. Today there are innumerable
books on graph theory that describe its fundamentals and applications.
The subject of graph theory had its beginning in recreational math problems but it has now
grown into a significant area of mathematical research. Basically, graph theory is the study of
graphs, which mathematically model pairwise relations between objects. In all the domains
where graphs are employed for modeling, the vertices or nodes model the objects whereas
the edges model the relationship. Graphs by their inherent characteristics have been found
to be versatile tools for applications in science and engineering [Narasingh Deo, 2004]. Their
flexibility and robustness in modeling various scenarios and concepts has led to their
popularity. The edges in a graph may be undirected, representing a bidirectional
relationship, or directed, representing a unidirectional relationship. Graphs
which have only undirected edges are referred to as “Undirected Graphs”, whereas
graphs which consist of directed edges only are “Directed Graphs” or “Digraphs”. Further,
graphs which consist of both types of edges are “Mixed Graphs”. Graphs are also
categorized as “Simple Graphs” and “Multi Graphs”. Graphs which have neither self loops
nor parallel edges are simple graphs [Chartrand, 2012]. Graphs having self loops and/or
parallel edges are multi graphs. The various definitions and concepts about graphs are
brought out in section 1.1.1.
1.1.1 Definitions and Concepts
This section provides the necessary definitions and introduces the conventions followed in
the thesis.
Graph : A graph is an ordered pair G = (V, E) comprising a set V of vertices or
nodes together with a set E of edges or lines/arcs. The edges may be
directed (asymmetric) or undirected (symmetric).
Vertex set : The set of vertices in a graph is denoted by V (G) or V
Edge set : The set of edges in a graph is denoted by E (G) or E
Degree : The degree of a vertex is the number of edges that are incident to it.
Size : A graph's size is |E|, the number of edges
Order : The order of a graph is |V|, the number of vertices
Distance : The distance d(x, y) in G between two vertices x, y is the length of a
shortest x–y path in G
Eccentricity : The eccentricity, e(x), of the vertex x in a graph G is the distance from x
to the vertex farthest from x, i.e. e(x) = max { d(x, y) : y ∈ V(G) }, where y is
allowed to range over all of the vertices of the graph
Diameter : The greatest distance between any two vertices in G is the diameter of
G, denoted by diam(G)
Center : A vertex with minimum eccentricity in a graph G is referred to as the
center of graph, such a vertex is also called Central Vertex
Radius : The greatest distance between a central vertex and any other vertex
is the radius of the graph and is denoted rad(G). For a connected graph,
rad(G) ≤ diam(G) ≤ 2·rad(G)
Complete Graph / clique : If all vertices of G are pairwise adjacent, then G is a complete graph or
clique
Walk : A walk is an alternating sequence of vertices and edges, with each edge
being incident to the vertices immediately preceding and succeeding it
in the sequence. A walk of length k is a non-empty alternating sequence
of k+1 vertices and k edges in G
Trail : A trail is a walk with no repeated edges
Path : A path is a walk with no repeated vertices
Closed Walk : A walk is closed if the initial vertex is also the terminal vertex
Cycle : A cycle is a closed trail with at least one edge and with no repeated
vertices except that the initial vertex is also the terminal vertex
Length of a walk: The length of a walk is the number of edges in the sequence defining
the walk
Connected Graph : A non-empty graph G is called connected if any two of its vertices are
linked by a path in G. A connected graph is a graph with exactly one
connected component
Undirected Graph : An undirected graph is one in which no edge has an orientation
Directed graph : A directed graph or digraph is an ordered pair D = (V, A) where each
edge (arc) in A has a direction
Adjacent : Two vertices x, y of G are adjacent if <x, y> is an edge in G; in other
words, two vertices are adjacent if they are incident to a common edge.
Similarly, two edges are adjacent if they are incident to a common vertex
Incident : A vertex v is incident with an edge e if v ∈ e, that is, e is an edge at v
Independent : A set of vertices or edges is independent if no two of its elements are
adjacent
Isomorphic : G1 and G2 are isomorphic if there exists a bijection φ: V1 → V2 such
that <x, y> ∈ E1 ⇔ <φ(x), φ(y)> ∈ E2 for all x, y in V1
Invariant : A mapping taking graphs/ graph parameters as arguments is called a
graph invariant if it assigns equal values to isomorphic graphs
A simple graph : A simple graph is a triple G= (V,E,I), where V and E are disjoint finite sets
and I is an incidence relation such that every element of E is incident
with exactly two distinct elements of V and no two elements of E are
incident to the same pair of elements of V
Connectivity : A graph G has connectivity k if G is k-connected but not (k+1)-
connected. A complete graph on k+1 vertices is defined to have
connectivity k.
Neighbours : The set of neighbours, N(v), of a vertex v is the set of vertices which are
adjacent to v. The degree of a vertex is also the Cardinality of its
neighbour set.
Induced Sub-Graph : For a set of vertices X, we use G[X] to denote the induced sub-
graph of G whose vertex set is X and whose edge set is the subset
of E(G) consisting of those edges with both ends in X
Edge Induced Sub-Graph : For a set S of edges, we use G[S] to denote the edge induced sub-
graph of G whose edge set is S and whose vertex set is the
subset of V(G) consisting of those vertices incident with any edge in S
These are some of the definitions in graph theory that will be used in this dissertation.
Many efficient representation techniques are available for representing graphs in
computers and are briefly described in the next section.
1.1.2 Computer Representation and Graph Spectra
The availability of robust computer representation schemes for graphs and the flexibility of
processing them has furthered the use and applications of graphs. Graphs are
represented by various matrices such as the adjacency matrix, incidence matrix, path matrix etc.
[Narasingh Deo, 2004]. These matrices characterize the various properties of the graphs
and can be processed by a computer for extracting various specific characteristics of the
graphs. The properties of these matrices can be abstracted using the spectrum of the
matrices, represented by sets of eigenvalues and eigenvectors [Chung Fan R K, 1994].
The discipline of spectral graph theory emerged in the 1950s and 1960s. It is a theory of
matrices applicable to graphs and their applications. It has the characteristic feature that some of
its results, although purely combinatorial in nature, seem in the present state of
knowledge to be unobtainable without employing the matrices that describe the
graph. Properties obtained from these matrices are referred to as spectral
properties.
Besides graph theoretic research on the relationship between structural and spectral
properties of graphs, another major source of advancement in graph theory was research
in quantum chemistry. But the connections between these two lines of work were not
discovered until much later. The 1980 monograph “Spectra of Graphs” by Cvetković, Doob,
and Sachs summarized nearly all research to date in the area [Cvetković etal, 1980]. In
1988 it was updated by the survey “Recent Results in the Theory of Graph Spectra”
[Cvetković etal, 1988]. These concepts and theoretical foundations have formed a sound
basis for the development of applications of graphs/graph theory, and a few important ones are
described in section 1.1.3.
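As a small illustration of the adjacency matrix representation and of the distance-based
definitions of section 1.1.1, the following Python sketch builds the adjacency matrix of an
undirected simple graph and derives the degree, eccentricity, radius, diameter and center from
it. The function names and the example graph (a path on four vertices) are hypothetical and
chosen only for illustration; the graph is assumed to be connected.

```python
from collections import deque

def adjacency_matrix(n, edges):
    """Build the n x n adjacency matrix of an undirected simple graph."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = 1
        a[v][u] = 1  # undirected: the matrix is symmetric
    return a

def distances_from(a, source):
    """Breadth-first search giving d(source, y) for every vertex y."""
    dist = [None] * len(a)
    dist[source] = 0
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in range(len(a)):
            if a[x][y] and dist[y] is None:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def eccentricities(a):
    """e(x) = max over y of d(x, y); assumes the graph is connected."""
    return [max(distances_from(a, x)) for x in range(len(a))]

# Hypothetical example: the path 0 - 1 - 2 - 3
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
degrees = [sum(row) for row in A]      # degree = row sum of the matrix
ecc = eccentricities(A)
radius, diameter = min(ecc), max(ecc)
center = [x for x, e in enumerate(ecc) if e == radius]
print(degrees, ecc, radius, diameter, center)
```

For this path the two inner vertices form the center, and the values satisfy
rad(G) ≤ diam(G) ≤ 2·rad(G).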
1.1.3. Applications of Graphs/Graph Theory
Various problems in graph theory such as Euler circuits, Hamiltonian circuits, the four color
problem, planarity of graphs and Kuratowski’s graphs have been well researched and have
found applications in different domains [Chartrand, 2012]. One of the domains where
graph theory has found a large number of applications is computer science. In computer
science, graphs are used to represent communication networks, data organization,
computational devices, the flow of data/information, computational control flows, objects
to be recognized etc. The graph theoretic approach can be applied to problems in travel,
biology, computer chip design, and many other fields [W1].
Topology is a major application area of graphs. Euler’s formula relating the number of
edges, vertices, and faces of a convex polyhedron was studied and generalized by
Cauchy and L'Huillier, and is at the origin of topology. More than a century after Euler's
paper on the bridges of Königsberg, Cayley was led by the study of particular
analytical forms arising from differential calculus to study the trees. The techniques evolved
were mainly concerned with the enumeration of graphs having particular properties
(analogous to finding chemical isomers). Enumerative graph theory then rose from the
results of Cayley and the fundamental results published by Pólya between 1935 and 1937
and the generalization of these by De Bruijn in 1959 [W1]. Cayley linked his results on trees
with the contemporary studies of chemical composition.
Graph theory is also used to study molecules in chemistry and physics. In condensed matter
physics, the three dimensional structure of complicated simulated atomic structures can be
studied quantitatively by gathering statistics on graph-theoretic properties related to the
topology of the atoms. In chemistry a graph makes a natural model for a molecule, where
vertices represent atoms and edges represent bonds. This approach is especially used in
computer processing of molecular structures in software systems ranging from chemical
editors to databases. In statistical physics, graphs can represent local connections between
interacting parts of a system, as well as the dynamics of a physical process on such systems.
Graph theory is useful in biology and conservation efforts where a vertex can represent
habitats or regions where certain species reside and the edges represent migration paths,
or movement between the regions. Graph theory is also widely used in sociology as a way,
for example, to measure actors' prestige or to explore diffusion mechanisms.
Graph-theoretic methods, in various forms, have proven particularly useful in linguistics,
since natural languages lend themselves well to discrete structures. Within lexical
semantics, especially as applied to computers, modeling word meaning is easier when a
given word is understood in terms of related words; semantic networks are therefore
important in computational linguistics. Indeed, the usefulness of this area of mathematics
to linguistics has resulted in organizations such as TextGraphs, as well as various 'Net' projects,
such as WordNet, VerbNet, and others [W1].
The development of algorithms to handle graphs is therefore of major interest in computer
science. Algorithms for finding graph characteristics, shortest paths/distances in graphs,
spanning trees of a graph and articulation points, for searching for particular nodes, etc. have been
extensively pursued and many variants are available. One of the very important applications
of graphs is in finding similarity between objects represented by the graphs. This problem
has its roots in determining graph isomorphism. Finding whether two graphs are
isomorphic amounts to checking whether the two are structurally identical. Finding object
similarity (akin to graph similarity) has applications in pattern analysis, computer vision,
chemical structure analysis, plagiarism detection and the like. These applications are
basically graph matching problems in which the objects are represented by graphs.
Graph matching, that is, finding whether two graphs are equivalent, has been an interesting
problem ever since graph theory emerged as a discipline. Many of the different
solutions proposed are able to check isomorphism (graph matching) on a particular
class of graphs only. Two graphs G1 and G2 are isomorphic if and only if there is a
permutation of the labeling of the vertices such that the two graphs are equivalent.
Section 1.2 elaborates on the different variants of graph matching, whereas the different
applications of graph matching are brought out in section 1.3.
1.2 Graph Matching and its Variants
There are several variants of the graph matching problem that are concerned with
isomorphism or graph similarity [Endika Bengoetxea, 2002]. Different solutions exist to some
of these variants. The different variants of graph matching are depicted in Figure 1.1. The
major categories of graph matching are exact graph matching and inexact graph matching.
[Figure 1.1: Graph Matching Variants. Graph Matching divides into Exact Graph Matching
(Graph isomorphism, Sub-Graph isomorphism, Monomorphism, Homomorphism, Maximum Common
Sub-graph) and Inexact Graph Matching (Attributed Graph Matching, Attributed Sub-Graph
Matching).]
1.2.1 Exact Graph Matching: Given the graphs G1 and G2, exact graph matching requires a
mapping between the nodes of the two graphs that is edge-preserving: if two nodes of the first
graph are linked by an edge, their images in the second graph must be linked by an edge as well.
In the strictest form of exact matching (graph isomorphism) the mapping is a bijection.
In that case, there is a one-to-one correspondence between the nodes and edges of the
graphs G1 and G2. Exact graph matching can be categorized into various sub types
based on some of the characteristics of the graphs and matching as depicted in Figure 1.1.
1.2.1.1 Graph Isomorphism: Graph isomorphism is a variant of exact graph matching that
deals with the structural equivalence of two graphs. Two graphs G1 and G2 are isomorphic if and
only if there is a permutation of the labeling of the vertices such that the two graphs are
equivalent. More formally, an isomorphism of graphs G1 and G2 is a bijection between the
vertex sets of G1 and G2,
Φ: V(G1) → V(G2),
such that any two vertices x and y of G1 are adjacent in G1 if and only if Φ(x) and Φ(y) are
adjacent in G2. This kind of bijection is commonly called an "edge-preserving bijection", in
accordance with the general notion of isomorphism being a structure-preserving bijection
[W2].
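The definition can be checked directly for a candidate mapping. The following Python sketch,
with hypothetical function names and example graphs, tests whether a given dictionary Φ is an
edge-preserving bijection between two undirected simple graphs; it verifies a proposed
mapping and does not search for one.

```python
def normalise(edges):
    """Store each undirected edge as a frozenset so <x, y> and <y, x> coincide."""
    return {frozenset(e) for e in edges}

def is_isomorphism(phi, v1, e1, v2, e2):
    """Return True if phi (a dict from V1 to V2) is an edge-preserving bijection."""
    images = list(phi.values())
    # Bijection: defined exactly on V1, no image repeated, every vertex of V2 reached.
    if set(phi) != set(v1) or len(set(images)) != len(images) or set(images) != set(v2):
        return False
    # For a bijection, "x, y adjacent iff phi(x), phi(y) adjacent" holds exactly
    # when the image of the edge set E1 equals E2.
    mapped = {frozenset((phi[x], phi[y])) for x, y in e1}
    return mapped == normalise(e2)

# Hypothetical example: two drawings of the cycle on four vertices
V1, E1 = ["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
V2, E2 = [1, 2, 3, 4], [(1, 3), (3, 2), (2, 4), (4, 1)]

good = {"a": 1, "b": 3, "c": 2, "d": 4}
bad = {"a": 1, "b": 2, "c": 3, "d": 4}
print(is_isomorphism(good, V1, E1, V2, E2))  # True
print(is_isomorphism(bad, V1, E1, V2, E2))   # False
```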
1.2.1.2 Monomorphism / Homomorphism: Monomorphism is a variety
of exact graph matching where each node of the first graph is mapped to a distinct node of
the second one, and each edge of the first graph has a corresponding edge in the second
one; the second graph, however, may have both extra nodes and extra edges. A homomorphism
relaxes this further: distinct nodes of the first graph need not be mapped to distinct nodes
of the second.
A graph homomorphism f from a graph G = (V, E) to a graph G’ = (V’, E’), written as
f: G → G’, is a mapping f: V → V’ from the vertex set of G to the vertex set of G’ such
that (u, v) ∈ E implies (f(u), f(v)) ∈ E’, but not vice versa.
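The difference between the two notions can be seen on small examples. The Python sketch
below uses hypothetical function names and example graphs and assumes undirected simple
graphs without self loops. It checks the one-way implication that defines a homomorphism,
and additionally the distinctness of images required by a monomorphism.

```python
def is_homomorphism(f, edges_g, edges_h):
    """(u, v) in E implies (f(u), f(v)) in E'; the converse is not required."""
    target = {frozenset(e) for e in edges_h}
    # If f(u) == f(v) the frozenset has one element and cannot be an edge
    # of a simple graph, so adjacent vertices are never merged.
    return all(frozenset((f[u], f[v])) in target for u, v in edges_g)

def is_monomorphism(f, edges_g, edges_h):
    """A homomorphism that also maps distinct nodes to distinct nodes."""
    return len(set(f.values())) == len(f) and is_homomorphism(f, edges_g, edges_h)

# Hypothetical example 1: folding the 4-cycle onto a single edge a - b
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
single_edge = [("a", "b")]
fold = {0: "a", 1: "b", 2: "a", 3: "b"}

# Hypothetical example 2: embedding the path 0 - 1 - 2 into a triangle,
# which has one extra edge (x, z) with no counterpart in the path
path = [(0, 1), (1, 2)]
triangle = [("x", "y"), ("y", "z"), ("x", "z")]
embed = {0: "x", 1: "y", 2: "z"}

print(is_homomorphism(fold, cycle, single_edge))   # True
print(is_monomorphism(fold, cycle, single_edge))   # False: images repeat
print(is_monomorphism(embed, path, triangle))      # True
```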
1.2.1.3 Sub-Graph Isomorphism: Another form of matching is sub-graph isomorphism or
sub-graph matching; an isomorphism that holds between one of the two graphs and a
node-induced sub-graph of the other is referred to as exact sub-graph matching. More
formally, given two graphs G1 and G2, one must determine whether the graph G1 contains a
sub-graph that is isomorphic to G2. The problem has been proven to be NP-complete [Cook,
1971].
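A minimal brute-force sketch of node-induced sub-graph matching in Python is given below.
It is not one of the algorithms from the literature; it simply tries every injective
assignment of the vertices of the smaller graph to vertices of the larger one, so its running
time grows very quickly with graph size, in line with the NP-completeness of the problem.
Function names and example graphs are hypothetical.

```python
from itertools import permutations

def find_induced_subgraph(v_big, e_big, v_small, e_small):
    """Return a mapping of the small graph onto an induced sub-graph of the
    big graph, or None if no such mapping exists (exhaustive search)."""
    big = {frozenset(e) for e in e_big}
    small = {frozenset(e) for e in e_small}
    v_small = list(v_small)
    for image in permutations(v_big, len(v_small)):
        phi = dict(zip(v_small, image))
        # Induced: every pair must be adjacent in both graphs or in neither.
        if all((frozenset((x, y)) in small) == (frozenset((phi[x], phi[y])) in big)
               for i, x in enumerate(v_small) for y in v_small[i + 1:]):
            return phi
    return None

# Hypothetical example: the 4-cycle contains an induced path on three
# vertices but no triangle.
cycle_v, cycle_e = [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)]
path_match = find_induced_subgraph(cycle_v, cycle_e,
                                   ["a", "b", "c"], [("a", "b"), ("b", "c")])
triangle_match = find_induced_subgraph(cycle_v, cycle_e,
                                       ["a", "b", "c"],
                                       [("a", "b"), ("b", "c"), ("a", "c")])
print(path_match)       # a mapping such as {'a': 0, 'b': 1, 'c': 2}
print(triangle_match)   # None
```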
1.2.1.4 Maximum common sub-graph: The mapping of a sub-graph of the first graph to an
isomorphic sub-graph of the second one yields a common sub-graph. Since such a mapping
is not uniquely defined, it is required to find the largest sub-graph for which such a mapping
exists, which is referred to as the maximum common sub-graph.
There are two possible definitions of the problem, depending on whether node-induced
sub-graphs or plain sub-graphs are used. In the first case, the maximality of the
common sub-graph refers to the number of nodes, while in the second it is the number
of edges that is maximized. It is widely known that the problem of finding the maximum
common sub-graph of two graphs can be reduced to the problem of finding the maximum
clique, and that it is NP-hard, its decision version being NP-complete [W4].
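One standard way to realize the reduction to maximum clique for the node-induced variant is
through the modular product of the two graphs: a clique in the product corresponds to a
common induced sub-graph. The Python sketch below illustrates this under that assumption;
the clique search is a plain exhaustive enumeration, not an optimized algorithm, and all names
and example graphs are hypothetical.

```python
from itertools import combinations

def modular_product(v1, e1, v2, e2):
    """Pairs (u, w) with u in V1, w in V2; two pairs are adjacent when they use
    different vertices in both graphs and agree on adjacency."""
    s1 = {frozenset(e) for e in e1}
    s2 = {frozenset(e) for e in e2}
    nodes = [(u, w) for u in v1 for w in v2]
    adj = {p: set() for p in nodes}
    for (u1, w1), (u2, w2) in combinations(nodes, 2):
        if u1 != u2 and w1 != w2:
            if (frozenset((u1, u2)) in s1) == (frozenset((w1, w2)) in s2):
                adj[(u1, w1)].add((u2, w2))
                adj[(u2, w2)].add((u1, w1))
    return adj

def maximum_clique(adj):
    """Exhaustive clique enumeration (exponential time in the worst case)."""
    best = []

    def extend(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = list(clique)
        for i, p in enumerate(candidates):
            # Keep only later candidates adjacent to p, so the set stays a clique.
            extend(clique + [p], [q for q in candidates[i + 1:] if q in adj[p]])

    extend([], list(adj))
    return best

# Hypothetical example: a path a - b - c against a triangle 1, 2, 3.
# The largest common induced sub-graph is a single edge (two nodes).
common = maximum_clique(modular_product(
    ["a", "b", "c"], [("a", "b"), ("b", "c")],
    [1, 2, 3], [(1, 2), (2, 3), (1, 3)]))
print(len(common), common)
```

Each pair (u, w) in the returned clique reads as "vertex u of the first graph corresponds to
vertex w of the second".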
1.2.2 Inexact Graph Matching: When graphs do not have the same number of nodes
and/or edges, but there is a need to measure the degree of similarity of the graphs, the
matching is referred to as inexact graph matching. Graph matching of this variety
makes use of attributed or weighted graphs.
1.2.2.1 Attributed Graph Matching: Attributed graphs have weights/costs associated
with their edges/nodes. Inexact matching of such graphs requires us to find the graphs which are at
the least distance from (most similar to) a given (query) graph. Attributed graph matching implies
establishing correspondence between the nodes of the two graphs as consistently as
possible [Jouili Salim et al, 2009]. This has applications in computer vision, bioinformatics,
chemical analysis and other such domains.
1.2.2.2 Attributed Sub Graph Matching: Sub-graph matching on weighted/cost graphs is
referred to as attributed sub-graph matching. Gallagher (2006) describes attributed
sub-graph matching as an important variant of inexact graph matching having applications in
computer vision, electronics, computer aided design etc. Kriege and Mutzel (2012) propose
graph kernels for sub-graph matching.
The various graph matching variants have found applications in different domains. The
graph matching referred to here is different from matching in graphs, which deals with
matching of vertices in a single graph and is also referred to as an “independent edge set”. This
matching of vertices is defined as follows.
Given a graph G = (V, E), a set M of independent edges of G is called a matching. Two edges
are independent if they have no common end vertex [W5]. The matching number, denoted
by µ(G), is the maximum size of a matching in G. Matching problems arise in numerous
applications, such as dating services that need to pair up compatible couples, or the matching
of interns to hospital residency programs. The problem of matching in
graphs (matching vertices) is not addressed here; graph matching in the sense of
similarity of graphs (referred to as such in most of the literature) is what is dealt with in this
thesis.
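To make the distinction concrete, the matching number µ(G) of a small graph can be
computed by exhaustive search, as in the following Python sketch. This is only an
illustration with hypothetical function names; polynomial-time algorithms for maximum
matching exist and would be used in practice.

```python
from itertools import combinations

def is_matching(edge_subset):
    """True if no two edges in the subset share an end vertex."""
    seen = set()
    for u, v in edge_subset:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True

def matching_number(edges):
    """Largest k such that some k edges are pairwise independent."""
    for k in range(len(edges), 0, -1):
        if any(is_matching(subset) for subset in combinations(edges, k)):
            return k
    return 0

# Hypothetical examples
path_edges = [(0, 1), (1, 2), (2, 3)]        # path on four vertices
triangle_edges = [(0, 1), (1, 2), (0, 2)]    # triangle
print(matching_number(path_edges))       # 2: edges (0, 1) and (2, 3)
print(matching_number(triangle_edges))   # 1: any two edges share a vertex
```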
1.2.3 Comments on Graph Matching Variants and Solutions
The graph matching problems mentioned above are all NP-complete except for graph
isomorphism, which is in NP but for which it has not yet been demonstrated whether it is
NP-complete or not.
Many algorithmic solutions exist for solving the different problem categories
described above. Some of the algorithms give exact solutions whereas others, such as
neural network, fuzzy and genetic algorithm approaches, give approximate solutions.
Polynomial isomorphism algorithms have been developed for special kinds of graphs but no
polynomial algorithms are known for the general case. Hence, the known exact graph matching
algorithms have exponential time complexity in the worst case. However, in many pattern recognition/
computer vision applications the actual computation time can be acceptable, because of
the type of graphs encountered and the attributes associated with the graphs. In section 1.5
a few of the state of the art solutions to graph matching from literature are brought out.
The ensuing section 1.3 discusses the applications of graph matching in different domains.
1.3 Applications of Graph Matching
Graphs are used to model many types of relations and process dynamics in physical,
biological and social systems. To identify these systems graph matching techniques can be
employed. A few applications of graph matching in some of the different domains are
described in this section.
Computer Vision: Computer vision involves the study and application of methods which
allow computers to extract the visual information from image data to implement a technical
vision system. Such a vision system has applications in image database searches, video
analysis, biometric identification, forensic applications, military scene analysis etc [Mikhail
Zaslavskiy, 2010]
One of the most prominent application fields of computer vision is medical computer vision
or medical image processing. This area is characterized by the extraction of information
from image data for the purpose of making a medical diagnosis of a patient such as
detection of tumors, arteriosclerosis or other malignant changes. It is also used to measure
organ dimensions, blood flow, etc. Computer Vision has applications in various industrial
scenarios also. Here, the information is collected for the purpose of supporting quality
control in manufacturing process. The final products will be inspected to find defects.
In defense and other military uses, computer vision is used to detect enemy soldiers or
vehicles and also for guidance of missiles to a designated target. All of these computer
vision applications require matching acquired image information with known information.
These applications can be efficiently handled by representing the information extracted
from the image as a graph and employing graph matching to simulate the technical
vision system.
Pattern Recognition: Pattern recognition aims to classify data (patterns) based on either
a priori knowledge or on statistical information extracted from the patterns. The patterns to
be classified are usually groups of measurements or observations, defining points in an
appropriate multidimensional space. As with images, patterns in other data/information, for
example the behavior of a person, results etc., can also be represented as graphs [Conte, D, etal,
2004, 2003]. These graphs can be matched to solve many pattern recognition tasks. Hence
graph matching has a powerful impact, especially in the domain of pattern
recognition. Conte D, etal (2004) present a review of various applications of graph
matching in pattern recognition and image processing.
Database Systems: In many database systems, it is necessary to compare two or more data
sets. The need to compare partitions of two different data sets with a quite similar
structure frequently occurs in defect detection. The comparison is obtained by dividing
each data set into partitions by means of a supervised fuzzy clustering algorithm and
associating an undirected complete weighted graph structure with these partitions. Then, a
graph matching operation returns an estimate of the level of similarity between the data
sets [Acciani etal, 2003].
Next-generation database systems dealing with biomedical data, web relationships,
network directories and structured documents often model the data as graphs. With the
rapid increase in the availability of biological, chemical graph datasets, there is a growing
need for effective and efficient graph querying/matching methods.
Chemistry: Chemical graph theory is a branch of mathematics which combines graph theory
and chemistry. Graph theory is used to mathematically model molecules in order to gain
insight into the physical and chemical properties of these compounds. Sussenguth (1965)
presents an efficient graph matching algorithm for chemical structures. A graph theoretical
representation of a molecule gives valuable insight into the chemical phenomena and helps
in the study of its properties and its isomers. Rosenfeld and Klein (2013) propose a graph
matching based technique for enumeration of substitutional isomers.
Electrical and Electronic Systems: Graphs have been used for representing the topology of
electrical and electronic circuits. Graph matching and graph isomorphism are very often
employed for finding the equivalence of circuits [Whitham, 2004], which may be useful in
identifying intellectual property and also in analysis of the circuits. Maiti and Tripathy
(2012) propose a colored graph isomorphism based model for matching electrical circuits.
Graph matching has also been used in motion planning of robots [Papadimitrou etal, 1994].
Optimization: Graph matching has also been employed in solving various optimization
problems. A graph matching approach is proposed for solving the task assignment problem
encountered in distributed computing systems [Tsai, 1985]. A cost function defined in terms
of a single unit of time, is proposed for evaluating the effectiveness of task assignment. This
cost function represents the maximum time for a task to complete module execution and
communication in all the processors. A new optimization criterion, called the minimax
criterion, is also proposed, based on which both minimization of inter processor
communication and balance of processor loading can be achieved. The proposed approach
allows various system constraints to be included for consideration. Graphs are used to
represent the module relationship of a given task and the processor structure of a
distributed computing system. Module assignment to system processors is transformed into
a type of graph matching, called weak homomorphism. The search of optimal weak
homomorphism corresponding to optimal task assignment is next formulated as a state-
space search problem. It is then solved by the well-known A* algorithm in artificial
intelligence after employing proper heuristic information for speeding up the search. A
distributed joint learning and auction algorithm for target assignment using graph matching
is proposed in [Teymur etal, 2010]. Hence many optimization problems can be
solved using graph matching.
Bioinformatics: Graph matching has been employed in DNA search and various other
bioinformatics applications. One of the interesting applications of the graph matching
problem is the alignment of protein-protein interaction networks. This problem is important
when investigating evolutionarily conserved pathways or protein complexes across species,
and helps in the identification of functional orthologs through the detection of conserved
interactions. A phrase-based statistical machine translation decoding problem has also been
reformulated as a Traveling Salesman Problem. A new protein binding pocket similarity
measure based on a comparison of 3D atom clouds is proposed in [Mikhail Zaslavskiy,
2010].
All of these applications employ different variants of the graph matching problems, which
are elaborated in section 1.2. The definition of the problem that is explored in this
research/ thesis is elaborated in section 1.4.
1.4 The Problem Definition
The previous sections have presented the graph matching problem and its variants. Section
1.3 has highlighted the various applications of graph matching. In this research the problem
of graph matching (graph isomorphism) is considered using undirected simple connected
graphs.
These types of graphs are very general and have many applications, viz. computer vision,
pattern recognition, chemical structure analysis, identifying protein complexes etc., and
hence are chosen for experimentation in this work.
To put the problem in proper context it is proposed to develop techniques for finding
whether two graphs G1 and G2 (Synthetic or otherwise) are isomorphic and if they are
isomorphic the techniques will further provide a mapping / correspondence of vertices and
edges between the two graphs.
Given two undirected simple graphs G1 = (V1, E1) and G2 = (V2, E2), the problem being explored
requires verifying whether there exists a bijection Φ: V1 → V2 such that
<x, y> ∈ E1 ⇔ <Φ(x), Φ(y)> ∈ E2 for all x, y in V1,
that is, such that any two vertices x and y of G1 are adjacent in G1 if and only if Φ(x) and Φ(y) are
adjacent in G2. If there is such a bijection then the correspondence between the vertices
of the graphs G1 and G2 is to be found; otherwise the methodologies should conclude that
the graphs are not similar/isomorphic/matching.
This and other related problems have been explored by many researchers and a few such
reported works are described in the following section.
1.5 Overview of the Solutions from Literature
One of the most fundamental problems in graph theory is verifying graph isomorphism or
graph matching. It has found many applications in various domains. The problem of
determining whether a pair of graphs is isomorphic has been extensively studied in the
literature. The solution to the graph matching problem has been approached in various ways. A
brief description of the graph matching including the theoretical aspects and a brief survey
of graph matching algorithms is presented in [Bunke, 2000]. A comprehensive survey of
graph matching algorithms applied to pattern recognition and image processing for the
thirty year period preceding 2004 is found in [Conte D, etal, 2004 and 2003].
The question of whether there is a polynomial time algorithm for deciding whether two
graphs are isomorphic, popularly known as the graph isomorphism problem, has been one
of the best known open problems in theoretical computer science for more than forty
years. Remarkably, the problem was first studied by chemists in the 1950s. Even
though the problem is still open, researchers have obtained a number of substantial partial
results. These results rely on a variety of techniques from different branches of theoretical
computer science and discrete mathematics. The various solutions to the
graph matching problem are classified into the following categories, which are briefly
described in the following sub sections:
General Algorithmic Approaches
Graph Spectral Approaches
Approaches based on Optimization Techniques
Approximate Solutions using Soft Computing and Natural Computation
1.5.1 General Algorithmic Approaches
An algorithm is a problem-solving method suitable for implementation as a computer
program, and a number of different approaches can be employed in designing one.
For small problems it hardly matters which approach is used, as long as it solves
the problem correctly. However, there are many problems for which the only known
algorithms take so long to compute the solution that they are practically useless. For
instance, the naïve/ trivial approach of examining all n! possible permutations of
the n vertices to decide whether a pair of graphs G1 and G2 is isomorphic is
impractical for all but very small inputs. This is because the running time of the
methodology grows factorially with n, which is faster than any exponential function
of n [Cormen etal, 1990].
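As an illustration of the naïve approach, the following minimal Python sketch tries every permutation of the vertex labels. The function name and the edge-list input format are assumptions made for this example; the graphs are assumed to be simple and undirected, with vertices labelled 0 to n-1.

```python
from itertools import permutations

def find_isomorphism_bruteforce(n, edges1, edges2):
    """Return a tuple perm with perm[v] = image of vertex v, or None.

    Tries all n! relabelings of the vertices of the first graph, so it is
    only usable for very small n.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    if len(e1) != len(e2):
        return None  # different edge counts can never match
    for perm in permutations(range(n)):
        # With equal edge counts, mapping every edge onto an edge of the
        # second graph is enough for perm to be an isomorphism.
        if all(frozenset((perm[u], perm[v])) in e2 for u, v in edges1):
            return perm
    return None
```

For example, a path on four vertices and a relabelled copy of it yield a mapping, whereas the path and a star with the same number of edges yield None.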
Polynomial-time algorithms, whose number of computational steps is always bounded by a
polynomial function of the size of the input, are often treated as algorithms of acceptable
time behavior. Thus, a polynomial-time algorithm is generally regarded as useful in practice.
The class of all problems that have polynomial-time deterministic algorithms is denoted
by P [Cormen etal, 1990]. The class of problems that have polynomial-time
nondeterministic algorithms is denoted by NP, with a special subclass within it
referred to as NP-complete. The graph isomorphism problem is in NP and is one of the very
few natural problems that is neither known to be in P nor known to be NP-complete. Hence
researchers continue to search for efficient polynomial-time algorithms for solving the
graph matching/ graph similarity/ graph isomorphism problem. A few algorithms that solve
the graph matching/ graph isomorphism problem for different types of graphs are
described below.
Ashay and Tevet (2009) propose a polynomial time algorithm for solving the graph
isomorphism problem and claim that the problem is in the class P. If graphs G1 and G2 are
isomorphic then they must have the same sign frequency vectors in lexicographic
order, and the algorithm obtains identical canonical forms of their
sign matrices S1* and S2* in polynomial time, thus finding an explicit isomorphism
function.
Vorgelegt (2006) discusses classical algorithms and, in particular, the different quantum
approaches that have been proposed, and gives an overview of the results that are
known in this area.
Raffaele Mosca (2002) proposes a graph invariant λ (G), which is a non-negative
integer and is non-zero whenever G contains particular induced odd cycles or,
equivalently, admits a particular minimum clique-partition. An efficient algorithm for
computing λ (G) is also outlined in the aforesaid paper.
Ling Chia and Wai Keong Kok (2002) describe a methodology to determine all
disconnected weakly k homogeneous graphs for k = 2, 3.
Pavol Hell and Jarik Nesetril (2007) address the question of the density of the
homomorphism order for trigraphs and characterize the gaps in the order.
Xuding Zhu (1992) provides a definition of star chromatic number and also discusses
the star chromatic number from the perspective of graph homomorphism and of
graph products.
Wang and Williams (1991) define the threshold weight of a graph as a measure of
the amount by which the graph differs from being a threshold graph. The paper
proposes and proves a theorem that specifies the threshold weight of any triangle
free graph to be a heavy graph. This can be later used for proving isomorphism.
Abdulrahim and Misra (1998) present an algorithm to solve the graph isomorphism
problem for the purpose of object recognition. The algorithm consists of three
phases: preprocessing, link construction, and ambiguity resolution. The algorithm
works for all types of graphs except for a class of highly ambiguous graphs that
includes strongly regular graphs which are detected in polynomial time.
A new backtracking algorithm for testing a pair of digraphs for isomorphism is
presented in [Schmidt and Druffel, 1976]. The algorithm is not guaranteed to run in
polynomial time but performs efficiently for a large class of graphs.
The algorithmic approaches presented here are for different classes of graphs and a few of
them run in polynomial time.
1.5.2 Graph Spectral Approaches
Spectral matching is a computationally efficient approach to the approximate solution of
pairwise matching problems that are NP-hard. Polynomial time algorithms for graphs with
different properties, continuous time quantum walks for exact and inexact matching,
canonical forms, and automorphisms are used to determine graph isomorphism. Inexact
spectral matching algorithms that embed large graphs in a low dimensional isometric space
spanned by a set of eigenvectors of the graphs are found in [Emms etal, 2009; Vorgelegt,
2006]. A representative set of works based on graph spectra is summarized below.
Egozi etal (2012) show that spectral graph matching can be interpreted as a maximum
likelihood estimate of the assignment probabilities and that the graduated assignment
algorithm can be cast as a maximum a posteriori estimator. Based on this analysis a
ranking scheme for spectral matching is derived. Further, a novel iterative
probabilistic matching algorithm that relaxes some of the implicit assumptions is
proposed.
Cour etal (2006) present a new spectral relaxation technique for approximate
solutions to matching problems that naturally incorporates one to one or one to
many constraints. The paper also describes a normalization procedure for existing
graph matching scoring functions.
Beezer (1990) provides a recursive construction of trees and graphs that have the
minimum possible number of distinct eigenvalues. The number of distinct eigenvalues is
related to the amount of symmetry that the graph possesses: the fewer the eigenvalues,
the greater the amount of symmetry. This property can be employed for building graph
matching systems.
Laplacians of homogeneous graphs and generalized Laplacians whose eigenvalues
are associated with various equilibria of forces in molecules are described in [Chung
and Sternberg, 1992].
The spectral matrix for the Laplacian can be used to construct symmetric
polynomials that are permutation invariants. The coefficients of these polynomials
can be used as graph features that can be encoded as vectors and further employed
for graph matching as described in [Wilson etal, 2005].
Qiu and Hancock (2006) present a technique that exploits the properties of the
commute time for the purpose of graph matching.
Nonnegative matrix factorization (NMF) is proposed to solve different data mining
problems including graph matching on undirected and directed graphs in [Ding etal,
2008].
Hogben (2005) presents a survey on spectral graph theory and the inverse
eigenvalue problem of a graph and examines the connections between these
problems. The paper presents some new results on construction of a matrix of
minimum rank for a given graph having a special form such as 0-1 matrix or a
generalized Laplacian.
Zhu and Wilson (2005) analyse the adjacency matrix, combinatorial Laplacians,
normalized Laplacians and unsigned Laplacians, and present a study on the use of the
spectrum of the heat kernel matrix and of the path length distribution matrix.
Zahidi (2007) employs algebraic invariants, including both eigenvalues and
eigenvectors, to show that the permutation is related to the eigenvectors of both
graphs when they possess non-degenerate eigenvalues. The paper presents a new
technique that employs non-degenerate eigenvalues and the corresponding
eigenvectors for detecting isomorphic graphs.
Damien etal (2008) introduce a new metric for comparing graphs, namely the weighted
spectral distribution, that improves on the spectrum by discounting the eigenvalues
believed to be unimportant and emphasizing those which are important.
Luo and Hancock (2002) pose the problem of graph matching as maximum likelihood
estimation using the apparatus of the EM algorithm, and cast the recovery of
correspondence matches between the graph nodes in a matrix framework.
Umeyama Shinji (1988) presents an approximate solution to the weighted graph
matching problem. It employs an analytic approach based on the eigen-decomposition
of the adjacency matrix of a graph and almost always gives the
optimum matching when a pair of graphs is nearly isomorphic.
A new methodology was developed by He, etal (2003) for detecting graph
isomorphism using the concept of quadratic forms. Graphs are first represented by
quadratic forms, and the comparison of two graphs is thus reduced to the comparison
of two quadratic form expressions. If both the lengths and the directions of the semi
axes of the quadric surfaces that are characterized by the eigenvalues and eigenvectors
are the same, the associated graphs are isomorphic. An algorithm is developed
based on this idea and tested on the counter-examples known to other methods.
Robles-Kelly and Hancock (2002) describe a spectral method for graph matching;
the method makes use of a brushfire search procedure to find correspondences. The
node order of the steady state random walk associated with a Markov chain is
determined by the coefficient order of the leading eigenvector of the adjacency
matrix.
Gori etal, (2004) propose a novel polynomial time approximation algorithm that uses
random walks to compute the topological features for each node to identify vertex
correspondence.
Most applications of graph matching make do with inexact graph matching and hence
need error tolerant graph matching. Many of the error tolerant matching
techniques make use of edit distances between graphs. A few such
methodologies are summarized below.
Three novel methods to compute upper and lower bounds for the edit distance
between two graphs in polynomial time are proposed in [Zhiping etal, 2009].
Robles-Kelly and Hancock (2003) propose a graph spectral seriation method to
convert the adjacency matrix to a string or sequence order. The edit distance is computed
by finding the sequence of string edit operations which minimizes the cost of the path
traversing the edit lattice, and is employed for inexact graph matching.
The influence of the cost function on the optimal match between two graphs is studied
in [Bunke, 1999]. The paper also shows that graph isomorphism, sub graph
isomorphism and maximum common sub graph problems are special cases of
optimal error correcting graph matching under particular cost functions.
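Once a graph has been converted to a string, as in the seriation approach summarized above, the edit distance is the cost of the cheapest path through the edit lattice. A minimal sketch of that dynamic program follows. Unit costs for insertion, deletion and substitution are assumed here, and the function name is hypothetical rather than taken from the cited works.

```python
def string_edit_distance(a, b, sub_cost=1, indel_cost=1):
    """Minimum total cost of edit operations turning sequence a into b.

    prev[j] holds the cheapest cost of reaching lattice point (i-1, j);
    each row of the lattice is built from the previous one.
    """
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * indel_cost]
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            curr.append(min(
                prev[j] + indel_cost,                     # delete a[i-1]
                curr[j - 1] + indel_cost,                 # insert b[j-1]
                prev[j - 1] + (0 if same else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]
```

The sequences may be strings or lists of node labels; a distance of zero means the two serialized orders are identical, and small distances indicate an inexact match.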
1.5.3 Optimization
Researchers have modeled graph matching as an assignment problem or as other
optimization problems and have proposed a few solutions. Some of them are
briefly summarized below.
Gold and Rangarajan (1996) propose a graduated assignment graph matching
algorithm that works on weighted or attributed relational graphs with missing or
extra links and nodes, and addresses problems such as sub graph isomorphism,
weighted graph matching and attributed relational graph matching.
A linear program is obtained by formulating the weighted graph matching problem
in the L1 norm and then transforming the resulting quadratic optimization problem to a
linear one. The simplex method, augmented by the 0-1 integer solution of the Hungarian
method, is used to verify graph matching [Almohammad and Duffua, 1993].
Most of the methods in this genre work on weighted attributed graphs.
1.5.4 Approximate Solutions to Graph Matching using Soft Computing and
Natural Computation
Recent advances in soft computing and natural computation paradigms have made
available tools that can gainfully be employed, at least for finding approximate solutions
to graph matching problems. A few representative approaches are enumerated here.
Cross etal (1997) describe a framework for performing relational graph matching
using genetic search on attributed graphs. There are three novel ingredients to the
work. Firstly, the optimization process is cast into a Bayesian framework by
exploiting the recently reported global consistency measure of Wilson and Hancock
as a fitness measure. The second novel idea is to realize the crossover process at the
level of sub graphs, rather than employing string-based or random crossover. Finally,
convergence is accelerated by employing a deterministic hill-climbing process prior
to selection.
Jain and Wysotzki (2005) propose a neural network approach to solve exact and
inexact graph isomorphism problems for weighted graphs. The method is based on a
neural refinement procedure to reduce the search space followed by an energy
minimizing matching process.
Recently there has been a spurt of reported methodologies that make use of soft and
evolutionary computing paradigms. Most of the soft and natural computation
approaches work on attributed graphs. All of the literature reported in this section points
to the fact that the problem of graph matching/ graph similarity is still an open problem,
and this has motivated this research work. The motivating factors that led to the current
research work are brought out in section 1.6.
1.6 Motivation for Current Work
The study and exploration of works in graph theory, especially those on graph matching/
graph similarity, motivated us to delve further into the details. A detailed look into the
graph matching problem and its variants aroused sufficient interest. The diversity of
applications brought out in section 1.3 urged us to take up work in this direction. Among
the various graph types, simple undirected graphs have been found to have a large number
of applications. The development of efficient algorithms for solving the graph matching
problem on simple undirected graphs will have an impact on various application areas.
Further, the problem has theoretical and algorithmic issues which stimulate the mind. With
these points in view, the problem of graph matching on undirected simple graphs was
selected and explored. A brief description of the new techniques proposed in this
research is given in section 1.7.
1.7 Overview of Proposed Solutions and Contributions
The work on developing efficient techniques started with the concept of the
distance from a vertex to every other vertex in a connected graph as a means of
representing the complete characteristics of the graph. It is proposed to use the
distances between vertices of the graphs and other spectral characteristics for
ascertaining the similarity of graphs. Further, spectral, assignment and natural
computation techniques have been employed for establishing the correspondences between
the vertices and edges of similar/ isomorphic graphs. Four different approaches have been
proposed for the purpose. The techniques have been tested on a large variety of synthetic
graphs and have been shown to perform well.
The first technique uses spectral properties, namely the principal eigenvector, together
with the degree of the vertices and the average shortest distance from a vertex to all other
vertices, to check for similarity and to establish correspondence between the vertices. This
technique employs the adjacency matrix representation of the graphs. The eigenvalues and
the eigenvectors of the two adjacency matrices are obtained, and the vertices of the two
graphs are ordered based on the coefficients of the eigenvector corresponding to the
largest eigenvalue. This order is used for mapping the vertices, and their correspondence
is then verified by checking for equality in vertex degree and in average shortest distance
to the other vertices. If the check is positive the two graphs are similar/ isomorphic;
otherwise the two graphs do not match.
The methodology has been tested and is found to be accurate and robust.
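The steps described for the first technique can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the function names are hypothetical, the graphs are assumed to be connected with at least two vertices labelled 0 to n-1, the principal eigenvector is approximated by power iteration on A + I, and ties between equal coefficients are broken by degree, distance and vertex label, which is a choice made only for this example.

```python
from collections import deque

def build_adjacency(n, edges):
    """Adjacency sets for a simple undirected graph on vertices 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def average_shortest_distance(adj, source):
    """Mean BFS distance from source to every other vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(dist.values()) / (len(adj) - 1)

def principal_eigenvector(adj, iterations=500):
    """Power iteration on A + I (same eigenvectors as A; the shift avoids
    oscillation on bipartite graphs)."""
    x = [1.0] * len(adj)
    for _ in range(iterations):
        y = [x[v] + sum(x[w] for w in adj[v]) for v in range(len(adj))]
        norm = sum(c * c for c in y) ** 0.5
        x = [c / norm for c in y]
    return x

def ranked_vertices(n, edges):
    """Vertices in non-increasing order of principal eigenvector coefficient."""
    adj = build_adjacency(n, edges)
    x = principal_eigenvector(adj)
    rows = [(round(x[v], 6), len(adj[v]), average_shortest_distance(adj, v), v)
            for v in range(n)]
    return sorted(rows, reverse=True)  # tie-breaking order is an assumption

def candidate_correspondence(n, edges1, edges2):
    """Pair vertices by rank, then verify degree and average shortest distance.

    Returns a dict mapping vertices of graph 1 to vertices of graph 2, or
    None when some paired vertices disagree in either invariant.
    """
    mapping = {}
    for (_, d1, a1, v1), (_, d2, a2, v2) in zip(ranked_vertices(n, edges1),
                                               ranked_vertices(n, edges2)):
        if (d1, a1) != (d2, a2):
            return None
        mapping[v1] = v2
    return mapping
```

The sketch performs only the checks named in the text. When several vertices share the same eigenvector coefficient, the pairing among them depends on the tie-breaking rule, so the returned correspondence is best read as a candidate mapping.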
A new theorem and a corollary, which state that invariance of the degree and of the average
shortest distance is a necessary and sufficient condition for isomorphism of path graphs,
are presented and proved. The methodology for other simple undirected graphs is based on
this theorem/corollary. Further, the vertex correspondence is proposed based on the rank
order of the vertices obtained from the non-increasing order of the coefficients of the
principal eigenvector.
The second technique also uses spectral properties but employs the normalized adjacency
matrix representation of the graphs. The similarity of the graphs is ascertained by the
co-spectral nature of the two graphs along with degree invariance and invariance of the sum
of shortest distances to other vertices. The vertex correspondence of the graphs is
established by matching the coefficients of the least eigenvector (the eigenvector
corresponding to the least eigenvalue), which are used to map the vertices. If there are
fewer than n/2 (an empirical value) matches of the coefficients, then one of the
eigenvectors is reversed in sign and the mapping is carried out.
The third technique proposed in this work establishes the similarity between the graphs
using the matching of vertex degrees and the invariance of the sum of shortest distances to
all other vertices in the graph. Once the similarity is established, the vertex and edge
correspondence is obtained by solving an ingeniously set up assignment problem, which is
solved by the Hungarian approach [W6]. The assignment problem is set up based on the
theorem of vertex correspondence between two isomorphic graphs, proposed and proved
in this work. The methodology is found to be robust and has produced accurate results.
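To show how a vertex correspondence can be posed as an assignment problem, the sketch below builds a cost matrix from the two invariants named above and solves it with a textbook O(n^3) Hungarian routine. The cost used here, the absolute difference in degree plus the absolute difference in the sum of shortest distances, is an assumption made for illustration; the cost matrix actually derived from the theorem of vertex correspondence is not given in this chapter. All function names are hypothetical, and the graphs are assumed to be connected with vertices labelled 0 to n-1.

```python
from collections import deque

def vertex_invariants(n, edges):
    """(degree, sum of shortest distances) for every vertex."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    result = []
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        result.append((len(adj[s]), sum(dist.values())))
    return result

def hungarian(cost):
    """Minimum-cost assignment for a square matrix.

    Returns (total_cost, assignment) with assignment[row] = column.
    """
    n = len(cost)
    inf = float("inf")
    u = [0] * (n + 1)      # row potentials (1-based)
    v = [0] * (n + 1)      # column potentials (1-based)
    p = [0] * (n + 1)      # p[j] = row currently matched to column j
    way = [0] * (n + 1)    # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return sum(cost[r][assignment[r]] for r in range(n)), assignment

def assign_vertices(n, edges1, edges2):
    """Assign vertices of graph 1 to vertices of graph 2 at minimum cost."""
    inv1 = vertex_invariants(n, edges1)
    inv2 = vertex_invariants(n, edges2)
    # Assumed cost: mismatch in degree plus mismatch in distance sum.
    cost = [[abs(d1 - d2) + abs(s1 - s2) for (d2, s2) in inv2]
            for (d1, s1) in inv1]
    return hungarian(cost)
```

Under this assumed cost, a total of zero means every vertex was paired with a vertex of identical degree and distance sum, while a positive total indicates that no such pairing exists.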
The fourth technique modifies the third technique by solving the correspondence problem
using a natural computation technique, namely the genetic algorithm (GA). It again
establishes similarity between the graphs by making use of the matching in vertex degrees
and in the sum of the shortest distances to all other vertices. The problem of vertex and
edge correspondence is then solved by proposing a new chromosome structure and new
crossover and mutation operators, using simple and steady state genetic algorithms. The
technique employs an ingenious fitness function that is based on a corollary to the theorem
of vertex correspondence and searches for the fittest individual chromosome as a
solution to the correspondence problem. Both the steady state GA and the simple GA are
implemented using the same chromosome structure, reproduction operators (crossover
and mutation) and fitness function. The work has produced accurate results.
In all four techniques described above, the results are satisfactory and a theoretical basis
has been established for their performance. These techniques add to the vast repertoire of
techniques to ascertain graph matching and have applications in a variety of domains. The
proposed techniques are found to be efficient, as they have polynomial time complexity,
and have produced accurate results on counter examples for graph isomorphism checking.
The next section describes the organization of the thesis.
1.8 Organization of the Thesis
The thesis is organized into six chapters. The first chapter gives an introduction to graph
theory and its applications, and puts the graph matching problem in context. The chapter
also describes the different variants of the graph matching problem and gives the state of
the art techniques for graph matching. The problem definition and proposed techniques are
also briefly presented. Chapter 2 describes the spectral and average shortest distance based
approach to graph matching. The methodology is described in detail and thorough
experimentation is reported. The theoretical basis for the approach is also provided by
presenting a theorem, a corollary and their proofs.
Chapter 3 introduces another spectral technique that uses the normalized adjacency matrix
representation and the sum of shortest distances from a vertex to the other vertices, along
with vertex degree, as invariants for graph matching. The detailed algorithm is described,
the theoretical basis is provided and the results are enunciated. Chapter 4 describes the
assignment technique for establishing correspondence between the vertices of two
isomorphic graphs, along with a theorem for vertex correspondence of isomorphic graphs.
The results are brought out in detail. Chapter 5 describes the genetic algorithm
technique for graph matching. Chapter 6 gives a comparative study of the four techniques
proposed, presents the time complexities and summarizes the research work.
1.9 Summary
The chapter has given a brief introduction to graph theory and its applications. The graph
matching problem has been highlighted and its different variants introduced. The
applications of graph matching in various domains have been presented. The problem
definition and the motivation for the work have been enunciated. Further, the proposed
solutions and the organization of the thesis have also been described.