
Chapter 1

Graph Matching: An Introduction

Graph theory is a branch of mathematics that deals with graphs. A graph consists of a set of vertices (or nodes) V = {v1, v2, …, vn} and an associated set of edges E = {e1, e2, …, ek}, where each edge joins a pair of vertices, e = <vi, vj>. Graphs are flexible structures that can model real-world entities as well as processes in many domains.

Problems in these domains can be modeled as analogous problems on graphs, and the solutions to the graph problems give the solutions to the original problems. This has led to the widespread use of graphs in many domains. Graph theory has therefore grown into a significant area of mathematical research, with applications in chemistry, physics, operations research, social science, biology, computer science, etc.

Advances in computer technology and the growing range of graph applications have renewed the interest of mathematicians and computer scientists in graph theory. Of late, graphs have gained significance in solving problems in diverse areas because they are easy to represent and manipulate in computers [Narasingh Deo, 2004].

The problem of graph matching (also studied as graph similarity or graph isomorphism) has attracted researchers in mathematics and computer science ever since graph theory began to gain importance as a discipline. The ensuing section brings out the importance of graph theory and graph matching in the current context and emphasizes the need to address the problem.

1.1 Introduction to Graph Theory and Graph Matching

The paper written by Leonhard Euler on the “Seven Bridges of Königsberg”, published in 1736, is regarded as the first paper in the history of graph theory [Narasingh Deo, 2004]. In 1878, Sylvester introduced the term "graph" in a paper published in the scientific journal “Nature”. In that paper he draws an analogy between the "quantic invariants" and "co-variants" of algebra and molecular diagrams, and this led to different applications of graph theory [Sylvester John Joseph, 1878].

The first textbook on graph theory was written by Dénes Kőnig and published in 1936 [Tute W T, 2001]. A later textbook by Frank Harary, published in 1969, was popular and enabled mathematicians, chemists, electrical engineers and social scientists to talk to each other [Harary Frank, 1969]. Today there are innumerable books on graph theory that describe its fundamentals and applications.

The subject of graph theory had its beginnings in recreational mathematics problems, but it has now grown into a significant area of mathematical research. Basically, graph theory is the study of graphs, which mathematically model pairwise relations between objects. In all the domains where graphs are employed for modeling, the vertices (nodes) model the objects, whereas the edges model the relationships.

By their inherent characteristics, graphs have been found to be versatile tools for applications in science and engineering [Narasingh Deo, 2004]. Their flexibility and robustness in modeling various scenarios and concepts have led to their popularity.

The edges of a graph may be undirected, representing a bidirectional relationship, or directed, representing a unidirectional relationship. Graphs which have only undirected edges are referred to as “Undirected Graphs”, whereas graphs which consist of directed edges only are “Directed Graphs” or “Digraphs”. Graphs which contain both types of edges are “Mixed Graphs”.

Graphs are also categorized as “Simple Graphs” and “Multigraphs”. Graphs which have neither self-loops nor parallel edges are simple graphs [Chartrand, 2012]. Graphs having self-loops and/or parallel edges are multigraphs. The various definitions and concepts about graphs are brought out in section 1.1.1.

1.1.1 Definitions and Concepts

This section provides the necessary definitions and introduces the conventions followed in

the thesis.

Graph : A graph is an ordered pair G = (V, E) comprising a set V of vertices or nodes together with a set E of edges (also called lines or arcs). The edges may be directed (asymmetric) or undirected (symmetric).

Vertex set : The set of vertices in a graph is denoted by V(G) or V

Edge set : The set of edges in a graph is denoted by E(G) or E

Degree : The degree of a vertex is the number of edges that are incident to it.

Size : A graph's size is |E|, the number of edges

Order : The order of a graph is |V|, the number of vertices

Distance : The distance d(x, y) in G between two vertices x, y is the length of a shortest x–y path in G

Eccentricity : The eccentricity e(x) of a vertex x in a graph G is the distance from x to the vertex farthest from x, i.e. $e(x) = \max_{y \in V(G)} d(x, y)$

Diameter : The greatest distance between any two vertices in G is the diameter of G, denoted by diam(G). Equivalently, diam(G) is the maximum eccentricity of a vertex of G

Center : A vertex with minimum eccentricity in a graph G is called a central vertex. The set of all central vertices is referred to as the center of G

Radius : The radius of G, denoted rad(G), is the eccentricity of a central vertex, i.e. the greatest distance between a central vertex and any other vertex. Every vertex lies within distance rad(G) of a central vertex, so the triangle inequality gives $\mathrm{rad}(G) \le \mathrm{diam}(G) \le 2\,\mathrm{rad}(G)$

Complete graph / clique : If all vertices of G are pairwise adjacent, then G is a complete graph or clique

Walk : A walk is an alternating sequence of vertices and edges, with each edge

being incident to the vertices immediately preceding and succeeding it

in the sequence. A walk of length k is a non-empty alternating sequence

of k+1 vertices and k edges in G

Trail : A trail is a walk with no repeated edges

Path : A path is a walk with no repeated vertices

Closed Walk : A walk is closed if the initial vertex is also the terminal vertex

Cycle : A cycle is a closed trail with at least one edge and with no repeated

vertices except that the initial vertex is also the terminal vertex

Length of a walk: The length of a walk is the number of edges in the sequence defining

the walk

Connected : A non-empty graph G is called connected if any two of its vertices are linked by a path in G. A connected graph is a graph with exactly one connected component

Undirected graph : An undirected graph is one in which no edge has an orientation

Directed graph : A directed graph or digraph is an ordered pair D = (V, A), where A is a set of ordered pairs of vertices called arcs. Each edge therefore has a direction

Adjacent : Two vertices x, y of G are adjacent if <x, y> is an edge in G. In other words, two vertices are adjacent if they are incident to a common edge. Similarly, two edges are adjacent if they are incident to a common vertex

Incident : A vertex v is incident with an edge e if v ∈ e, that is, if e is an edge at v

Independent : A set of vertices or edges is independent if no two of its elements are adjacent

Isomorphic : G1 and G2 are isomorphic if there exists a bijection $\varphi: V_1 \to V_2$ such that $\langle x, y\rangle \in E_1 \iff \langle \varphi(x), \varphi(y)\rangle \in E_2$ for all $x, y \in V_1$

Invariant : A mapping that takes graphs as arguments is called a graph invariant if it assigns equal values to isomorphic graphs. The order, the size and the degree sequence are examples

Simple graph : A simple graph is a triple G = (V, E, I), where V and E are disjoint finite sets

and I is an incidence relation such that every element of E is incident

with exactly two distinct elements of V and no two elements of E are

incident to the same pair of elements of V

Connectivity : A graph G is k-connected if it has more than k vertices and remains connected whenever fewer than k vertices are removed. G has connectivity k if G is k-connected but not (k+1)-connected. A complete graph on k+1 vertices is defined to have connectivity k.

Neighbours : The set of neighbours, N(v), of a vertex v is the set of vertices which are adjacent to v. The degree of a vertex is also the cardinality of its neighbour set.

Induced sub-graph : For a set of vertices X ⊆ V(G), we use G[X] to denote the induced sub-graph of G whose vertex set is X and whose edge set is the subset of E(G) consisting of those edges with both ends in X

Edge-induced sub-graph : For a set S of edges, we use G[S] to denote the edge-induced sub-graph of G whose edge set is S and whose vertex set is the subset of V(G) consisting of those vertices incident with any edge in S

These are some of the definitions from graph theory that will be used in this dissertation. Many efficient techniques are available for representing graphs in computers, and they are briefly described in the next section.

1.1.2 Computer Representation and Graph Spectra

The availability of robust computer representation schemes for graphs, and the flexibility of processing them, has furthered the use and applications of graphs. Graphs are represented by various matrices, such as the adjacency matrix, the incidence matrix and the path matrix [Narasingh Deo, 2004]. These matrices characterize various properties of the graphs, and a computer can process them to extract specific characteristics of the graphs. The properties of these matrices can be abstracted using their spectra, i.e. their sets of eigenvalues and eigenvectors [Chung Fan R K, 1994].

The discipline of spectral graph theory emerged in the 1950s and 1960s. It studies the matrices associated with graphs and their applications. A notable feature of the theory is that some of its results, although purely combinatorial in nature, seem unobtainable in the present state of knowledge without employing the matrices that describe the graph. Such properties are referred to as spectral properties.

Graph-theoretic research on the relationship between the structural and spectral properties of graphs was one source of advances. Another major source was research in quantum chemistry, but the connections between these two lines of work were not discovered until much later. The 1980 monograph “Spectra of Graphs” by Cvetković, Doob and Sachs summarized nearly all research in the area up to that date [Cvetković et al., 1980]. In 1988 it was updated by the survey “Recent Results in the Theory of Graph Spectra” [Cvetković et al., 1988]. These concepts and theoretical foundations have formed a sound basis for developing applications of graphs and graph theory. A few important ones are described in section 1.1.3.
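The short Python sketch below illustrates two things: the distance-based definitions of Section 1.1.1 (distance, eccentricity, diameter, radius and center) and the adjacency matrix representation mentioned above. It assumes a connected, undirected simple graph stored as a dictionary that maps each vertex to its set of neighbours. The example graph and all function names are hypothetical, chosen only for illustration.

```python
from collections import deque

# Hypothetical example graph: path a-b-c-d plus an extra edge b-e,
# stored as an adjacency list (vertex -> set of neighbours).
GRAPH = {
    "a": {"b"},
    "b": {"a", "c", "e"},
    "c": {"b", "d"},
    "d": {"c"},
    "e": {"b"},
}


def adjacency_matrix(graph):
    """Return (vertex order, 0/1 adjacency matrix) of an undirected graph."""
    order = sorted(graph)
    index = {v: i for i, v in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for u in graph:
        for v in graph[u]:
            matrix[index[u]][index[v]] = 1
    return order, matrix


def bfs_distances(graph, source):
    """d(source, y) for every vertex y reachable from source (unit edge lengths)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def eccentricities(graph):
    """e(x) = max over y of d(x, y); assumes the graph is connected."""
    return {x: max(bfs_distances(graph, x).values()) for x in graph}


def summary(graph):
    """Diameter, radius and center derived from the eccentricities."""
    ecc = eccentricities(graph)
    rad = min(ecc.values())
    return {
        "diameter": max(ecc.values()),
        "radius": rad,
        "center": sorted(x for x, e in ecc.items() if e == rad),
    }


print(summary(GRAPH))  # {'diameter': 3, 'radius': 2, 'center': ['b', 'c']}
```

For this graph the output satisfies rad(G) ≤ diam(G) ≤ 2·rad(G), since 2 ≤ 3 ≤ 4. Breadth-first search suffices because every edge has unit length. Weighted graphs would need a shortest-path algorithm such as Dijkstra's instead.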

1.1.3 Applications of Graphs/Graph Theory

Various problems in graph theory have been well researched and have found applications in different domains [Chartrand, 2012]. These include Euler circuits, Hamiltonian circuits, the four color problem, planarity of graphs and Kuratowski’s graphs.

One domain where graph theory has found a large number of applications is computer science. There, graphs are used to represent communication networks, data organization, computational devices, the flow of data/information, computational control flows, objects to be recognized, etc. The graph-theoretic approach can also be applied to problems in travel, biology, computer chip design and many other fields [W1].

Topology is a major application area of graphs. Euler’s formula relating the numbers of edges, vertices and faces of a convex polyhedron was studied and generalized by Cauchy and L'Huillier, and it is at the origin of topology.

More than a century after Euler's paper on the bridges of Königsberg, Cayley was motivated by the study of particular analytical forms arising from differential calculus. This led him to study a particular class of graphs, namely the trees. The techniques he developed were mainly concerned with enumerating graphs having particular properties (analogous to finding chemical isomers). Cayley linked his results on trees with the contemporary studies of chemical composition.

Enumerative graph theory then arose from three sources: the results of Cayley, the fundamental results published by Pólya between 1935 and 1937, and the generalization of these by De Bruijn in 1959 [W1].

Graph theory is also used to study molecules in chemistry and physics. In condensed matter physics, the three-dimensional structure of complicated simulated atomic structures can be studied quantitatively. This is done by gathering statistics on graph-theoretic properties related to the topology of the atoms.

In chemistry, a graph makes a natural model for a molecule, where vertices represent atoms and edges represent bonds. This approach is especially used in the computer processing of molecular structures, in software systems ranging from chemical editors to databases. In statistical physics, graphs can represent local connections between interacting parts of a system, as well as the dynamics of a physical process on such systems.

Graph theory is useful in biology and conservation efforts. A vertex can represent a habitat or region in which certain species reside, and the edges represent migration paths or movement between the regions. Graph theory is also widely used in sociology, for example to measure actors' prestige or to explore diffusion mechanisms.

Graph-theoretic methods, in various forms, have proven particularly useful in linguistics, since natural languages lend themselves well to discrete structures. Within lexical semantics, especially as applied to computers, modeling word meaning is easier when a given word is understood in terms of related words. Semantic networks are therefore important in computational linguistics. Indeed, the usefulness of this area of mathematics to linguistics has given rise to organizations such as TextGraphs, as well as various 'Net' projects, such as WordNet, VerbNet and others [W1].

The development of algorithms to handle graphs is therefore of major interest in computer science. Such algorithms have been extensively pursued, and many variants are available. They include algorithms for finding graph characteristics, shortest paths/distances, spanning trees and articulation points, and for searching for particular nodes.

One very important application of graphs is finding the similarity between objects represented by graphs. This problem has its roots in determining graph isomorphism: deciding whether two graphs are isomorphic amounts to checking whether they are structurally identical in all respects. Finding object similarity (akin to graph similarity) has applications in pattern analysis, computer vision, chemical structure analysis, plagiarism detection and the like. These applications are basically graph matching problems in which the objects are represented by graphs.

Graph matching, that is, finding whether two graphs are equivalent, has been an interesting problem ever since graph theory emerged as a discipline. The many solutions proposed are able to check isomorphism (graph matching) efficiently only for particular classes of graphs. Two graphs G1 and G2 are isomorphic if and only if there is a permutation of the labeling of the vertices such that the two graphs are equivalent. Section 1.2 elaborates on the different variants of graph matching, and section 1.3 brings out the applications of graph matching.

1.2 Graph Matching and its Variants

There are several variants of the graph matching problem that are concerned with isomorphism or graph similarity [Endika Bengoetxea, 2002], and different solutions exist for some of these variants. The variants of graph matching are depicted in Figure 1.1. The major categories of graph matching are exact graph matching and inexact graph matching.

Figure 1.1: Graph Matching Variants. Graph Matching: Exact Graph Matching (Graph isomorphism, Sub-Graph isomorphism, Monomorphism, Homomorphism, Maximum Common Sub-Graph) and Inexact Graph Matching (Attributed Graph Matching, Attributed Sub-Graph Matching)

1.2.1 Exact Graph Matching: Given the graphs G1 and G2, exact graph matching requires an edge-preserving mapping between the nodes of the two graphs: nodes that are adjacent in one graph must be mapped to nodes that are adjacent in the other. The strictest case is graph isomorphism. There the mapping must be a bijection that also preserves non-adjacency, so there is a one-to-one correspondence between both the nodes and the edges of G1 and G2. Exact graph matching can be categorized into various sub-types, based on characteristics of the graphs and of the required mapping, as depicted in Figure 1.1.

1.2.1.1 Graph Isomorphism: Graph isomorphism is a variant of exact graph matching that deals with the similarity of two different graphs. Two graphs G1 and G2 are isomorphic if and only if there is a permutation of the labeling of the vertices such that the two graphs are equivalent.

More formally, an isomorphism of graphs G1 and G2 is a bijection $\Phi: V(G_1) \to V(G_2)$ between their vertex sets. It must satisfy the following condition: any two vertices x and y of G1 are adjacent in G1 if and only if Φ(x) and Φ(y) are adjacent in G2. Such a bijection is commonly called an "edge-preserving bijection", in accordance with the general notion of an isomorphism as a structure-preserving bijection.
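The definition can be checked mechanically. The following Python sketch tests whether a given vertex correspondence Φ is an edge-preserving bijection. It assumes graphs stored as dictionaries mapping each vertex to its set of neighbours, an illustrative convention that this thesis does not fix. It only verifies a candidate mapping; it does not search for one.

```python
def is_isomorphism(g1, g2, phi):
    """Return True if phi (a dict V1 -> V2) is an edge-preserving bijection.

    g1, g2: undirected simple graphs given as {vertex: set_of_neighbours}.
    """
    # phi must be defined on every vertex of G1 and cover every vertex of G2;
    # with |V1| = |V2| this makes phi a bijection.
    if len(g1) != len(g2) or set(phi) != set(g1) or set(phi.values()) != set(g2):
        return False
    # x, y adjacent in G1  <=>  phi(x), phi(y) adjacent in G2, for all pairs.
    return all(
        (y in g1[x]) == (phi[y] in g2[phi[x]])
        for x in g1
        for y in g1
        if x != y
    )


G1 = {1: {2}, 2: {1, 3}, 3: {2}}                 # path 1-2-3
G2 = {"x": {"y", "z"}, "y": {"x"}, "z": {"x"}}   # path y-x-z
print(is_isomorphism(G1, G2, {1: "y", 2: "x", 3: "z"}))  # True
print(is_isomorphism(G1, G2, {1: "x", 2: "y", 3: "z"}))  # False: edge 2-3 maps to non-edge y-z
```

Both adjacency and non-adjacency are compared. A map that only sent edges to edges would be a homomorphism, not an isomorphism.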

1.2.1.2 Monomorphism / Homomorphism: Monomorphism is a variety of exact graph matching with two requirements. Each node of the first graph is mapped to a distinct node of the second one, and each edge of the first graph has a corresponding edge in the second one. The second graph, however, may have both extra nodes and extra edges. Homomorphism relaxes this further by no longer requiring distinct nodes to be mapped to distinct nodes. A monomorphism is therefore an injective homomorphism.

A graph homomorphism f from a graph G = (V, E) to a graph G’ = (V’, E’), written $f: G \to G'$, is a mapping $f: V \to V'$ from the vertex set of G to the vertex set of G’ such that $(u, v) \in E$ implies $(f(u), f(v)) \in E'$. The converse need not hold.
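The difference between a homomorphism and a monomorphism can be illustrated with a small Python sketch. It uses the same hypothetical adjacency-dictionary convention and illustrative function names. A path on three vertices maps homomorphically onto a single edge, with two nodes sharing an image. An injective map into a triangle is a monomorphism that leaves one edge of the triangle unmatched.

```python
def is_homomorphism(g, h, f):
    """True if f maps V(G) into V(H) and every edge (u, v) of G goes to an
    edge (f(u), f(v)) of H. Non-edges of G may map to edges of H."""
    if set(f) != set(g) or not set(f.values()) <= set(h):
        return False
    return all(f[v] in h[f[u]] for u in g for v in g[u])


def is_monomorphism(g, h, f):
    """A homomorphism that also maps distinct nodes to distinct nodes."""
    return is_homomorphism(g, h, f) and len(set(f.values())) == len(f)


PATH = {1: {2}, 2: {1, 3}, 3: {2}}                               # 1-2-3
EDGE = {"a": {"b"}, "b": {"a"}}                                  # a-b
TRIANGLE = {"p": {"q", "r"}, "q": {"p", "r"}, "r": {"p", "q"}}   # p-q-r-p

fold = {1: "a", 2: "b", 3: "a"}    # nodes 1 and 3 share an image
embed = {1: "p", 2: "q", 3: "r"}   # injective; edge p-r of H is "extra"
print(is_homomorphism(PATH, EDGE, fold), is_monomorphism(PATH, EDGE, fold))            # True False
print(is_homomorphism(PATH, TRIANGLE, embed), is_monomorphism(PATH, TRIANGLE, embed))  # True True
```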

1.2.1.3 Sub-Graph Isomorphism: Another form of matching is sub-graph isomorphism, or sub-graph matching. An isomorphism that holds between one of the two graphs and a node-induced sub-graph of the other is referred to as exact sub-graph matching. More formally, given two graphs G1 and G2, one must determine whether G1 contains a sub-graph that is isomorphic to G2. This problem has been proven to be NP-complete [Cook, 1971].
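A brute-force Python sketch of exact (node-induced) sub-graph matching is given below, assuming the same adjacency-dictionary convention. It tries every ordered selection of |V2| vertices of G1. This fits the problem's NP-completeness: the running time grows very rapidly with graph size, so the sketch is meant only for very small graphs.

```python
from itertools import permutations


def find_induced_subgraph_isomorphism(g1, g2):
    """Return an injective phi: V2 -> V1 such that G2 is isomorphic to the
    sub-graph of G1 induced by phi(V2), or None if no such map exists.

    Brute force over all ordered selections of |V2| vertices of G1.
    """
    nodes2 = list(g2)
    for image in permutations(g1, len(nodes2)):
        phi = dict(zip(nodes2, image))
        # Adjacency AND non-adjacency must agree (node-induced matching).
        if all(
            (v in g2[u]) == (phi[v] in g1[phi[u]])
            for u in nodes2
            for v in nodes2
            if u != v
        ):
            return phi
    return None


C4 = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}  # 4-cycle
P3 = {1: {2}, 2: {1, 3}, 3: {2}}                                          # path 1-2-3
K3 = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}                                    # triangle
print(find_induced_subgraph_isomorphism(C4, P3))  # maps the path onto three consecutive cycle vertices
print(find_induced_subgraph_isomorphism(C4, K3))  # None: a 4-cycle contains no triangle
```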

1.2.1.4 Maximum common sub-graph: Mapping a sub-graph of the first graph to an isomorphic sub-graph of the second one yields a common sub-graph. Such a mapping is not uniquely defined, so the task is to find the largest sub-graph for which such a mapping exists. This is referred to as the maximum common sub-graph.

There are two possible definitions of the problem, depending on whether node-induced sub-graphs or plain sub-graphs are used. In the first case, the maximality of the common sub-graph is measured by the number of nodes; in the second, the number of edges is maximized. It is widely known that finding the maximum common sub-graph of two graphs can be reduced to finding a maximum clique. The problem is NP-hard, and its decision version is NP-complete [W4].
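The reduction to the maximum clique problem can be sketched in Python for the node-induced case. The sketch assumes the usual modular (association) product construction. In it, vertices are node pairs (u, v), and a clique corresponds to a consistent node correspondence. A maximum clique is found with the Bron–Kerbosch algorithm with pivoting. The function names and the adjacency-dictionary input format are illustrative assumptions; the thesis does not prescribe this particular construction.

```python
from itertools import product


def modular_product(g1, g2):
    """Modular product: vertices are pairs (u, v); (u, v) ~ (x, y) iff
    u != x, v != y, and u~x in G1 exactly when v~y in G2."""
    nodes = list(product(g1, g2))
    adj = {p: set() for p in nodes}
    for (u, v), (x, y) in product(nodes, nodes):
        if u != x and v != y and (x in g1[u]) == (y in g2[v]):
            adj[(u, v)].add((x, y))
    return adj


def maximum_clique(adj):
    """Bron-Kerbosch with pivoting; returns one maximum clique as a set."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:              # r is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        pivot = max(p | x, key=lambda w: len(adj[w] & p))
        for w in list(p - adj[pivot]):
            expand(r | {w}, p & adj[w], x & adj[w])
            p = p - {w}
            x = x | {w}

    expand(set(), set(adj), set())
    return best


def maximum_common_induced_subgraph(g1, g2):
    """Node correspondence {u: v} of one maximum common induced sub-graph."""
    return dict(maximum_clique(modular_product(g1, g2)))


C4 = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
P3 = {1: {2}, 2: {1, 3}, 3: {2}}
print(maximum_common_induced_subgraph(C4, P3))  # three cycle vertices mapped onto the path
```

Each clique in the modular product pairs distinct nodes with distinct nodes and preserves both adjacency and non-adjacency, so the largest clique gives a largest common induced sub-graph. The clique search itself remains exponential in the worst case.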

1.2.2 Inexact Graph Matching: Sometimes graphs do not have the same number of nodes and/or edges, yet their degree of similarity still needs to be measured. Such matching is referred to as inexact graph matching. Graph matching of this variety typically makes use of attributed or weighted graphs.

1.2.2.1 Attributed Graph Matching: Attributed graphs have weights/costs associated with their edges and/or nodes. Inexact matching of such graphs requires finding the graphs that are at the least distance from (most similar to) a given query graph. Attributed graph matching implies establishing a correspondence between the nodes of the two graphs that is as consistent as possible [Jouili Salim et al, 2009]. This has applications in computer vision, bioinformatics, chemical analysis and other such domains.
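A minimal Python sketch of attributed graph matching follows. It rests on several assumptions: numeric node attributes and edge weights, graphs of equal order, and a simple cost function minimized by brute force over all node correspondences. The cost is the sum of absolute attribute and weight differences, with a missing edge treated as weight 0. This cost function is a hypothetical choice for illustration. Practical methods use richer distances, such as graph edit distance, and far more efficient search.

```python
from itertools import permutations


def attributed_distance(g1, g2):
    """Minimum matching cost between two attributed graphs of equal order.

    Each graph is (node_attr, edge_weight): node_attr maps node -> number and
    edge_weight maps frozenset({u, v}) -> number; an absent edge has weight 0.
    Hypothetical cost of a correspondence phi:
        sum |attr1(v) - attr2(phi(v))| + sum |w1(u, v) - w2(phi(u), phi(v))|
    minimized by brute force over all bijections.
    """
    attr1, w1 = g1
    attr2, w2 = g2
    if len(attr1) != len(attr2):
        raise ValueError("this sketch only handles graphs with the same number of nodes")
    nodes1 = list(attr1)
    best_cost, best_phi = float("inf"), None
    for image in permutations(attr2):
        phi = dict(zip(nodes1, image))
        cost = sum(abs(attr1[v] - attr2[phi[v]]) for v in nodes1)
        for i, u in enumerate(nodes1):
            for v in nodes1[i + 1:]:
                a = w1.get(frozenset((u, v)), 0)
                b = w2.get(frozenset((phi[u], phi[v])), 0)
                cost += abs(a - b)
        if cost < best_cost:
            best_cost, best_phi = cost, phi
    return best_cost, best_phi


# Two hypothetical attributed paths whose node attributes differ slightly.
G1 = ({"a": 1.0, "b": 2.0, "c": 3.0},
      {frozenset(("a", "b")): 1.0, frozenset(("b", "c")): 2.0})
G2 = ({"x": 3.0, "y": 2.0, "z": 1.1},
      {frozenset(("z", "y")): 1.0, frozenset(("y", "x")): 2.0})
cost, phi = attributed_distance(G1, G2)
print(round(cost, 3), phi)  # 0.1 {'a': 'z', 'b': 'y', 'c': 'x'}
```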

1.2.2.2 Attributed Sub Graph Matching: Sub-graph matching on weighted/cost graphs is referred to as attributed sub-graph matching. Gallagher (2006) describes attributed sub-graph matching as an important variant of inexact graph matching, with applications in computer vision, electronics, computer-aided design, etc. Kriege and Mutzel (2012) propose graph kernels for sub-graph matching.

The various graph matching variants have found applications in different domains. The graph matching referred to here is different from matching in graphs. The latter pairs up the vertices of a single graph by a set of edges, also referred to as an “independent edge set”, and is defined as follows:

Given a graph G = (V, E), a set M of independent edges of G is called a matching, where two edges are independent if they have no common end vertex [W5]. The matching number, denoted by µ(G), is the maximum size of a matching in G. Matching problems arise in numerous applications. Examples include dating services that need to pair up compatible couples, and the assignment of interns to hospital residency programs.

The problem of matching in graphs (matching vertices) is not addressed in this thesis. Instead, the thesis deals with graph matching in the sense of the similarity of graphs, as the term is used in most of the literature.
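For contrast with graph matching in the similarity sense, the following Python sketch checks whether a set of edges is a matching (pairwise independent edges). It also computes the matching number µ(G) by brute force. It is illustrative only. Polynomial-time algorithms, such as Edmonds' blossom algorithm, compute maximum matchings in general graphs.

```python
from itertools import combinations


def is_matching(edges):
    """A set of edges is a matching if no two edges share an end vertex."""
    covered = set()
    for u, v in edges:
        if u in covered or v in covered:
            return False
        covered.update((u, v))
    return True


def matching_number(edges):
    """mu(G): size of a largest matching, by brute force over edge subsets."""
    edges = list(edges)
    for k in range(len(edges), 0, -1):
        if any(is_matching(subset) for subset in combinations(edges, k)):
            return k
    return 0


path = [(1, 2), (2, 3), (3, 4)]        # path 1-2-3-4
triangle = [(1, 2), (2, 3), (1, 3)]
print(is_matching([(1, 2), (3, 4)]), matching_number(path), matching_number(triangle))  # True 2 1
```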

1.2.3 Comments on Graph Matching Variants and Solutions

The graph matching problems mentioned above are all NP-complete (or NP-hard in their optimization forms), except for graph isomorphism. Graph isomorphism belongs to NP, but it has not yet been shown either to be solvable in polynomial time or to be NP-complete. Many algorithmic solutions exist for the different problem categories described above. Some of the algorithms give exact solutions, whereas others, such as neural network, fuzzy and genetic algorithm approaches, give approximate solutions.

Polynomial isomorphism algorithms have been developed for special kinds of graphs, but no polynomial algorithm is known for the general case. Hence, the known exact graph matching algorithms have exponential time complexity in the worst case. However, in many pattern recognition/computer vision applications the actual computation time can be acceptable, because of the type of graphs encountered and the attributes associated with them. Section 1.5 brings out a few of the state-of-the-art solutions to graph matching from the literature.
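Trees are a standard example of a special class of graphs for which isomorphism can be decided in polynomial time. The Python sketch below uses the canonical-encoding idea associated with Aho, Hopcroft and Ullman, in two steps. First, each tree is rooted at its center, found by repeatedly stripping leaves. Second, each rooted subtree is encoded as a parenthesized string of its sorted child encodings. This illustrates a special-class algorithm; it is not a method proposed in this thesis, and the string-based encoding favours simplicity over optimal running time.

```python
def canonical_form(tree, root, parent=None):
    """Canonical string of the tree rooted at root (tree: vertex -> neighbours)."""
    children = sorted(
        canonical_form(tree, child, root) for child in tree[root] if child != parent
    )
    return "(" + "".join(children) + ")"


def tree_centers(tree):
    """Return the one or two central vertices by repeatedly removing leaves."""
    degree = {v: len(tree[v]) for v in tree}
    leaves = [v for v in tree if degree[v] <= 1]
    remaining = len(tree)
    while remaining > 2:
        remaining -= len(leaves)
        new_leaves = []
        for leaf in leaves:
            for nb in tree[leaf]:
                degree[nb] -= 1
                if degree[nb] == 1:
                    new_leaves.append(nb)
        leaves = new_leaves
    return leaves


def trees_isomorphic(t1, t2):
    """Isomorphisms map centers to centers, so compare encodings rooted at them."""
    if len(t1) != len(t2):
        return False
    forms1 = {canonical_form(t1, c) for c in tree_centers(t1)}
    return any(canonical_form(t2, c) in forms1 for c in tree_centers(t2))


P4a = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}  # path a-b-c-d
P4b = {4: {2}, 2: {4, 1}, 1: {2, 3}, 3: {1}}                      # path 4-2-1-3
STAR = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}                     # star K1,3
print(trees_isomorphic(P4a, P4b), trees_isomorphic(P4a, STAR))    # True False
```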

The ensuing section 1.3 discusses the applications of graph matching in different domains.

1.3 Applications of Graph Matching

Graphs are used to model many types of relations and process dynamics in physical, biological and social systems, and graph matching techniques can be employed to identify and compare such systems. This section describes a few applications of graph matching in different domains.

Computer Vision: Computer vision involves the study and application of methods that allow computers to extract visual information from image data in order to implement a technical vision system. Such a vision system has applications in image database searches, video analysis, biometric identification, forensic applications, military scene analysis, etc. [Mikhail Zaslavskiy, 2010].

One of the most prominent application fields of computer vision is medical computer vision, or medical image processing. This area extracts information from image data for the purpose of making a medical diagnosis of a patient, such as detecting tumors, arteriosclerosis or other pathological changes. It is also used to measure organ dimensions, blood flow, etc.

Computer vision also has applications in various industrial scenarios. There, information is collected to support quality control in the manufacturing process, for example by inspecting final products to find defects. In defense and other military uses, computer vision is used to detect enemy soldiers or vehicles and to guide missiles to a designated target.

All of these computer vision applications require matching acquired image information with known information. They can be handled efficiently by representing the information extracted from an image as a graph and employing graph matching to implement the technical vision system.

Pattern Recognition: Pattern recognition aims to classify data (patterns) based either on a priori knowledge or on statistical information extracted from the patterns. The patterns to be classified are usually groups of measurements or observations, defining points in an appropriate multidimensional space. As with images, patterns in other data/information can also be represented as graphs, for example the behavior of a person or a set of results [Conte D, et al., 2004, 2003].

These graphs can be matched to solve many pattern recognition tasks. Hence graph matching techniques have had a powerful impact, especially in the domain of pattern recognition. Conte et al. (2004) present a review of various applications of graph matching in pattern recognition and image processing.

Database Systems: In many database systems it is necessary to compare two or more data sets. Comparing two data sets with quite similar structure occurs frequently in defect detection, and a technique has been proposed that compares partitions of two such data sets [Acciani et al., 2003]. Each data set is divided into partitions by means of a supervised fuzzy clustering algorithm, and an undirected complete weighted graph structure is associated with these partitions. A graph matching operation then returns an estimate of the level of similarity between the data sets.

Next-generation database systems dealing with biomedical data, web relationships, network directories and structured documents often model the data as graphs. The availability of biological and chemical graph datasets is increasing rapidly, so there is a growing need for effective and efficient graph querying/matching methods.

Chemistry: Chemical graph theory is a branch of mathematics that combines graph theory and chemistry. Graph theory is used to model molecules mathematically, in order to gain insight into the physical and chemical properties of these compounds. An efficient graph matching algorithm for chemical structures is presented in [Susenguth, 1965]. A graph-theoretical representation of a molecule gives valuable insight into chemical phenomena and helps in the study of its properties and its isomers. Rosenfeld and Klein (2013) propose a graph matching based technique for the enumeration of substitutional isomers.

Electrical and Electronic Systems: Graphs have been used to represent the topology of electrical and electronic circuits. Graph matching and graph isomorphism are very often employed to establish the equivalence of circuits [Whitham, 2004], which may be useful in identifying intellectual property and in analysing circuits. Maiti and Tripathy (2012) propose a colored graph isomorphism based model for matching electrical circuits. Graph matching has also been used in the motion planning of robots [Papadimitrou et al., 1994].

Optimization: Graph matching has also been employed in solving various optimization problems. A graph matching approach has been proposed for solving the task assignment problem encountered in distributed computing systems [Tsai, 1985].

A cost function, defined in terms of a single unit of time, is proposed for evaluating the effectiveness of a task assignment. It represents the maximum time taken by a task to complete module execution and communication over all the processors. A new optimization criterion, called the minimax criterion, is also proposed. On its basis, both minimization of interprocessor communication and balancing of processor loads can be achieved. The approach allows various system constraints to be taken into consideration.

Graphs are used to represent the module relationships of a given task and the processor structure of a distributed computing system. The assignment of modules to processors is then transformed into a type of graph matching called weak homomorphism. The search for an optimal weak homomorphism, corresponding to an optimal task assignment, is formulated as a state-space search problem. This problem is solved by the well-known A* algorithm of artificial intelligence, using suitable heuristic information to speed up the search.

A distributed joint learning and auction algorithm for target assignment using graph matching is proposed in [Teymur et al., 2010]. Many optimization problems can thus be solved using graph matching.

Bio Informatics: Graph matching has been employed in DNA search and various other bioinformatics applications. One interesting application of the graph matching problem is the alignment of protein-protein interaction networks. This problem is important when investigating evolutionarily conserved pathways or protein complexes across species. It also helps identify functional orthologs through the detection of conserved interactions.

A phrase-based statistical machine translation decoding problem has also been reformulated as a Traveling Salesman Problem. A new protein binding pocket similarity measure based on a comparison of 3D atom clouds is proposed in [Mikhail Zaslavskiy, 2010].

All of these applications employ different variants of the graph matching problem, which are elaborated in section 1.2. The problem explored in this thesis is defined in section 1.4.

1.4 The Problem Definition

The previous sections have presented the graph matching problem and its variants. Section

1.3 has highlighted the various applications of graph matching. In this research the problem

of graph matching (graph isomorphism) is considered using undirected simple connected

graphs.

These types of graphs are very general and have many applications, viz. computer vision, pattern recognition, chemical structure analysis, identifying protein complexes, etc. Hence they are chosen for experimentation in this work.

To put the problem in proper context, it is proposed to develop techniques for finding whether two graphs G1 and G2 (synthetic or otherwise) are isomorphic. If they are isomorphic, the techniques will further provide a mapping/correspondence of vertices and edges between the two graphs.

Given two undirected simple graphs G1 = (V1, E1) and G2 = (V2, E2), the problem being explored is to verify whether there exists a bijection $\Phi: V_1 \to V_2$ with the following property: any two vertices x and y of G1 are adjacent in G1 if and only if Φ(x) and Φ(y) are adjacent in G2. If such a bijection exists, the correspondence between the vertices of G1 and G2 is to be found. Otherwise, the methodology should conclude that the graphs are not similar/isomorphic/matching.
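The required output is either a vertex correspondence or a conclusion that the graphs are not isomorphic. A small backtracking search in Python illustrates this. It is a generic textbook-style sketch, not the technique developed in this thesis. It first compares degree sequences (a graph invariant). It then assigns vertices one at a time, only to vertices of equal degree, checking adjacency and non-adjacency against the vertices already mapped. The input format (a dictionary of neighbour sets) and the function name are assumptions.

```python
def find_isomorphism(g1, g2):
    """Return a bijection phi: V1 -> V2 preserving adjacency and non-adjacency,
    or None if G1 and G2 are not isomorphic (plain backtracking)."""
    if len(g1) != len(g2):
        return None
    deg1 = {v: len(g1[v]) for v in g1}
    deg2 = {v: len(g2[v]) for v in g2}
    if sorted(deg1.values()) != sorted(deg2.values()):
        return None                      # degree sequence is an invariant
    order = sorted(g1, key=lambda v: -deg1[v])   # high-degree vertices first
    phi, used = {}, set()

    def consistent(x, y):
        # Every already-mapped vertex u must keep its (non-)adjacency with x.
        return all((u in g1[x]) == (phi[u] in g2[y]) for u in phi)

    def extend(i):
        if i == len(order):
            return True
        x = order[i]
        for y in g2:
            if y not in used and deg2[y] == deg1[x] and consistent(x, y):
                phi[x] = y
                used.add(y)
                if extend(i + 1):
                    return True
                del phi[x]               # undo and try the next candidate
                used.discard(y)
        return False

    return dict(phi) if extend(0) else None


C6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}                       # 6-cycle
TWO_K3 = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
print(find_isomorphism(C6, TWO_K3))  # None, although the degree sequences agree
```

The example shows why the invariant check alone is not enough. Both graphs are 2-regular on six vertices, yet only the search establishes that no bijection exists.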

This and other related problems have been explored by many researchers and a few such

reported works are described in the following section.

1.5 Overview of the Solutions from Literature

One of the most fundamental problems in graph theory is verifying graph isomorphism, or graph matching, and it has found many applications in various domains. The problem of determining whether a pair of graphs is isomorphic has been extensively studied in the literature and approached in various ways. A brief description of graph matching, including its theoretical aspects, and a brief survey of graph matching algorithms are presented in [Bunke, 2000]. A comprehensive survey of graph matching algorithms applied to pattern recognition and image processing, covering the thirty-year period preceding 2004, is found in [Conte D, et al., 2004 and 2003].

The question of whether there is a polynomial-time algorithm for deciding whether two graphs are isomorphic is popularly known as the graph isomorphism problem. It has been one of the best-known open problems in theoretical computer science for more than forty years. Remarkably, the problem was first studied by chemists in the 1950s.

Even though the problem is still open, researchers have obtained a number of substantial partial results. These rely on a variety of techniques from different branches of theoretical computer science and discrete mathematics. The solutions to the graph matching problem are classified into the following categories, which are briefly described in the following subsections:

General Algorithmic Approaches

Graph Spectral Approaches

Approaches based on Optimization Techniques

Approximate Solutions using Soft Computing and Natural Computation

1.5.1 General Algorithmic Approaches

An algorithm is a problem-solving method suitable for implementation as a computer program. A number of different approaches can be employed while designing algorithms. For small problems it hardly matters which approach is used, as long as it solves the problem correctly. However, there are many problems for which the only known algorithms take so long to compute the solution that they are practically useless. For instance, the naïve approach of trying all n! possible permutations of the n vertices to decide whether a pair of graphs G1 and G2 is isomorphic is impractical for all but very small inputs, because the number of permutations, and hence the running time, grows faster than exponentially in n [Cormen et al., 1990].
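To make this cost concrete, a minimal sketch of the naive approach is given below, assuming graphs stored as Python dictionaries of neighbour sets; `brute_force_isomorphism` is a hypothetical helper name. It enumerates all n! bijections, so its worst-case running time is O(n!·n²).

```python
from itertools import permutations

def brute_force_isomorphism(g1, g2):
    """Try every bijection V1 -> V2; return one that preserves edges, else None."""
    v1, v2 = list(g1), list(g2)
    if len(v1) != len(v2):
        return None
    e1 = {frozenset((u, w)) for u in g1 for w in g1[u]}
    e2 = {frozenset((u, w)) for u in g2 for w in g2[u]}
    if len(e1) != len(e2):
        return None
    for image in permutations(v2):  # n! candidate bijections
        phi = dict(zip(v1, image))
        # With equal edge counts, mapping every edge of G1 onto an edge of G2
        # also forces non-edges onto non-edges.
        if all(frozenset(phi[x] for x in e) in e2 for e in e1):
            return phi
    return None
```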

Polynomial-time algorithms, whose number of computational steps is always bounded by a polynomial function of the size of the input, are usually treated as algorithms of acceptable time behavior; a polynomial-time algorithm is therefore generally regarded as useful in practice. The class of all problems that have polynomial-time deterministic algorithms is denoted by P [Cormen et al., 1990]. The class of problems that have polynomial-time nondeterministic algorithms is denoted by NP, and it contains a special subclass referred to as NP-complete. The graph isomorphism problem is one of the very few natural problems that is neither known to be in P nor known to be NP-complete. Hence researchers are on the lookout for efficient polynomial-time algorithms for the graph matching/graph similarity/graph isomorphism problem. A few algorithms that solve the graph matching/graph isomorphism problem for different types of graphs are described below.

Ashay and Tevet (2009) propose a polynomial-time algorithm for the graph isomorphism problem and claim that the problem is in P. If graphs G1 and G2 are isomorphic, they must have the same sign frequency vectors in lexicographic order; the algorithm obtains identical canonical forms of their sign matrices S1* and S2* in polynomial time, thus finding an explicit isomorphism function.

Vorgelegt (2006) discusses classical algorithms and, in particular, the different quantum approaches that have been proposed, and gives an overview of the results known in this area.

Raffaele Mosca (2002) proposes a graph invariant λ(G), a non-negative integer that is non-zero whenever G contains particular induced odd cycles or, equivalently, admits a particular minimum clique-partition. An efficient algorithm for computing λ(G) is also outlined in the paper.

Ling Chia and Wai Keong Kok (2002) describe a methodology to determine all disconnected weakly k-homogeneous graphs for k = 2, 3.

Pavol Hell and Jarik Nesetirl (2007) address the question of density of the homomorphism order for trigraphs and characterize the gaps in the order.

Xuding Zhu (1992) provides a definition of the star chromatic number and discusses it from the perspective of graph homomorphisms and graph products.

Wang and Williams (1991) define the threshold weight of a graph as a measure of the amount by which the graph differs from being a threshold graph. The paper proposes and proves a theorem that relates the threshold weight of any triangle-free graph to heavy graphs; this result can later be used for proving isomorphism.

Abdulrahim and Misra (1998) present an algorithm that solves the graph isomorphism problem for the purpose of object recognition. The algorithm consists of three phases: preprocessing, link construction and ambiguity resolution. It works for all types of graphs except a class of highly ambiguous graphs, which includes strongly regular graphs; graphs of this class are detected in polynomial time.

A new backtracking algorithm for testing a pair of digraphs for isomorphism is presented in [Schmidt and Druffel, 1976]. The algorithm is not guaranteed to run in polynomial time, but it performs efficiently for a large class of graphs.
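The backtracking principle can be sketched as follows. This is a generic depth-first search with degree-based pruning, written only to illustrate the idea; it is not the specific procedure of Schmidt and Druffel (1976), and the names used are hypothetical. Partial mappings are extended one vertex at a time and abandoned as soon as a degree or adjacency conflict appears.

```python
def backtrack_isomorphism(g1, g2):
    """Depth-first search over partial vertex mappings with degree pruning."""
    if len(g1) != len(g2):
        return None
    order = sorted(g1, key=lambda v: -len(g1[v]))  # place high-degree vertices first
    phi, used = {}, set()

    def consistent(x, y):
        if len(g1[x]) != len(g2[y]):
            return False
        # Every already-mapped vertex z must agree on adjacency with x and y.
        return all((z in g1[x]) == (phi[z] in g2[y]) for z in phi)

    def extend(i):
        if i == len(order):
            return True  # all vertices mapped consistently
        x = order[i]
        for y in g2:
            if y not in used and consistent(x, y):
                phi[x] = y
                used.add(y)
                if extend(i + 1):
                    return True
                del phi[x]  # undo and try the next candidate
                used.discard(y)
        return False

    return dict(phi) if extend(0) else None
```

For regular graphs the degree test prunes nothing, so the search may still explore exponentially many branches in the worst case.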

The algorithmic approaches presented here are for different classes of graphs and a few of

them run in polynomial time.

1.5.2 Graph Spectral Approaches

Spectral matching is a computationally efficient approach to the approximate solution of pairwise matching problems, which are NP-hard. Polynomial-time algorithms for graphs with different properties, continuous-time quantum walks for exact and inexact matching, canonical forms and automorphisms are used to determine graph isomorphism. Inexact spectral matching algorithms that embed large graphs in a low-dimensional isometric space spanned by a set of eigenvectors of the graphs are found in [Emms et al., 2009; Vorgelegt, 2006]. A representative set of works based on graph spectra is brought out below.

Egozi et al. show that spectral graph matching can be interpreted as a maximum likelihood estimate of the assignment probabilities, and that the graduated assignment algorithm can be cast as a maximum a posteriori estimator. Based on this analysis they derive a ranking scheme for spectral matching and propose a novel iterative probabilistic matching algorithm that relaxes some of the implicit assumptions [Egozi et al., 2012].

Cour et al. (2006) present a new spectral relaxation technique for approximate solutions to matching problems that naturally incorporates one-to-one or one-to-many constraints. The paper also describes a normalization procedure for existing graph matching scoring functions.

Beezer (1990) provides a recursive construction of trees and graphs that have the minimum possible number of distinct eigenvalues. This number is related to the amount of symmetry the graph possesses: the fewer the distinct eigenvalues, the greater the symmetry. This property can be employed for building graph matching systems.

Laplacians of homogeneous graphs, and generalized Laplacians whose eigenvalues are associated with various equilibria of forces in molecules, are described in [Chung and Sternberg, 1992].

The spectral matrix of the Laplacian can be used to construct symmetric polynomials that are permutation invariants. The coefficients of these polynomials can be used as graph features, encoded as vectors and further employed for graph matching, as described in [Wilson et al., 2005].
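A simplified sketch of this idea, assuming NumPy, is shown below. Instead of the full spectral matrix used by Wilson et al. (which also involves eigenvectors), it uses only the Laplacian eigenvalues and returns the coefficients of the Laplacian characteristic polynomial. These coefficients are, up to sign, the elementary symmetric polynomials of the eigenvalues and are therefore unchanged by any relabelling of the vertices. The function names are illustrative.

```python
import numpy as np

def laplacian(adj_matrix):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    a = np.asarray(adj_matrix, dtype=float)
    return np.diag(a.sum(axis=1)) - a

def spectral_feature_vector(adj_matrix):
    """Characteristic-polynomial coefficients of L (a permutation-invariant vector).

    np.poly builds the monic polynomial whose roots are the given eigenvalues, so
    its coefficients are the signed elementary symmetric polynomials of the spectrum.
    """
    eigenvalues = np.linalg.eigvalsh(laplacian(adj_matrix))
    return np.poly(eigenvalues)
```

For the path on four vertices the vector is [1, -6, 10, -4, 0], while the star K1,3 gives [1, -6, 9, -4, 0], so the features separate these two trees. Graphs with identical Laplacian spectra, however, always receive identical vectors.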

Qui and Hancock (2006) present a technique that exploits the properties of the commute time for the purpose of graph matching.
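Commute time can be computed from the Moore-Penrose pseudoinverse L+ of the graph Laplacian as CT(u, v) = vol(G)(L+uu + L+vv - 2L+uv), where vol(G) is the sum of the vertex degrees. The sketch below, assuming NumPy and a connected unweighted graph, computes the full commute-time matrix; how Qui and Hancock (2006) use this matrix for matching is not reproduced here, and the function name is illustrative.

```python
import numpy as np

def commute_times(adj_matrix):
    """Matrix of commute times CT(u, v) = vol(G) * (L+_uu + L+_vv - 2 L+_uv)."""
    a = np.asarray(adj_matrix, dtype=float)
    lap = np.diag(a.sum(axis=1)) - a
    lap_pinv = np.linalg.pinv(lap)  # Moore-Penrose pseudoinverse of the Laplacian
    volume = a.sum()  # sum of degrees, i.e. 2|E| for an unweighted graph
    d = np.diag(lap_pinv)
    return volume * (d[:, None] + d[None, :] - 2 * lap_pinv)
```

For the path 0-1-2 this gives CT(0, 1) = 4 and CT(0, 2) = 8, that is, 2|E| times the effective resistance between the two vertices.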

Nonnegative matrix factorization (NMF) is proposed in [Ding et al.] to solve different data mining problems, including graph matching on undirected and directed graphs.

Hogben (2005) presents a survey of spectral graph theory and the inverse eigenvalue problem of a graph, and examines the connections between these problems. The paper presents some new results on the construction of a matrix of minimum rank for a given graph having a special form, such as a 0-1 matrix or a generalized Laplacian.

Zhu and Wilson (2005) analyse the adjacency matrix, the combinatorial Laplacian, the normalized Laplacian and the unsigned Laplacian, and present a study on the use of the spectra of the heat kernel matrix and the path length distribution matrix.
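Comparing the spectra of different matrix representations is easy to try. The following NumPy sketch (helper names are illustrative) checks cospectrality under the adjacency matrix and under the combinatorial Laplacian for a classic pair: the star K1,4 and the disjoint union of a 4-cycle with an isolated vertex.

```python
import numpy as np

def adjacency_spectrum(a):
    return np.sort(np.linalg.eigvalsh(np.asarray(a, dtype=float)))

def laplacian_spectrum(a):
    a = np.asarray(a, dtype=float)
    return np.sort(np.linalg.eigvalsh(np.diag(a.sum(axis=1)) - a))

def cospectral(a, b, spectrum):
    """True if both graphs have the same spectrum for the chosen matrix."""
    return np.allclose(spectrum(a), spectrum(b))

# Star K_{1,4} (vertex 0 is the centre) versus a 4-cycle plus an isolated vertex.
star = np.zeros((5, 5))
star[0, 1:] = star[1:, 0] = 1
cycle_plus_point = np.zeros((5, 5))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    cycle_plus_point[u, v] = cycle_plus_point[v, u] = 1
```

Both graphs have adjacency spectrum {-2, 0, 0, 0, 2}, so the adjacency matrix cannot tell them apart, whereas their Laplacian spectra {0, 1, 1, 1, 5} and {0, 0, 2, 2, 4} differ. Equal spectra are thus only a necessary condition for isomorphism.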

Zahidi (2007) employs algebraic invariants, including both eigenvalues and eigenvectors, to show that the permutation relating two graphs is determined by the eigenvectors of both graphs when they possess non-degenerate eigenvalues. The paper presents a new technique that employs non-degenerate eigenvalues and the corresponding eigenvectors for detecting isomorphic graphs.

Damien et al. (2008) introduce a new metric for comparing graphs, namely the weighted spectral distribution, which improves on the spectrum by discounting eigenvalues believed to be unimportant and emphasizing those that are important.

The problem of graph matching can be posed as maximum likelihood estimation using the apparatus of the EM algorithm, casting the recovery of correspondence matches between the graph nodes in a matrix framework [Luo and Hancock, 2002].

Umeyama Shinji (1988) presents an approximate solution to the weighted graph matching problem. It employs an analytic approach based on the eigendecomposition of the adjacency matrices of the graphs and almost always gives the optimum matching when a pair of graphs is nearly isomorphic.
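A minimal sketch of the eigendecomposition approach for symmetric weight matrices, assuming NumPy and SciPy, is given below. Vertex i of the first graph is scored against vertex j of the second by the inner product of the absolute values of their eigenvector rows, and the best bijection is then found with `scipy.optimize.linear_sum_assignment`. This follows the general form of Umeyama's method for undirected graphs but omits any refinements, and the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def umeyama_match(w_a, w_b):
    """Approximate vertex correspondence for two symmetric weighted graphs."""
    # eigh returns eigenvalues in ascending order, so the columns of u_a and u_b
    # correspond to each other; taking absolute values removes sign ambiguity.
    _, u_a = np.linalg.eigh(np.asarray(w_a, dtype=float))
    _, u_b = np.linalg.eigh(np.asarray(w_b, dtype=float))
    score = np.abs(u_a) @ np.abs(u_b).T
    _, cols = linear_sum_assignment(score, maximize=True)
    return cols  # cols[i] is the vertex of B matched to vertex i of A
```

When all eigenvalues are distinct and the rows of |U| are distinct, a relabelled copy of a graph is matched exactly. With repeated eigenvalues or strong symmetry (for example a path, whose two end vertices have identical rows), ties can produce a mapping that is not an isomorphism, so the result should be verified.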

A new methodology for detecting graph isomorphism using the concept of quadratic forms was developed by He et al. (2003). Graphs are first represented by quadratic forms, so that the comparison of two graphs is reduced to the comparison of two quadratic-form expressions. If both the lengths and the directions of the semi-axes of the quadric surfaces, which are characterized by the eigenvalues and eigenvectors, are the same, the associated graphs are isomorphic. An algorithm based on this idea was developed and tested on the counterexamples known for other methods.

Robles-Kelly and Hancock (2002) describe a spectral method for graph matching that uses a brushfire search procedure to find correspondences. The node order of the steady-state random walk associated with a Markov chain is determined by the coefficient order of the leading eigenvector of the adjacency matrix.

Gori et al. (2004) propose a novel polynomial-time approximation algorithm that uses random walks to compute topological features of each node in order to identify the vertex correspondence.

Most applications of graph matching make do with inexact graph matching and hence need error-tolerant graph matching. Many error-tolerant matching techniques make use of edit distances between graphs. A few such methodologies are summarized below.

Three novel methods to compute upper and lower bounds on the edit distance between two graphs in polynomial time are proposed in [Zhiping et al., 2009].

Robles-Kelly and Hancock (2003) propose a graph spectral seriation method that converts the adjacency matrix into a string (sequence) order. The edit distance is computed by finding the sequence of string edit operations that minimizes the cost of the path traversing the edit lattice, and this distance is employed for inexact graph matching.
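The two ingredients can be sketched as follows, assuming NumPy. The seriation step here simply orders vertices by the magnitude of the leading adjacency eigenvector, an assumption standing in for the seriation procedure of Robles-Kelly and Hancock, and the edit lattice uses unit insertion, deletion and substitution costs rather than their probabilistic costs. Vertices are labelled by their degrees, so that two graphs become two strings to compare; all function names are illustrative.

```python
import numpy as np

def seriate(adj_matrix):
    """Order vertices by magnitude of the leading adjacency eigenvector (assumed seriation)."""
    _, vecs = np.linalg.eigh(np.asarray(adj_matrix, dtype=float))
    leading = np.abs(vecs[:, -1])
    return list(np.argsort(-leading, kind="stable"))

def edit_distance(s, t, sub_cost=1, indel_cost=1):
    """Minimum-cost path through the edit lattice (Levenshtein dynamic programme)."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * indel_cost
    for j in range(1, n + 1):
        d[0][j] = j * indel_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = s[i - 1] == t[j - 1]
            d[i][j] = min(d[i - 1][j] + indel_cost,                    # deletion
                          d[i][j - 1] + indel_cost,                    # insertion
                          d[i - 1][j - 1] + (0 if same else sub_cost))  # match / substitution
    return d[m][n]

def degree_string(adj_matrix):
    """Seriated sequence of vertex degrees, used as the string representation of a graph."""
    a = np.asarray(adj_matrix)
    return [int(a[v].sum()) for v in seriate(a)]
```

The path on four vertices becomes the string [2, 2, 1, 1] and the star K1,3 becomes [3, 1, 1, 1], at edit distance 2.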

The influence of the cost function on the optimal match between two graphs is studied in [Bunke, 1999]. The paper also shows that graph isomorphism, subgraph isomorphism and maximum common subgraph problems are special cases of optimal error-correcting graph matching under particular cost functions.

1.5.3 Optimization

Researchers have modeled graph matching as an assignment problem or as other optimization problems, and have proposed a few solutions; some of them are briefly summarized below.

Gold and Rangarajan (1996) propose a graduated assignment graph matching algorithm that works on weighted or attributed relational graphs with missing or extra links and nodes, and addresses problems such as subgraph isomorphism, weighted graph matching and attributed relational graph matching.

A linear program is obtained by formulating the weighted graph matching problem in the L1 norm and then transforming the resulting quadratic optimization problem into a linear one. The simplex method, augmented by a 0-1 integer solution obtained with the Hungarian method, is used to verify graph matching [Almohammad and Duffua, 1993].

Most of the methods in this genre work on weighted attributed graphs.

1.5.4 Approximate Solutions to Graph Matching using Soft Computing and Natural Computation

Recent advances in soft computing and natural computation paradigms have made available tools that can gainfully be employed, at least for finding approximate solutions to graph matching problems. A few representative approaches are enumerated here.

Cross et al. (1997) describe a framework for performing relational graph matching using genetic search on attributed graphs. There are three novel ingredients to the work. Firstly, the optimization process is cast into a Bayesian framework by exploiting the recently reported global consistency measure of Wilson and Hancock as a fitness measure. Secondly, the crossover process is realized at the level of subgraphs, rather than by string-based or random crossover. Finally, convergence is accelerated by employing a deterministic hill-climbing process prior to selection.

Jain and Wystozki (2005) propose a neural network approach to solve exact and inexact graph isomorphism problems for weighted graphs. The method is based on a neural refinement procedure that reduces the search space, followed by an energy-minimizing matching process.

Recently there has been a spurt of reported methodologies that make use of soft and evolutionary computing paradigms. Most of the soft and natural computation approaches work on attributed graphs. All of the literature reported in this section points to the fact that graph matching/graph similarity is still an open problem, and this has motivated this research work. The motivating factors that led to the current research work are brought out in section 1.6.

1.6 Motivation for Current Work

The study and exploration of works in graph theory, especially those on graph matching/graph similarity, motivated us to delve further into the details. A detailed look into the graph matching problem and its variants evinced sufficient interest, and the diversity of applications brought out in section 1.3 urged us to take up work in this direction. Of the various graph types, simple undirected graphs have been found to have a large number of applications, so the development of efficient algorithms for the graph matching problem on simple undirected graphs will have an impact on various application areas. Further, the problem has theoretical and algorithmic issues that stimulate the mind. For these reasons, the problem of graph matching on undirected simple graphs was selected and explored. A brief description of the new techniques proposed in this research is given in section 1.7.

1.7 Overview of Proposed Solutions and Contributions

The development of efficient techniques started from the concept of the distance from a vertex to every other vertex in a connected graph, used to represent the complete characteristics of the graph. It is proposed to use the distances between the vertices of the graphs, together with other spectral characteristics, for ascertaining the similarity of graphs. Further, spectral, assignment and natural computation techniques have been employed for establishing the correspondences between the vertices and edges of similar/isomorphic graphs. Four different approaches have been proposed for the purpose. The techniques have been tested on a large variety of synthetic graphs and have been shown to perform very well.

The first technique uses spectral properties, namely the principal eigenvector, together with vertex degrees and the average shortest distance from a vertex to all other vertices, to check for similarity and to establish correspondence between the vertices. This technique employs the adjacency matrix representation of the graphs. The eigenvalues and eigenvectors of the two adjacency matrices are obtained, and the vertices of each graph are ordered by the coefficients of the eigenvector corresponding to the largest eigenvalue. This order is used to map the vertices, and the correspondence is then verified by checking for equality of vertex degree and of average shortest distance to the other vertices. If the check is positive, the two graphs are similar/isomorphic; otherwise they do not match. The methodology has been tested and is found to be accurate and robust.

A new theorem and a corollary, stating that invariance of degree and of average shortest distance is a necessary and sufficient condition for isomorphism of path graphs, are presented and proved. The methodology for other simple undirected graphs is based on this theorem/corollary. Further, the vertex correspondence is based on the rank order of the vertices obtained by sorting the coefficients of the principal eigenvector in non-increasing order.
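The first technique, as summarized above, can be sketched as follows, assuming NumPy, connected graphs with at least two vertices, and sortable vertex labels. This is an illustrative reading of the description, not the thesis implementation: vertices are ranked by the absolute values of the principal eigenvector coefficients, paired by rank, and each pair is checked for equal degree and equal average shortest distance. Names such as `spectral_match` are hypothetical.

```python
from collections import deque
import numpy as np

def bfs_distances(adj, source):
    """Shortest-path lengths from source in an unweighted graph."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def vertex_signature(adj, v):
    """(degree, average shortest distance to all other vertices)."""
    dist = bfs_distances(adj, v)
    return len(adj[v]), sum(dist.values()) / (len(adj) - 1)

def principal_order(adj):
    """Vertices in non-increasing order of the principal-eigenvector coefficients."""
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    a = np.zeros((len(nodes), len(nodes)))
    for u in nodes:
        for v in adj[u]:
            a[index[u], index[v]] = 1.0
    _, vecs = np.linalg.eigh(a)
    principal = np.abs(vecs[:, -1])  # Perron vector; abs fixes the arbitrary sign
    return [nodes[i] for i in np.argsort(-principal, kind="stable")]

def spectral_match(adj1, adj2):
    """Pair vertices by rank and verify degree and average-distance invariance."""
    if len(adj1) != len(adj2):
        return None
    mapping = dict(zip(principal_order(adj1), principal_order(adj2)))
    for v, w in mapping.items():
        s1, s2 = vertex_signature(adj1, v), vertex_signature(adj2, w)
        if s1[0] != s2[0] or not np.isclose(s1[1], s2[1]):
            return None
    return mapping
```

The sketch returns the vertex correspondence when every check passes and None otherwise. Because ties in the eigenvector coefficients are broken arbitrarily, an explicit adjacency check on the returned mapping is a cheap additional safeguard.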

The second technique also uses spectral properties but employs the normalized adjacency matrix representation of the graphs. The similarity of the graphs is ascertained by the co-spectral nature of the two graphs, along with degree invariance and invariance of the sum of shortest distances to other vertices. The vertex correspondence is established by matching the coefficients of the least eigenvector (the eigenvector corresponding to the least eigenvalue), which are used to map the vertices. If fewer than n/2 (an empirical value) of the coefficients match, the sign of one of the eigenvectors is reversed and the mapping is carried out again.
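The similarity test of the second technique can be sketched in the same spirit. The sketch assumes that the normalized adjacency matrix is D^(-1/2) A D^(-1/2) (the section does not define it), graphs without isolated vertices, and connected graphs for the distance sums. The eigenvector-based mapping and the sign-reversal step are left out, and all names are illustrative.

```python
from collections import deque
import numpy as np

def to_matrix(adj):
    """Adjacency matrix with rows and columns in sorted vertex order."""
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    a = np.zeros((len(nodes), len(nodes)))
    for u in nodes:
        for v in adj[u]:
            a[index[u], index[v]] = 1.0
    return a

def normalized_adjacency(a):
    """D^(-1/2) A D^(-1/2); assumes every vertex has degree at least 1."""
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]

def distance_sum(adj, source):
    """Sum of shortest-path lengths from source (BFS)."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist.values())

def similar(adj1, adj2):
    """Co-spectral normalized adjacency plus equal multisets of (degree, distance sum)."""
    if len(adj1) != len(adj2):
        return False
    spec1 = np.linalg.eigvalsh(normalized_adjacency(to_matrix(adj1)))
    spec2 = np.linalg.eigvalsh(normalized_adjacency(to_matrix(adj2)))
    if not np.allclose(spec1, spec2):
        return False
    sig1 = sorted((len(adj1[v]), distance_sum(adj1, v)) for v in adj1)
    sig2 = sorted((len(adj2[v]), distance_sum(adj2, v)) for v in adj2)
    return sig1 == sig2
```

For the path on four vertices the normalized adjacency spectrum is {-1, -1/2, 1/2, 1}, while the star K1,3 gives {-1, 0, 0, 1}, so these two graphs are already separated by the spectral test.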

The third technique proposed in this work establishes the similarity between the graphs using the matching of vertex degrees and the invariance of the sum of shortest distances to all other vertices in the graph. Once similarity is established, the vertex and edge correspondence is obtained by solving an ingeniously set-up assignment problem with the Hungarian approach [W6]. The assignment problem is set up based on the theorem of vertex correspondence between two isomorphic graphs, which is proposed and proved in this work. The methodology is found to be robust and has produced accurate results.
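A sketch of the assignment step is given below, using SciPy's `linear_sum_assignment`, which solves the same linear assignment problem as the Hungarian method. The cost used here (absolute difference in degree plus absolute difference in the sum of shortest distances) is a hypothetical stand-in, since the cost derived from the vertex correspondence theorem is not given in this section; graphs are assumed connected and of equal order, and the function names are illustrative.

```python
from collections import deque
import numpy as np
from scipy.optimize import linear_sum_assignment

def distance_sums(adj):
    """Sum of shortest-path lengths from every vertex (one BFS per vertex)."""
    sums = {}
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        sums[s] = sum(dist.values())
    return sums

def assignment_correspondence(adj1, adj2):
    """Minimum-cost vertex pairing under a hypothetical degree/distance-sum cost."""
    v1, v2 = sorted(adj1), sorted(adj2)
    s1, s2 = distance_sums(adj1), distance_sums(adj2)
    cost = np.array([[abs(len(adj1[x]) - len(adj2[y])) + abs(s1[x] - s2[y])
                      for y in v2] for x in v1], dtype=float)
    rows, cols = linear_sum_assignment(cost)  # minimizes the total cost
    mapping = {v1[r]: v2[c] for r, c in zip(rows, cols)}
    return mapping, cost[rows, cols].sum()
```

A total cost of zero means every vertex is paired with one of identical degree and distance sum. When several vertices share the same values, the solver may return any of the tied pairings, so the mapping should still be checked against the edges.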

The fourth technique modifies the third by solving the correspondence problem with a natural computation technique, namely the genetic algorithm. It again establishes similarity between the graphs using the matching of vertex degrees and of the sums of shortest distances to all other vertices. The vertex and edge correspondence problem is then solved by proposing a new chromosome structure and new crossover and mutation operators, used in both simple and steady-state genetic algorithms. The technique employs an ingenious fitness function based on a corollary to the theorem of vertex correspondence, and searches for the fittest individual chromosome as the solution to the correspondence problem. Both the steady-state GA and the simple GA are implemented with the same chromosome structure, reproduction operators (crossover and mutation) and fitness function. The work has produced accurate results.
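A compact steady-state GA for the correspondence problem is sketched below in plain Python. The chromosome is a permutation (vertex i of G1 maps to perm[i] of G2); the fitness simply counts preserved edges, which is a placeholder for the corollary-based fitness of the thesis; and the simplified order crossover, swap mutation and tournament selection are common textbook choices assumed here. Only the steady-state variant is shown, and all names are hypothetical.

```python
import random

def fitness(perm, edges1, edges2):
    """Count the G1 edges (u, v) whose images (perm[u], perm[v]) are edges of G2."""
    return sum(frozenset((perm[u], perm[v])) in edges2 for u, v in edges1)

def order_crossover(p1, p2, rng):
    """Simplified order crossover: keep a slice of p1, fill the rest in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    kept = p1[i:j + 1]
    fill = [g for g in p2 if g not in kept]
    return fill[:i] + kept + fill[i:]

def swap_mutation(perm, rng, rate=0.3):
    """With probability rate, swap two randomly chosen genes."""
    perm = perm[:]
    if rng.random() < rate:
        a, b = rng.sample(range(len(perm)), 2)
        perm[a], perm[b] = perm[b], perm[a]
    return perm

def steady_state_ga(n, edges1, edges2, pop_size=30, steps=3000, seed=1):
    """Search for a vertex mapping i -> perm[i] that preserves every edge of G1."""
    rng = random.Random(seed)
    e2 = {frozenset(e) for e in edges2}

    def score(p):
        return fitness(p, edges1, e2)

    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(steps):
        if score(max(pop, key=score)) == len(edges1):
            break  # a perfect correspondence has been found
        # Tournament selection of two parents (best of three random individuals each).
        p1, p2 = (max(rng.sample(pop, 3), key=score) for _ in range(2))
        child = swap_mutation(order_crossover(p1, p2, rng), rng)
        worst = min(range(pop_size), key=lambda k: score(pop[k]))
        if child not in pop and score(child) >= score(pop[worst]):
            pop[worst] = child  # steady state: one replacement per step
    return max(pop, key=score)
```

Each step creates a single child that may replace the current worst individual, which is what distinguishes a steady-state GA from a simple (generational) GA that rebuilds the whole population every generation.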

For all four techniques described above, the results are satisfactory and a theoretical basis has been established for their performance. These techniques add to the vast repertoire of techniques for ascertaining graph matching, are proved to be efficient and have applications in a variety of domains. They have polynomial time complexity and have produced accurate results on counterexamples for graph isomorphism checking. The next section describes the organization of the thesis.

1.8 Organization of the Thesis

The thesis is organized into six chapters. The first chapter gives an introduction to graph theory and its applications, and puts the graph matching problem in context. It also describes the different variants of the graph matching problem and gives the state-of-the-art techniques for graph matching. The problem definition and the proposed techniques are also briefly presented. Chapter 2 describes the spectral and average shortest distance based approach to graph matching. The methodology is described in detail and thorough experimentation is reported. The theoretical basis for the approach is also provided by presenting a theorem, a corollary and their proofs.

Chapter 3 introduces another spectral technique that uses the normalized adjacency matrix representation, together with the sum of shortest distances from a vertex to the other vertices and the vertex degree, as invariants for graph matching. The detailed algorithm is described, the theoretical basis is provided and the results are enunciated. Chapter 4 describes the assignment technique for establishing correspondence between the vertices of two isomorphic graphs, along with a theorem on vertex correspondence of isomorphic graphs; the results are brought out in detail. Chapter 5 describes the genetic algorithm technique for graph matching. Chapter 6 gives a comparative study of the four proposed techniques, presents their time complexities and summarizes the research work.

1.9 Summary

The chapter has given a brief introduction to graph theory and its applications. The graph matching problem was highlighted and its different variants were introduced. The applications of graph matching in various domains were presented. The problem definition and the motivation for the work were enunciated. Further, the proposed solutions and the organization of the thesis were described.
