
© 2011 Alexander Troussov

Graph-based methods to exploit “weak” knowledge

Alexander Troussov, Ph.D., IBM Dublin Software Lab

16th of April 2011, Mathlingvo Seminar, St.Petersburg State University, Russia


About AT

• IBM Ireland Center for Advanced Studies – Chief Scientist

• IBM LanguageWare group – Architect

• National Geophysical Data Center, Boulder, CO, USA – Visiting scientist

– Fuzzy-logic-based search engine for searching large databases when the exact search parameters are hard to define

• Observatoire de la Côte d’Azur, Nice, France – Visiting scientist

– Numerical simulation in stochastic physics

• Institute of Physics of the Earth (Russian Academy of Sciences) and the International Institute for Earthquake Prediction Theory and Mathematical Geophysics, Moscow, Russia – Lead Researcher

– R&D in geophysics and geoinformatics

• System programming at the Institute of Precise Mechanics, Moscow

• PhD in Mathematics from Lomonosov Moscow State University


Natural Language Understanding is Inferencing (?)

• From a computational point of view, natural language understanding is inferencing

– A text which mentions Malahide is probably about Canada (??)

Malahide (Canada 2006 Census population 8,828) is a township in Elgin County, Ontario, Canada

Source: Troussov et al. MITACS, Canada, 2010


Inferencing

• Terms are ambiguous, and our knowledge is never “the truth, the whole truth, and nothing but the truth”

– Malahide, Co. Dublin

– Malahide is a township in Elgin County, Ontario, Canada

– Paradis Gisenyi Malahide is a hotel in Rwanda

• Solution (Troussov et al., MITACS, Canada, 2010): propagation from multiple concepts. For instance, the initial seed for the activation propagation comprises two nodes in a geographical taxonomy, Malahide (Ontario) and Malahide (Co. Dublin), as well as the other concepts mentioned in the text

– A text which mentions Malahide and Europe is a little more likely to be about Ireland than about Canada

– A text which mentions Malahide and Clontarf is more likely to be about Ireland than about Canada

– …

– A cohesive, coherent text which mentions Malahide, Mulhuddart, Lansdowne, Clontarf and Donabate is almost certainly about Dublin
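
The multi-seed idea can be illustrated with a toy sketch. The taxonomy, the sense inventory and the decay factor below are illustrative assumptions, not data or parameters from the cited work: every sense of every mention is seeded, and activation is pushed up to the ancestors.

```python
from collections import defaultdict

# Hypothetical child -> parent links in a tiny geographical taxonomy.
PARENT = {
    "Malahide (Co. Dublin)": "Dublin",
    "Clontarf": "Dublin",
    "Donabate": "Dublin",
    "Dublin": "Ireland",
    "Malahide (Ontario)": "Ontario",
    "Ontario": "Canada",
}

# Ambiguous surface forms map to every candidate concept (assumed inventory).
SENSES = {
    "Malahide": ["Malahide (Co. Dublin)", "Malahide (Ontario)"],
    "Clontarf": ["Clontarf"],
    "Donabate": ["Donabate"],
}

def propagate_up(mentions, decay=0.5):
    """Seed every sense of every mention, then push activation to ancestors."""
    activation = defaultdict(float)
    for term in mentions:
        senses = SENSES[term]
        for node in senses:
            signal = 1.0 / len(senses)   # split the seed among the senses
            while node is not None:
                activation[node] += signal
                signal *= decay          # activation decays with every hop
                node = PARENT.get(node)
    return dict(activation)

alone = propagate_up(["Malahide"])
in_context = propagate_up(["Malahide", "Clontarf", "Donabate"])
```

With "Malahide" alone, Ireland and Canada receive the same activation; once Clontarf and Donabate are mentioned too, Ireland clearly dominates.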


Knowledge, Lexico-Semantic Resource

Text Relevancy


Text – Semantic Network

NETWORK OF CONCEPTS

TEXT

Mention Mention Mention Mention

Mapping of term mentions to concepts .

Finding “focus” concept


NLU as inferencing

The concept of a car is relevant to a text. Car IS-A “on-land travel” (?). Therefore “on-land travel” is somewhat relevant to the text, …








Demo

– 2 1 Spreading Activation.pdf


Agenda

• Introduction

• Building Semantic Model

• SA

• Research Challenges

– Why SA

– Reliability of inferencing

– What is the purpose of graph operations

• Centrality, network flow methods

• Zoo of algorithms

• Nepomuk Recommender








Spreading Activation Methods


• There is an increased need for a new generic and formal understanding of spreading activation as a class of algorithms rather than a particular algorithm with many parameters

• Spreading activation (also known as spread of activation) is a method for searching associative networks, neural networks or semantic networks. The method is based on the idea of quickly spreading an associative relevancy measure over the network. Our goal is to give an expanded introduction to the method and to show in sufficient detail that it can be applied to very diverse problems and applications. We present the method as a general framework: a very general class of algorithms on large (or very large) so-called multidimensional networks, which serve as the mathematical model.


Source: Troussov, Levner, Bogdan, Judge, Botvich “Spreading activation methods”


• We present spreading activation in a generic form, as a set of methods suitable for mining multidimensional networks with oriented weighted links. These graph-mining methods might produce results similar to those which might be achieved by soft clustering and fuzzy inferencing. The input object is a function on the nodes of the network, and the spread of activation is a technique which “spreads” this function through the network links. The result of the spreading activation is a new function on the nodes. The properties of that function strongly depend on the original function and the parameters of the spreading activation. For instance, when the underlying network is a network of ontological concepts, the parameters governing the spread might be chosen so as to “smooth” the original function, so that the resulting function can be interpreted as a “conceptual” summary of the initial non-zero valued nodes.



Origin of Spreading Activation Methods

• In neurophysiology, interactions between neurons are modeled by way of activation which propagates from one neuron to another via connections called synapses, which transmit information using chemical signals. The first spreading activation models were used in cognitive psychology to model the process of memory retrieval (Collins, A.M. & Loftus, E.F., 1975; Anderson, J., 1983).

• This framework was later exploited in Artificial Intelligence (AI) as a processing framework for semantic networks and ontologies, and applied to Information Retrieval (Crestani, F., 1997; Aleman-Meza, Halaschek, Arpinar, & Sheth, 2003; Rocha, C., Schwabe, D. & Poggi de Aragao, M., 2004; …) as the result of a direct transfer of information retrieval ideas from cognitive sciences to AI.



Notation

• A multidimensional network can be modeled as a directed graph, which is a pair

• $G = (V, E)$

• where

• $V$ is the set of vertices $v_i$

• $E$ is the set of edges $e_j$ (in oriented graphs edges are referred to as arcs)

• init: $E \to V$ is the mapping which provides the initial node of each arc

• term: $E \to V$ is the mapping which provides the terminal node of each arc

• imp is the importance value of arcs and nodes.

• For instance, imp(v), where the node v is a geographical location, might be the population; imp(e) might be the number of phone calls from person init(e) to person term(e).

• w – “weights”, for instance, a sigmoidal function of imp.

• $w(e_j) = 0$ means that the arc $e_j$ is effectively ignored; $w(e_j) = 1$ means that the activation of init($e_j$) strongly affects the activation of term($e_j$). For instance, when the nodes represent “words”, synonym links might be assigned the value 1.

• F is the “activation” function, usually a real-valued function on the nodes of the network.



Generic description of spreading activation methods (SAM) framework

• 1. Initialisation

Sets the parameters of the algorithm, the network, and the initial F as a list $V_n$ of non-zero valued nodes

• 2. Iterations (each iteration is one pulse of SAM)

– a. List Expansion

The list is expanded to include neighbors (both neighbors reached by following outgoing links and neighbors which have links to the nodes in the list). Newly added nodes receive a zero level of activation

– b. Recomputation

The value at each node in the list is recomputed based on the values of the function on nodes which have links to the given node, and on the types of the connections

– c. List Purging

The list is purged: nodes with values less than a threshold are excluded

– d. Conditions Check To Break Iterations

For example, a maximum number of iterations to be performed

• 3. Output

The list of nodes (the value of the function after the spread of activation), ranked according to their F values
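
The three phases can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `spread_activation` and `additive_recompute`, the arc format `(init, term, weight)`, the threshold, and the toy network are all hypothetical.

```python
def additive_recompute(arcs, F):
    """One simple recomputation rule: add weighted activation from in-neighbours."""
    new_F = dict(F)
    for u, v, w in arcs:
        if u in F and v in F:
            new_F[v] += w * F[u]
    return new_F

def spread_activation(arcs, seed, recompute, threshold=0.01, max_pulses=3):
    """Generic SAM loop. arcs: list of (init, term, weight); seed: {node: F}."""
    out_nbrs, in_nbrs = {}, {}
    for u, v, _ in arcs:
        out_nbrs.setdefault(u, set()).add(v)
        in_nbrs.setdefault(v, set()).add(u)

    F = dict(seed)                                    # 1. initialisation
    for _ in range(max_pulses):                       # 2d. stop after max_pulses
        for v in list(F):                             # 2a. list expansion
            for n in out_nbrs.get(v, set()) | in_nbrs.get(v, set()):
                F.setdefault(n, 0.0)                  # new nodes start at zero
        F = recompute(arcs, F)                        # 2b. recomputation
        F = {v: a for v, a in F.items() if a >= threshold}   # 2c. purging
        if not F:
            break
    # 3. output: nodes ranked by activation
    return sorted(F.items(), key=lambda kv: kv[1], reverse=True)

# Toy network (assumed): two vehicles that are both kinds of on-land travel.
arcs = [("car", "on-land travel", 0.8),
        ("bus", "on-land travel", 0.8),
        ("on-land travel", "travel", 0.5)]
ranked = spread_activation(arcs, {"car": 1.0, "bus": 1.0},
                           additive_recompute, max_pulses=2)
```

Passing the recomputation rule as a function keeps the loop generic, which matches the view of spreading activation as a class of algorithms rather than one algorithm.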



Generic description of recomputation phase

• We have the list of nodes $V_n$.

• 1. Input/Output Through Links Computation

– For each node v we compute the input signal to each arc e such that init(e) = v. When the signal (“activation”) passes through a link e, the activation usually experiences decay by a factor w(e)

• 2. Input/Output of Node Activation

– Before the pulse, the node v has the activation level F(v)

– Through incoming links v gets more activation. By dissipating the activation through outgoing links, the node v might lose activation

• 3. Computation of the New Level of Activation

– A new value F(v) is computed based on F(v), Input(v), and Output(v)




• 1. Input/Output Through Links Computation

• For each node v we compute the input signal to each arc e such that init(e) = v. This computation can be based on the value F(v), the outdegree of the node, etc. For instance, if the node v has n outgoing arcs of the same type, each arc e might get the input signal:

• $I(e) = F(\mathrm{init}(e)) \cdot \dfrac{1}{\mathrm{outdegree}(v)^{\beta}}$

• where $\beta$ might be equal to 1. It could also be less than one, in which case the node v will propagate more activation to its neighbors than it has.

• When the signal (“activation”) passes through a link e, the activation usually experiences decay by a factor w(e):

• $O(e) = I(e) \cdot w(e)$



Generic description of input/output phase

• 2. Input/Output of Node Activation

• Before the pulse, the node v has the activation level F(v).

• Through incoming links v gets more activation:

• $\mathrm{Input}(v) = \sum O(e)$

• for all links e such that init(e) $\in V_n$, term(e) = v.

• By dissipating the activation through outgoing links, the node v might lose activation:

• $\mathrm{Output}(v) = \sum I(e)$

• for all links e such that init(e) = v, term(e) $\in V_n$




• 3. Computation of the New Level of Activation

• A new value F(v) is computed based on F(v), Input(v), and Output(v), for example

• $F_{new}(v) = F(v) + \mathrm{Input}(v)$
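
The recomputation formulas can be put together in one function. This is a hedged sketch: the function name `pulse` and the arc format `(init, term, w)` are assumptions, and the outdegree is counted over all arcs of the network, which is one possible reading of the slides.

```python
from collections import defaultdict

def pulse(arcs, F, beta=1.0):
    """One recomputation step. arcs: list of (init, term, w); F: {node: activation}."""
    outdegree = defaultdict(int)
    for u, _, _ in arcs:
        outdegree[u] += 1          # assumption: outdegree over the whole network

    inp = defaultdict(float)       # Input(v)
    out = defaultdict(float)       # Output(v)
    for u, v, w in arcs:
        if u not in F or v not in F:
            continue               # only arcs inside the current list V_n
        i_e = F[u] / outdegree[u] ** beta   # I(e) = F(init(e)) / outdegree^beta
        o_e = i_e * w                       # O(e) = I(e) * w(e)
        out[u] += i_e
        inp[v] += o_e

    # F_new(v) = F(v) + Input(v); Output(v) is returned for inspection only.
    new_F = {v: F[v] + inp[v] for v in F}
    return new_F, dict(out)

arcs = [("a", "b", 0.5), ("a", "c", 1.0)]
F = {"a": 2.0, "b": 0.0, "c": 1.0}
new_F, out = pulse(arcs, F)              # beta = 1: "a" sends out exactly F(a)
_, out_beta0 = pulse(arcs, F, beta=0.0)  # beta < 1: "a" sends out more than F(a)
```

With `beta=1.0` the node "a" distributes exactly its own activation over its two arcs; with `beta=0.0` each arc carries the full F(a), so the node propagates more activation than it has.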



SAM and Methods of Numerical Simulation in Physics

• Spreading activation algorithms were introduced in the 1990s; however, the same iterative methods were used long before in numerical simulation in physics, mechanics, chemistry and engineering sciences. The major distinctions between these algorithms and what is now called spreading activation are:

– a) in physics, such algorithms usually work on a regular mesh (so that the local topology of the graph is encoded into the formulas of the recomputation stage)

– b) in physics, initial conditions (the initial activation) are usually assigned to all nodes of the mesh, so algorithms for efficient graph traversal are not needed. For instance, steps 2a (List Expansion) and 2c (List Purging) in the generic description of the SAM framework might be skipped.

• For instance, one-dimensional heat transfer equations might be numerically simulated on a one-dimensional mesh by iterative methods. On each iteration the recomputation stage is based on the formula below:

• $F_{new}(v) = \dfrac{F(\mathrm{RightNeighbor}(v)) + F(\mathrm{LeftNeighbor}(v))}{2}$

• Using a different formula, one can simulate the behavior of an oscillating string (although this will require storing three values at each node: the position, mass and velocity of the material point corresponding to the node).
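
The averaging formula can be tried on a small mesh. This is a minimal sketch under assumptions not stated on the slide: an 11-node mesh whose two end nodes are held at fixed temperatures (0 and 100), and 500 pulses.

```python
def heat_step(F):
    """One pulse: every interior node takes the mean of its two neighbours."""
    new_F = F[:]                        # the two boundary nodes keep their values
    for i in range(1, len(F) - 1):
        new_F[i] = (F[i - 1] + F[i + 1]) / 2
    return new_F

# Assumed setup: cold left end, hot right end, interior initially cold.
F = [0.0] * 11
F[-1] = 100.0
for _ in range(500):
    F = heat_step(F)
```

After enough pulses the values settle into the straight-line equilibrium profile between the two fixed ends. Note that no list expansion or purging is needed, because every node of the mesh is active from the start.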


SAM and Methods of Numerical Simulation in Physics

• Using the same iterative algorithm, with one set of parameters one can emulate heat transfer; with another set of parameters the same algorithm will show the behavior of oscillating strings. But the phenomena of heat propagation and string oscillation are quite different (for instance, heat propagation might lead to “thermal death”, the state of equilibrium where the level of activation is the same for all nodes, while oscillation might continue forever). Our illustration concerns only the basics, while real modeling might be much more complicated; for instance, heat transfer might lead to combustion, where after reaching some level of activation a node generates more “heat” than it gets from neighboring nodes.




Spreading Activation as a Graphmining Technique

• The technique of SAM is quite polymorphic. On this slide we interpret the results of spreading activation in terms of graph mining.

– First of all, one can think that after running SAM the most activated nodes will be those which get activation from multiple sources or, in other words, those which minimize the “distance” to the nodes which were initially activated. Therefore these nodes might be considered as potential centroids of strong clusters induced by the initial activation. Since a partitioning of the nodes according to these clusters is not immediately available (and is not needed in many applications), SAM algorithms might be considered as methods of soft clustering.

– On the other hand, the most activated nodes are those which are connected to the initial conditions by particular types of directed links (arcs with large weights). Therefore we might consider SAM as an efficient scheme for computing fuzzy inferencing. For such applications, replacing the single-valued function F by a vector function might be useful.

• We conclude by noting that SAM algorithms might be used for soft clustering and fuzzy inferencing on networks.



Ναπολέων

Γαλλία

Παρίσι

Αλέξανδρος

People

Geographical artifacts

Relations

• Friends

• Part of, Instance of, Subclass

• Created


[Diagram: a network linking people (Napoleon, Alexander, Kutuzov), the project “Invasion of Russia”, the meetings “Battle of Austerlitz” and “Battle of Borodino”, and places (Russia, Moscow, Borodino, France, Paris)]


Diagram on the previous slide …

• What does it represent?

• How can it be used?



How could this diagram be used?

1. A network flow process could show the nodes most relevant to the pair “Napoleon” & “Meeting”

– Selection WHO: whom to invite

– Other nodes explain the recommendations

2. When Napoleon opens an email or a web page containing W&P, he will be advised that the content of this resource is relevant to his project “Invasion of Russia”


Diagram on the previous slide … What does it represent?

• Data from Facebook, data from Napoleon’s Lotus Notes calendar, the structure of a Wiki, a network of collocations or relations between the entities in W&P, …

– The proliferation of Web 2.0 and Enterprise 2.0 technologies has led to the emergence of massive networks connecting people and various digital artifacts. These networks can be treated as “weak” knowledge, which nevertheless might be used for recommendations and even for such traditional applications as knowledge-based text processing

• Or an instantiation of an ontology related to W&P by Leo Tolstoy

– In which case we would probably know that Napoleon is the emperor of France, Paris is the capital (not an instantiation of a subclass) of France, etc.

• An ontology provides conceptualization and allows inferencing, but these advantages per se are useless without tedious manual work to encode the rules for using this additional knowledge. The knowledge encoded in the topology of the multidimensional network, by contrast, is ready to use, provided that the methods are tolerant to errors and inconsistencies in the data, i.e. they are methods of “soft mathematics”: fuzzy inferencing, soft clustering, …


A New Mathematical Model of Horse Racing

• Assume, without loss of generality, that each horse in the race is modelled by a wooden ball of radius $R_i$.

Social Context = Knowledge ?

= a ball ? ☺


Representing social context as knowledge allows us to benefit from the experience of knowledge-based applications.


For instance, the social context modeled as a network is not much different from the semantic networks which are formed from concepts represented in ontologies, and it is possible to use such networks for knowledge-based text processing. Representing social context as knowledge allows us to draw on the experience of such a mature R&D area as knowledge-based text processing.


How to model the social context

• As multidimensional networks

– The primary source: network models of instantiations of techno-social systems

• As “knowledge”, represented as objects, clauses, XML, graphs, or some combination of these



Log files of techno-social systems (like Facebook or IBM’s Lotus Connections) keep track of who did what. These triples can be aggregated into a network.

The primary source – network models of techno-social systems

Created

Invited

Joined
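
The aggregation of “who did what” triples can be sketched as follows. The log records and the function name `aggregate` are hypothetical; the count of repeated triples is used here as one possible importance value for an arc.

```python
from collections import Counter

# Hypothetical log records: (actor, action, object).
LOG = [
    ("Napoleon", "Created", "Project: Invasion of Russia"),
    ("Napoleon", "Invited", "Kutuzov"),
    ("Kutuzov", "Joined", "Meeting: Battle of Borodino"),
    ("Napoleon", "Joined", "Meeting: Battle of Borodino"),
    ("Napoleon", "Invited", "Kutuzov"),
]

def aggregate(log):
    """Collapse repeated triples into typed arcs with an importance count."""
    counts = Counter(log)   # how many times the same triple was logged
    # Each arc: (initial node, terminal node, link type, importance).
    return [(actor, obj, action, n) for (actor, action, obj), n in counts.items()]

arcs = aggregate(LOG)
```

Keeping the link type (Created, Invited, Joined) on each arc preserves the multidimensional character of the network.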


Examples of Graph Models: Folksonomies – Tripartite Hypergraph

• Social bookmarking systems (Del.icio.us, …)

– Where to keep my bookmarks?

– Users (actors), resources, tags

• In social bookmarking systems users describe bookmarks by keywords called tags. The structure behind these social systems, called folksonomies, can be viewed as a tripartite hypergraph of actor, tag and resource nodes.

– Three types of first-class citizens, plus hyperplanes

– If the hyperplanes were made from rubber, they could be shrunk to a node, so the hyperplanes would also be first-class citizens

• Advantages of the network models (see next slide)

– Extensibility

– Ease of merging heterogeneous information

Source: Hypergraphs: see Jäschke et al., "Logsonomy - A Search Engine Folksonomy", ICWSM 2008, AAAI Press (2008)
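
Shrinking each three-way connection to a node can be sketched as follows. The tag assignments and node naming are made up for illustration, and reading the slide's “hyperplanes” as hyperedges of the tripartite hypergraph is an assumption.

```python
# Hypothetical tag assignments: (user, tag, resource).
ASSIGNMENTS = [
    ("alice", "graphs", "http://example.org/sa"),
    ("bob", "graphs", "http://example.org/sa"),
    ("alice", "nlp", "http://example.org/lw"),
]

def reify(assignments):
    """Turn each 3-way hyperedge into its own node linked to user, tag, resource."""
    edges = []
    for k, (user, tag, resource) in enumerate(assignments):
        h = f"assignment:{k}"   # the hyperedge "shrunk" to an ordinary node
        edges += [(h, f"user:{user}"),
                  (h, f"tag:{tag}"),
                  (h, f"resource:{resource}")]
    return edges

edges = reify(ASSIGNMENTS)
nodes = {n for edge in edges for n in edge}
```

The result is an ordinary graph with four node types, so any graph algorithm that works on node-and-arc networks can be applied to it.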


Inferencing – “Soft methods” could provide reliable inferencing




• The rapid “phase transition” from uncertainty to certainty, as more place names accompany an ambiguous mention such as Malahide, is similar to the transition at a percolation threshold


From Uncertainty to Certainty in Inferencing: phase transitions as a function of seed size, in analogy to those in percolation

• In (semantic) networks with high local density, the reliability of inferencing from a single concept is almost never sufficient. Reliability can be low when inferencing starts from a small number of seed concepts, but inferencing becomes very reliable once the number of initial seed concepts reaches a certain level (which can be explained by combinatorics)

[Plot: reliability of inferencing versus number of nodes in the seed]


And could be explained by combinatorics

[Figure: approximate probability of at least two people sharing a birthday, as a function of the number of people]

• In probability theory, the birthday problem, or birthday paradox, pertains to the probability that in a set of randomly chosen people some pair of them will have the same birthday. By the pigeonhole principle, the probability reaches 100% when the number of people reaches 366 (ignoring February 29 births). But, perhaps counter-intuitively, 99% probability is reached with a mere 57 people, and 50% probability with 23 people.
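
The quoted figures can be checked with a few lines. This sketch assumes 365 equally likely birthdays; the function name is illustrative.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (days - k) / days   # person k avoids the k earlier birthdays
    return 1.0 - p_distinct

p23 = p_shared_birthday(23)
p57 = p_shared_birthday(57)
p366 = p_shared_birthday(366)
```

The sharp rise of this probability with the group size is the same combinatorial effect that makes inferencing from several seed concepts far more reliable than inferencing from one.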


Simulation

Source: F. Darena and A. Troussov 2010

• The network (such as a taxonomy of geographical locations) is a tree of 20,000 nodes. The text is modeled as a list of 100 terms, each of which is ambiguous and could be mapped into 8 network nodes. When such a mapping happens, we consider that the node (the geographical location represented by the node) could be relevant to the text.

• We are looking for clusters: groups of N nodes, each of which is mentioned in the text, such that the graph distance between each pair of nodes in the cluster is less than three.

• Such graph structures may occur by chance for small N (N = 1 or 2), but the probability of a chance occurrence decreases sharply to zero for bigger N; correspondingly, our certainty that the graph structure signifies the topicality of the text increases to 1.0

– A text which mentions Malahide, Mulhuddart, Lansdowne, Clontarf and Donabate is almost certainly about Dublin (Ireland)
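
A rough Monte Carlo sketch of a simulation with these sizes is given below. It is not the cited simulation: the tree shape (a random recursive tree), the uniformly random choice of mentioned nodes, and the simplification of counting only groups that sit within one hop of a common centre node (which guarantees pairwise distance below three) are all assumptions.

```python
import random

def simulate(n_nodes=20_000, n_terms=100, senses=8, seed=0):
    rng = random.Random(seed)
    # Random recursive tree: node i attaches to a uniformly chosen earlier node.
    parent = [None] + [rng.randrange(i) for i in range(1, n_nodes)]
    # Every term maps to `senses` nodes; all mapped nodes count as mentioned.
    mentioned = set(rng.sample(range(n_nodes), n_terms * senses))

    # hits[c] = number of mentioned nodes within distance 1 of c (c included).
    hits = [1 if c in mentioned else 0 for c in range(n_nodes)]
    for child in range(1, n_nodes):
        p = parent[child]
        if child in mentioned:
            hits[p] += 1
        if p in mentioned:
            hits[child] += 1

    # centres[N] = number of nodes whose closed neighbourhood holds >= N mentions.
    return {N: sum(h >= N for h in hits) for N in range(1, 7)}

centres = simulate()
```

Printing `centres` shows how quickly chance groupings thin out as N grows in this random setting.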


Processes in Networks

• How do we study the Earth?

– By looking at the results of the propagation of waves through the Earth

[Figure: propagation of a seismic wave in the ground and the effect of the presence of a land mine]

• Similarly, one can study networks by network flow methods

– introducing processes where something is flowing from node to node across the edges


Processes

• Used goods - trail

• Money - walk

• Gossip - replication rather than transference (trails rather than walks)

• E-mail - diffusion by replication

• Attitudes - spread through replication rather than transfer

• Infection - spreads like gossip, but does not re-infect

• Packages - usually the shortest route possible

• Relevancy in semantic networks

• Trust - shortest path or volume?




We are talking about the consumability of centrality measurements produced by network flow methods like these (DEMO)



Key difference between SNA and other approaches to social science

• Social sciences usually focus on the attributes of individual actors



• SNA focuses on relationships between actors

• “Social network analysis reflects a shift from the individualism common in the social sciences towards a structural analysis”. Garton et al., Studying Online Social Networks

• Structuralism is an approach to the human sciences that attempts to analyze a specific field (for instance, mythology) as a complex system of interrelated parts: the linguists Roman Jakobson and Nikolai Trubetzkoy, the anthropologist Lévi-Strauss ~ complex systems

• Sociogram:

– Jacob Levy Moreno (1889-1974) was a leading Austrian-American psychiatrist and psychosociologist, thinker and educator, the founder of psychodrama, and the foremost pioneer of group psychotherapy. Among Moreno’s primary contributions to sociometrics was the sociogram. The sociogram is a method of representing individuals as points on graphs and using lines and arcs to represent the relationships between the individuals.

• Graphics from Prof. Hendrik Speck's tutorial at the 5th Karlsruhe Symposium for Knowledge Management in Theory and Praxis, 2007


Prominence

• The study of the structural properties of networks and their interplay with the processes taking place on the network has been one of the main problems of recent years in the field of complex network analysis

• A primary use of graph theory in social network analysis is to identify “important” actors. Centrality and prestige concepts seek to quantify graph-theoretic ideas about an individual actor’s prominence within a network by summarizing structural relations among the graph nodes.

• An actor’s prominence reflects its greater visibility to the other network actors (an audience). An actor’s prominent location takes account of the direct sociometric choices made and choices received (outdegrees and indegrees), as well as the indirect ties with other actors. The two basic prominence classes:

– Centrality: the actor has high involvement in many relations, regardless of send/receive directionality (volume of activity)

– Prestige: the actor receives many directed ties, but initiates few relations (popularity > extensivity)

Source: Wasserman & Faust, “Social Network Analysis” (W&F)


Centrality: Eigenvector Centrality

• Eigenvector centrality was introduced by Phillip Bonacich in 1987

• “Google's workhorse search engine ranking algorithm, PageRank, is actually a variant on an SNA concept - Bonacich Power Centrality.

– Bonacich (1987) hypothesized that someone's power in society depends on the power of his or her social contacts. Bonacich formalized this mathematically:

$c_i = B\,(c_1 R_{i1} + c_2 R_{i2} + \dots + c_n R_{in})$, where $c_i$ is the person in question, $B$ is the magnitude of the effect, and $R_{ij}$ is the strength of the relationship between the person in question, i, and each of the other people, j, under consideration. If $B = 1$, the formula becomes eigenvector centrality, of which PageRank is a variant. Now, Page, et al. (1998) do not cite Bonacich, I am not claiming that they stole the idea - I am merely stating that a social network analyst appears to me to have been the first to think up the concept”.

Solomon Messing http://www.stanford.edu/~messing/RforSNA.html


Centrality and the network flow methods

• Most centrality measures are based on the network flow process, “that focuses on the outcomes for nodes in a network where something is flowing from node to node across the edges” (Borgatti and Everett, 2006)

• We interpret this “something” as a relevancy measure; for instance, the initial seed input value which shows the nodes of interest in the network. Propagating the relevancy measure through outgoing links allows us to compute the relevancy measure for other network nodes and to rank these nodes dynamically according to their relevancy measures.

• The same paradigm could be used to address centrality measurements in social network analysis. Centralisation of the network can be achieved when we assume that all the nodes are equally important, and iteratively recompute the relevancy measure based on the connections between nodes.


Master Equation Numerical Solution

• Bonacich Power Centrality, Eigenvector Centrality, Google’s PageRank



Master Equation Numerical Solution

Computation

Master equation easily leads us to a numerical solution
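
As an illustration of how such a master equation leads to a numerical solution, the sketch below applies power iteration to $c_i = B \sum_j R_{ij} c_j$. The function name, the toy relationship matrix, and the choice to let the normalisation step play the role of the constant $B$ are assumptions; plain power iteration also assumes a connected, non-bipartite network, otherwise it may oscillate.

```python
def eigenvector_centrality(R, iterations=100):
    """Power iteration: repeatedly replace c by R c and rescale."""
    n = len(R)
    c = [1.0] * n                        # start with all nodes equally important
    for _ in range(iterations):
        new_c = [sum(R[i][j] * c[j] for j in range(n)) for i in range(n)]
        norm = max(new_c)                # rescaling absorbs the constant factor
        c = [x / norm for x in new_c]
    return c

# Toy symmetric network (assumed): a triangle 0-1-2 with a pendant node 3 on node 2.
R = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
c = eigenvector_centrality(R)
```

Node 2, which is tied to every other node, ends up with the highest score, and the pendant node 3 with the lowest. Each iteration is one pulse in which every node recomputes its value from its neighbours.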


It is great to have “the right master equation”!

What is the shape of a hanging chain?

– What is the shape of a hanging chain when supported at its ends and acted on only by its own weight?

Plotting geometric arrangements and forces acting on small segments of the chain

Integrating the results






• Galileo: “This chain will assume the form of a parabola”

$y = x^2$

• But the shape is different:

$y = \dfrac{a}{2}\left(e^{x/a} + e^{-x/a}\right)$

which was established later by applying calculus

[Figures: Leibniz's solution (left); Huygens's illustration (right)]

“In 1669, Jungius disproved Galileo's claim that the curve of a chain hanging under gravity would be a parabola (MacTutor Archive). The curve is also called the alysoid and chainette. The equation was obtained by Leibniz, Huygens, and Johann Bernoulli in 1691 in response to a challenge by Jakob Bernoulli.”

http://mathworld.wolfram.com/Catenary.html


• “Plotting geometric arrangements and forces acting on small segments” evolved into

– Finite difference method

In mathematics, finite-difference methods are numerical methods for approximating the solutions to differential equations using finite difference equations to approximate derivatives.

– Stencil

In mathematics, especially in the areas of numerical analysis concentrating on the numerical solution of partial differential equations, a stencil is a geometric arrangement of a nodal group that relates to the point of interest by using a numerical approximation routine. Stencils are the basis for many algorithms to numerically solve partial differential equations.
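
A small finite-difference check ties the stencil idea to the hanging chain. The differential equation used here, $y'' = \frac{1}{a}\sqrt{1 + y'^2}$, is the standard equation of a hanging chain and is brought in as an assumption, since the slides do not state it; the function names and the step size are illustrative.

```python
import math

def catenary(x, a=1.0):
    return (a / 2) * (math.exp(x / a) + math.exp(-x / a))

def parabola(x):
    return x * x

def residual(y, x, a=1.0, h=1e-3):
    """How far y is from satisfying y'' = sqrt(1 + y'^2) / a at the point x."""
    d1 = (y(x + h) - y(x - h)) / (2 * h)            # three-point stencil for y'
    d2 = (y(x + h) - 2 * y(x) + y(x - h)) / h**2    # three-point stencil for y''
    return d2 - math.sqrt(1 + d1 * d1) / a

r_catenary = residual(catenary, 0.7)
r_parabola = residual(parabola, 0.7)
```

The residual of the catenary is close to zero, while the parabola visibly fails to satisfy the equation.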



Numerical Solution NO Master Equation

• “Integrating” evolved into …

– Well, in financial mathematics solutions are tuned on “stencils”. Numerical solutions are known. The master equation is not known, and is not interesting to know.

• “Master equation is not known”: this is OK.

– But we need to be aware of emergent effects in complex systems: learning how to do something right on a small scale doesn’t necessarily imply that we’ll do the right things on a bigger scale



• Leibniz, Huygens, and Johann Bernoulli knew geometry and mechanics. We don't know the “geometry” and “mechanics” of techno-social systems (and we don’t even know the “geometry” and “mechanics” of semantic networks, social networks, …)

• but we can create small “nodal arrangements” modeling multidimensional networks (for instance, folksonomies)

• Apply known and novel numerical algorithms and utilize state-of-the-art knowledge to decide which algorithms provide better results.

• The next step: to check whether the good properties of the numerical solutions on the micro-level hold true on the meso-level


Source: Troussov at MITACS Workshop in Vancouver, Canada, 2010


Recommender systems and global/local ranking

• Link analysis is frequently employed for ranking and navigation

• Graph-based recommender systems should recommend “important” objects (nodes, links, subgraphs) which are also

– close enough to the initial points of interest (query, focus, initial seed), for instance, in physical space

• Global ranking ~ PageRank

• Breadth-first search (BFS)? Local ranking!?

Recommending a suitable restaurant near New York's 9th Avenue (next slide), or the music you might like, the advertisement you should see, etc.



Graphics: http://strangemaps.wordpress.com/2007/02/07/72-the-world-as-seen-from-new-yorks-9th-avenue/


Global Ranking (like Google’s PageRank) – a view of the network from an external point – the modern, “Copernican” approach

Source: NOAA


Local Ranking is needed for recommenders and should rely on an ego-centered, Ptolemaic view (actually, poly-centered, see next slide)

Graphics: http://strangemaps.wordpress.com/2007/02/07/72-the-world-as-seen-from-new-yorks-9th-avenue/

LOCAL RANKING

Ego-centered or “personal” networks provide a Ptolemaic view of a network from the perspective of the person (ego) at its center.


Poly-CentricPOLY-CENTRICIn physical space – navigation is from one point to another. In applications to virtual spaces - navigation is not simply browsing from a single object to another, but by dealing with several objects at the same time .For instance, to get better results in Google we add terms, we remove terms, …To compute recommendation “Whom invite to the meeting”, one can start navigation from two objects representing the user whom recommendation is for and the meeting in question



� Graph-based recommender systems should recommend “important” objects (nodes) which are also located close to the initial points of interest (query, initial seed)

� One of the leading approaches in recommenders is: results of Global Ranking (link analysis) are “filtered” according to their proximity to the query

� In this paper we introduce novel algorithms which could replace the two-step procedure mentioned above with one step: Local Ranking, which simultaneously computes proximity and importance
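The difference between the two-step procedure and a one-step local ranking can be illustrated with a small sketch. The slides do not specify the algorithms, so this is only an assumption: a PageRank-style power iteration followed by a breadth-first proximity filter stands in for the two-step procedure, and a simple spreading activation with a decay factor stands in for Local Ranking. The toy network, the function names and the parameters (`damping`, `decay`, `max_hops`) are all hypothetical.

```python
from collections import deque

# Hypothetical toy network (undirected, adjacency lists); all names are illustrative.
graph = {
    "user": ["a", "b"],
    "meeting": ["a", "c"],
    "a": ["user", "meeting"],
    "b": ["user", "hub"],
    "c": ["meeting", "hub"],
    "hub": ["b", "c", "x", "y"],
    "x": ["hub"],
    "y": ["hub"],
}
seeds = ["user", "meeting"]


def global_rank(graph, damping=0.85, iterations=50):
    """PageRank-style global ranking by power iteration (every node has a neighbour)."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in graph}
        for v, nbrs in graph.items():
            share = damping * rank[v] / len(nbrs)
            for u in nbrs:
                new[u] += share
        rank = new
    return rank


def hops_from(graph, seeds):
    """Breadth-first search distance from the nearest seed."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        v = queue.popleft()
        for u in graph[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def two_step(graph, seeds, max_hops=2):
    """Step 1: global ranking. Step 2: keep only nodes close to the seeds."""
    rank = global_rank(graph)
    dist = hops_from(graph, seeds)
    near = [v for v in graph
            if v not in seeds and dist.get(v, max_hops + 1) <= max_hops]
    return sorted(near, key=lambda v: -rank[v])


def local_rank(graph, seeds, decay=0.5, iterations=3):
    """One step: activation spreads from the seeds and decays with every hop."""
    pulse = {v: (1.0 if v in seeds else 0.0) for v in graph}
    total = dict(pulse)  # accumulated activation per node
    for _ in range(iterations):
        new = {v: 0.0 for v in graph}
        for v, nbrs in graph.items():
            share = decay * pulse[v] / len(nbrs)
            for u in nbrs:
                new[u] += share
        pulse = new
        for v in graph:
            total[v] += pulse[v]
    candidates = [v for v in graph if v not in seeds]
    return sorted(candidates, key=lambda v: -total[v]), total


print(two_step(graph, seeds))
print(local_rank(graph, seeds)[0])
```

In this toy network the two-step procedure puts the high-degree node "hub" first, while the local ranking prefers "a", the only node adjacent to both seeds (the user and the meeting).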


Web and Communities

� Communities in Social Sciences: A tribe learning to survive, a group of engineers working on similar problems, …

� Communities in computer science: any empirically found group of people

“Recent advances in digital technologies invite consideration of organizing as a process that is accomplished by global, flexible, adaptive, and ad hoc networks that can be created, maintained, dissolved, and reconstituted with remarkable alacrity”.

Prof. N. Contractor


Community detection … but What is a Community?

� Are you Russian? Yes. Are you Irish? Yes. Are you a mathematician? Yes. Are you a practitioner? Yes.

– Communities easily overlap: multiple membership and fuzzy belonging

� At the same time, some communities SHOULD be kept separate
– Remember “Strange Case of Dr Jekyll and Mr Hyde” (Robert Louis Stevenson, 1886).

• How Google had failed to understand an essential property of real-world social networks

• So by testing their social service inside a single context (Google employees only), the developers failed to notice that in real life, people participate in multiple contexts (family, work, friends, etc.) that they work actively to keep separate. The reasons for wanting to keep these groups separate can range from wanting to keep an illicit affair secret from your spouse to political activists in oppressive regimes wanting to keep certain connections secret from the government. Another important reason to keep our communities separate, is that we often play different roles - and communicate differently.

http://www.iq.harvard.edu/blog/netgov/2010/03/worlds_colliding.html


New methods for community detection are needed

� Multiple membership
– Are you Russian? Yes. Are you Irish? Yes. Are you a mathematician? Yes. Are you a practitioner? Yes. …

� Fuzzy belonging
– We don’t know the social structures behind on-line “communities”: members of an on-line community don’t necessarily have the sense of identity that members of real-life social communities have; on-line communities could be project teams or networks of knowledge, …

� High performance and scalability (agglomerative, local, …)
– Clustering as simple partitioning is ruled out because of multi-membership
– Clustering as partitioning is not possible in real time for many business applications

• IBM Intranet: 400K employees, 10K on-line communities (the biggest has 23K members), ...

� Contextualisation of Community Detection
– Collaborative filtering systems provide recommendations based on the detection of like-minded users. But the user of a techno-social system whom the prediction is for could be a "Mathematician", "Irish", etc., or a kind of Dr. Jekyll / Mr. Hyde person (see next slide)


An example of clustering around a node using propagation



Future work in local dynamic clustering

� Troussov et al., “Vectorised Spreading Activation”, 2010, theorize that the future development of spreading activation (SA) methods might be driven by “physics-inspired” and “logic-inspired” algorithms
– SA algorithms have roots in the numerical simulation of various physical phenomena, particularly by finite difference methods.
– On the other hand, the iterative procedure of SA is essentially the same as the procedure that determines the new state of a cell in cellular automata such as Conway’s Game of Life. Although cellular automata usually operate on rectangular (cubic, etc.) grids, the extension to arbitrary networks is feasible.
~ Marker propagation, MajorClust, Chinese whispers graph clustering algorithm, …
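A minimal sketch of why the extension to arbitrary networks is feasible: the update rule of a life-like cellular automaton only needs, for each node, the list of its neighbours. The function names and the bounded 5x5 grid are assumptions made for illustration; the birth/survival rule is the standard Game of Life rule, applied here to any adjacency-list graph.

```python
def life_step(graph, alive, birth=(3,), survive=(2, 3)):
    """One synchronous update: a node's new state depends only on its neighbours."""
    nxt = set()
    for v, nbrs in graph.items():
        k = sum(1 for u in nbrs if u in alive)  # number of live neighbours
        if k in (survive if v in alive else birth):
            nxt.add(v)
    return nxt


def moore_grid(rows, cols):
    """The classical setting: a bounded grid where each cell has up to 8 neighbours."""
    graph = {}
    for r in range(rows):
        for c in range(cols):
            graph[(r, c)] = [
                (r + dr, c + dc)
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
    return graph


grid = moore_grid(5, 5)
blinker = {(2, 1), (2, 2), (2, 3)}       # three cells in a horizontal row
print(sorted(life_step(grid, blinker)))  # the "blinker" flips to a vertical column

# The same function runs unchanged on an arbitrary network (hypothetical example).
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(sorted(life_step(triangle, {"a", "b", "c"})))
```

Replacing the set of live nodes by real-valued activation levels, and the birth/survival rule by a weighted sum over neighbours, turns the same loop into one iteration of spreading activation.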


Conway's Game of Life






Logic-inspired VSA

� Finite difference approximations to differential equations were one of the precursors of cellular automata (Stephen Wolfram, "A New Kind of Science") and of the method of spreading activation (Troussov et al. 2009)

� Iterative computational procedures in cellular automata are the same as in SA.

� The identity of the computational procedures makes it possible to develop VSA algorithms with hybrid operations over the components of the activation vector.

– For instance, “physical” operations could be responsible for the propagation of the activation around the initial seeds; the level of the activation indicates the relevancy of the nodes to the initial seeds.

– “Logical” operations could propagate markers, which indicate potential belongings of nodes to clusters.

� Such hybrid operations combine ranking with clustering and are computationally efficient on massive networks, since the most time-consuming operation, the retrieval of nodes, serves both “physical” and “logical” operations. The clustering does not involve partitioning of the whole network.
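The hybrid idea can be sketched as follows. This is not the VSA implementation from the cited papers, only a minimal illustration under stated assumptions: each node carries a real-valued activation level (the “physical” component) and a set of markers (the “logical” component), and both are propagated during the same retrieval of a node's neighbours. The network, the function name `hybrid_spread` and the parameters `decay` and `threshold` are hypothetical.

```python
# Hypothetical network around two seeds; node names are illustrative only.
graph = {
    "topic": ["expert", "paper"],
    "university": ["expert", "colleague"],
    "expert": ["topic", "university"],
    "paper": ["topic"],
    "colleague": ["university", "far"],
    "far": ["colleague", "remote"],
    "remote": ["far"],
}


def hybrid_spread(graph, seeds, decay=0.5, threshold=0.1, iterations=3):
    """Propagate activation levels and cluster markers in one pass over the nodes."""
    level = {s: 1.0 for s in seeds}    # "physical" component: relevancy to the seeds
    markers = {s: {s} for s in seeds}  # "logical" component: potential cluster membership
    pulse = dict(level)                # activation that arrived in the previous iteration
    for _ in range(iterations):
        next_pulse, incoming = {}, {}
        for v, amount in pulse.items():
            nbrs = graph[v]            # one node retrieval serves both operations
            share = decay * amount / len(nbrs)
            if share < threshold:      # too weak to propagate any further
                continue
            for u in nbrs:
                next_pulse[u] = next_pulse.get(u, 0.0) + share   # "physical": add
                incoming.setdefault(u, set()).update(markers[v])  # "logical": union
        for u, amount in next_pulse.items():
            level[u] = level.get(u, 0.0) + amount
            markers.setdefault(u, set()).update(incoming[u])
        pulse = next_pulse
    return level, markers


level, markers = hybrid_spread(graph, ["topic", "university"])
for node in sorted(level, key=lambda v: -level[v]):
    print(node, round(level[node], 4), sorted(markers[node]))
```

Only the nodes reached by the activation appear in the result: "far" and "remote" are never touched, which reflects the point that the clustering does not partition the whole network. A node such as "expert" collects markers from both seeds, so multiple membership comes for free.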


VSA & Marker propagation – combining ranking with clustering

An Expert

A topic I’m interested in

My University


VSA & Clustering (Cont.)






An Expert

My University

A topic I’m interested in


Tasks / Methods

� Various terminology in various domains (for instance, from the point of view of IM many tasks fall into the category of hidden knowledge discovery)

Multidimensional network point of view (A.T.) | Techno-Social Systems tasks | Network Theory and Graph Theory terminology

Centralisation | Recommender systems, PageRank etc., Expertise location | Random walks, Eigenvector centrality

Local topology | Recommender systems, Link prediction | Motifs

Ad hoc generalisation across dimensions | Expertise location, Recommender systems | Clustering

Tasks

� Avenues to deep socio-semantic analytics and the possibility of high-quality functionalities for techno-social systems (like recommending people to invite into your social network) hinge on the availability of engines which are able

– to provide hidden knowledge discovery, like
• structural importance of nodes
• discovering a new relation in a network (for instance, that based on the strength of multiple connectivity between the nodes of a social network one can conclude that Dr. Jekyll is related to Mr. Hyde),
• provide ad hoc generalisation across dimensions.
• For instance, the ability to detect that a particular person might serve as a representative of a community or as an expert on a particular topic (an example of such generalisation is the expression frequently attributed to Louis XIV, "L'État, c'est moi" ("I am the State")).


“Three steps away” � ?

John B., Tim B., Dan B., Axel P.

Why did the recommender decide that this three-steps-away connection is a strong connection?


John and Tim –

Interest

Workplace

Friends-of-Friends

The recommender computes that this is a strong connection because there are multiple ways of connecting

Shortest Path vs. Volume of traffic


John and Derek

The recommender computes that this type of connectivity is a weak connection
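The contrast between shortest path and volume of traffic can be made concrete with a small sketch. The slides do not say how the recommender computes connection strength; as an assumption, a Katz-style attenuated walk count is used here as a stand-in for "volume of traffic". The network, the intermediate node names (for example "Team", "Forum", "Eve", "Fred") and the parameters `beta` and `max_len` are hypothetical.

```python
from collections import deque

# Hypothetical network: John reaches Tim by three routes, Derek by a single one.
edges = [
    ("John", "Dan"), ("Dan", "Axel"), ("Axel", "Tim"),             # friends-of-friends
    ("John", "Workplace"), ("Workplace", "Team"), ("Team", "Tim"),
    ("John", "Interest"), ("Interest", "Forum"), ("Forum", "Tim"),
    ("John", "Eve"), ("Eve", "Fred"), ("Fred", "Derek"),           # a single chain
]
graph = {}
for a, b in edges:
    graph.setdefault(a, []).append(b)
    graph.setdefault(b, []).append(a)


def hops(graph, source, target):
    """Length of the shortest path (breadth-first search); None if unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return dist[v]
        for u in graph[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return None


def strength(graph, source, target, beta=0.5, max_len=3):
    """Sum over walk lengths k of beta**k times the number of walks of length k."""
    walks = {source: 1}  # number of walks of the current length ending at each node
    score = 0.0
    for k in range(1, max_len + 1):
        nxt = {}
        for v, count in walks.items():
            for u in graph[v]:
                nxt[u] = nxt.get(u, 0) + count
        walks = nxt
        score += beta ** k * walks.get(target, 0)
    return score


for person in ("Tim", "Derek"):
    print(person, hops(graph, "John", person), strength(graph, "John", person))
```

Both Tim and Derek are three steps away from John, so the shortest path cannot tell them apart. The walk-based score is three times higher for Tim, because three independent routes lead to him.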


Tasks: Generalisation Across Domains - Whom is Claudia connected with?

Claudia

Martin

Elaine

John

Hanna

Dirk

Researcher

All of these people


Ranking

[Figure sequence over three slides: network nodes labelled with their ranks 1, 2, 3, …]


Nepomuk Recommender

� NEPOMUK (Networked Environment for Personalized, Ontology-based Management of Unified Knowledge) is an open-source software specification that is concerned with the development of a social semantic desktop that enriches and interconnects data from different desktop applications using semantic metadata stored as RDF.

� Initially, it was developed in the EU 6th Framework integrated project Nepomuk (2006-2008), a 17 million euro project, of which 11.5 million was funded by the European Union


Nepomuk Recommender (Cont.)

� Troussov et al., “Social Context as Machine Processable Knowledge”, presented the architecture of the hybrid recommender system in the activity-centric environment Nepomuk-Simple (EU 6th Framework Project NEPOMUK).

� “Real” desktops usually have piles of things on them, where the users (consciously or unconsciously) have grouped together items which are related to each other or to a task. The so-called “Pile” UI used in Nepomuk-Simple imitates this type of data and metadata organisation, which helps to avoid premature categorisation and reduces the retention of useless documents.

� Metadata describing the user data are stored in the Nepomuk personal information management ontology (PIMO). Proper recommendations, such as the recommendation of additional items to add to the pile, should be based both on the PIMO and on the textual content of the items in the pile. Although methods of natural language processing for information retrieval could be useful, the most important type of textual processing is the one which relates concepts in the PIMO to the processed texts. Since the PIMO changes over time, this type of natural language processing can’t be performed as preprocessing of all textual content related to the user. Hybrid recommendation needs on-the-fly textual processing with the ability to aggregate the current instantiation of the PIMO with the results of textual processing.


Nepomuk

� Representing and modeling this ontology as a multidimensional network makes it possible to augment the ontology on the fly with new information, such as the “semantic” content of the textual information in user documents. Recommendations in Nepomuk-Simple are computed on the fly by graph-based methods operating on the unified multidimensional network of concepts from the personal information management ontology, augmented with concepts extracted from the documents pertaining to the activity in question.

� Troussov et al. 2008 classify Nepomuk-Simple recommendations into two major types.
– The first type is the recommendation of additional items for the pile when the user is working on an activity.
– The second type arises, for instance, when the user is browsing the Web; Nepomuk-Simple can recommend that the current resource might be relevant to one or more activities performed by the user.

In both cases there is a need to operate with Clouds (fuzzy sets of PIMO nodes): Clouds describe the topicality of documents in terms of the PIMO, and the pile itself is a Cloud.
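To make the notion of a Cloud more tangible, here is a minimal sketch that treats a Cloud as a mapping from PIMO nodes to degrees of membership and compares the Cloud of a web page with the Clouds of two piles. The slides do not describe the actual Nepomuk-Simple data structures, so everything here is an assumption: the node identifiers, the membership values, and the use of the standard fuzzy-set operations (minimum for intersection, maximum for union) with a fuzzy Jaccard ratio as the relevance score.

```python
# Hypothetical Clouds: PIMO node -> degree of membership in [0, 1].
piles = {
    "CID": {"pimo:CID": 1.0, "pimo:GraphMining": 0.8, "pimo:Alice": 0.5},
    "Travel": {"pimo:Dublin": 1.0, "pimo:Flight": 0.7},
}
page = {"pimo:GraphMining": 0.6, "pimo:Recommender": 0.4}


def cloud_union(a, b):
    """Fuzzy union: the larger of the two membership degrees."""
    return {n: max(a.get(n, 0.0), b.get(n, 0.0)) for n in set(a) | set(b)}


def cloud_intersection(a, b):
    """Fuzzy intersection: the smaller of the two membership degrees."""
    both = {n: min(a[n], b[n]) for n in set(a) & set(b)}
    return {n: m for n, m in both.items() if m > 0.0}


def overlap(a, b):
    """Fuzzy Jaccard ratio in [0, 1]: size of the intersection over size of the union."""
    union = sum(cloud_union(a, b).values())
    if union == 0.0:
        return 0.0
    return sum(cloud_intersection(a, b).values()) / union


def most_relevant_activity(page, piles):
    """Second recommendation type: which activity might the current page belong to?"""
    return max(piles, key=lambda name: overlap(page, piles[name]))


print(most_relevant_activity(page, piles))
```

With these made-up values the page shares the node "pimo:GraphMining" with the CID pile and nothing with the Travel pile, so CID is suggested as the relevant activity.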


Pile UI


Nepomuk use case: activity management

A user has started to work on a new project, CID. Using the Nepomuk SSD, she collects a “pile” of resources she needs while working on the project: MS Word documents, contacts, etc., by dragging and dropping resources from her desktop and by linking resources from e-mail (Mozilla Thunderbird) and web browser (Firefox) applications.


Nepomuk use case: activity management using IBM recommender codenamed “Galaxy”

Galaxy (IBM hybrid recommender) analyses the pile content and linkage structure as a multidimensional network of concepts extracted from documents and links between concepts, projects, project participants, meetings, document authors, …, and provides handy recommendations of resources she might need


Nepomuk use case: activity management

Galaxy can spot what the user might miss: “This web page might be relevant to your CID activity”


� Thank you !
