Graph Mining: Overview of different graph models
Graph Mining course Winter Semester 2016
Davide Mottin, Konstantina Lazaridou, Hasso Plattner Institute
Lecture road
Anomaly detection (previous lecture)
Representatives of probabilistic (uncertain) graphs
Introduction to signed networks
Graph models
▪ Graphs are everywhere!
▪ Various interesting models that we haven't analyzed in the lecture:
• graph streams
• evolving graphs
• attributed graphs
• probabilistic graphs
• signed graphs
• colored graphs
• ...
Definitions
▪ Graph stream
• sequence of unordered pairs $e = \{u, v\}$ where $u, v \in [n]$: $S = (e_1, e_2, \ldots, e_m)$
▪ Time-evolving graph
• sequence of static graphs $\{G_1, G_2, \ldots, G_n\}$, where $G_t = (V_t, E_t)$ is a snapshot of the evolving graph at timestamp $t$
▪ Attributed graph
• $G = (V, E, A)$, where $V$ is the vertex set, $E$ is the edge set, and $A$ is the attribute set that contains a unary attribute $a_i$ (linked to each node $n_i$) and a binary attribute $a_{ij}$ (linked to each edge $e_k = (n_i, n_j) \in E$)
▪ Colored graph
• $G = (V, E)$ in which each vertex is assigned a color
⁃ properly colored graph: the color assignments conform to the coloring rules applied to the graph
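As a small illustration of the last definition, the sketch below checks whether a coloring is proper under the most common coloring rule, namely that adjacent vertices receive different colors. That rule, the function name `is_properly_colored` and the toy triangle are assumptions made for this example; the slide itself leaves the coloring rules open.

```python
def is_properly_colored(edges, color):
    """Return True if no edge joins two vertices of the same color.

    Assumption: the coloring rule is the classic vertex-coloring rule
    (adjacent vertices must differ); other rules need a different check.
    """
    return all(color[u] != color[v] for u, v in edges)


# Toy triangle: every pair of vertices is adjacent.
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
proper = {"a": "red", "b": "green", "c": "blue"}
improper = {"a": "red", "b": "green", "c": "red"}

print(is_properly_colored(triangle, proper))
print(is_properly_colored(triangle, improper))
```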
Probabilistic graphs - Outline
▪ Uncertainty in data
▪ Introduction to uncertain graphs
• Model definition
• Applications
• Problems
▪ Finding representatives in probabilistic graphs
• Problem definition
• Algorithms
GRAPH MINING WS 2016 5
Uncertainty in data
▪ Noise in generation
• sensors
▪ Noise in collection
• missing instances
▪ Biological data
• protein-protein interaction probability
▪ Problem's nature
• risk, trust, influence, status
▪ Anonymized data
• privacy preservation of user-generated data
What is an uncertain graph?
▪ A graph in which each edge has an associated existence probability p ∈ [0, 1]
Figure 1: (left) An unweighted probabilistic graph $\mathcal{G}$, (right) $\mathcal{G}$ with the expected vertex degrees (in italics) associated with each node
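The expected degree of a vertex is the sum of the probabilities of its incident edges (by linearity of expectation). A minimal sketch, assuming the graph is stored as a dictionary that maps undirected edges to probabilities; the function name and the toy probabilities are illustrative and are not the values shown in Figure 1.

```python
from collections import defaultdict


def expected_degrees(edge_probs):
    """Expected degree of each vertex: sum of its incident edge probabilities."""
    exp_deg = defaultdict(float)
    for (u, v), p in edge_probs.items():
        exp_deg[u] += p
        exp_deg[v] += p
    return dict(exp_deg)


# Hypothetical toy graph on three vertices.
edge_probs = {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "c"): 0.2}
print(expected_degrees(edge_probs))
```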
Possible applications and problems
▪ Modelling of probabilities in protein-protein interaction graphs
▪ Modelling relationships in social graphs
▪ Problems that apply to deterministic graphs
• algorithms need to be redesigned to incorporate uncertainty
▪ Data anonymization
• one of the possible worlds corresponds to the original data
▪ Frequent subgraph mining
• frequency is redefined using the edge probabilities
▪ Queries based on shortest paths
• may return paths with very low probabilities
Graph model definition
▪ A probabilistic graph is represented as $\mathcal{G} = (V, E, W, p)$, where $V$ is the set of vertices, $E$ is the set of edges, $W: V \times V \to \mathbb{R}$ denotes the weights associated with every edge (for weighted graphs), and $p$ maps every pair of nodes to a real number in $[0, 1]$
▪ $p_{uv}$ represents the probability that edge $(u, v)$ exists in the uncertain network
▪ For a probabilistic graph $\mathcal{G}$, $2^{|E|}$ deterministic graphs can be generated
• these graphs are called possible worlds
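To make the count of possible worlds concrete, the following sketch enumerates every possible world of a tiny probabilistic graph together with its probability. It assumes independent edge probabilities and a dictionary representation of the graph; the name `possible_worlds` and the toy values are hypothetical. Enumeration is only feasible for very small graphs, since the number of worlds doubles with every edge.

```python
from itertools import product


def possible_worlds(edge_probs):
    """Yield (edge_set, probability) for every possible world.

    Assumption: edges exist independently of each other.
    """
    edges = list(edge_probs)
    for mask in product([False, True], repeat=len(edges)):
        world = frozenset(e for e, keep in zip(edges, mask) if keep)
        prob = 1.0
        for e, keep in zip(edges, mask):
            prob *= edge_probs[e] if keep else 1.0 - edge_probs[e]
        yield world, prob


# Hypothetical toy graph with three uncertain edges.
edge_probs = {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "c"): 0.2}
worlds = list(possible_worlds(edge_probs))
print(len(worlds), sum(p for _, p in worlds))
```

The probabilities of all worlds sum to 1 (up to floating-point error), which is a quick sanity check for any such enumeration.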
Possible world semantics [1]
▪ Often in the literature it is assumed that the edge probabilities are independent
• is this always the case?
▪ For simplicity, various approaches treat the probabilities of the edges as weights
▪ Others only consider the edges having a probability p > t, for some threshold t
• these are not valid assumptions in many scenarios!
[1] S. Abiteboul, P. Kanellakis, and G. Grahne. On the representation and querying of sets of possible worlds, SIGMOD 1987
Sampling
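▪ The probability that a certain (deterministic) graph $G = (V, E_G)$ is sampled from the probabilistic graph $\mathcal{G}$ is computed as follows:
• $\Pr[G] = \prod_{(u,v) \in E_G} p_{uv} \cdot \prod_{(u,v) \in (V \times V) \setminus E_G} (1 - p_{uv})$
▪ Given $G$ and the vertex degrees, we can also calculate the vertex discrepancies
• $\mathrm{dis}_u(G) = \deg_u(G) - \mathbb{E}[\deg_u(\mathcal{G})]$, where $u$ is a node in $G$ and $\mathbb{E}[\deg_u(\mathcal{G})]$ is its expected degree in $\mathcal{G}$
• the discrepancy of $G$ is defined as the sum of the absolute node discrepancies: $\Delta(G) = \sum_{u \in V} |\mathrm{dis}_u(G)|$
Figure 2: (left) $\mathcal{G}$ with the expected vertex degrees associated with each node, (right) a certain instance $G$ of $\mathcal{G}$ with the vertex discrepancies
▪ $G^* = \arg\min_{G \,:\, \text{world of } \mathcal{G}} \Delta(G)$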
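The formulas above can be checked on a toy example. The sketch below computes the probability of a world, its total discrepancy (using absolute node discrepancies), and the minimizing world by brute force. It assumes independent edges and a dictionary representation; all function names and the toy probabilities are hypothetical, and the brute-force search is exponential, so it only serves as a reference for tiny graphs.

```python
from itertools import combinations


def world_probability(edge_probs, world):
    """Pr[G] under independent edges; node pairs outside E have p = 0
    and therefore contribute a factor of 1."""
    prob = 1.0
    for e, p in edge_probs.items():
        prob *= p if e in world else 1.0 - p
    return prob


def total_discrepancy(edge_probs, world):
    """Sum over all vertices of |degree in world - expected degree|."""
    dis = {}
    for (u, v), p in edge_probs.items():
        present = 1.0 if (u, v) in world else 0.0
        dis[u] = dis.get(u, 0.0) + present - p
        dis[v] = dis.get(v, 0.0) + present - p
    return sum(abs(d) for d in dis.values())


def brute_force_representative(edge_probs):
    """Return the possible world with the smallest total discrepancy."""
    edges = list(edge_probs)
    worlds = (frozenset(c) for r in range(len(edges) + 1)
              for c in combinations(edges, r))
    return min(worlds, key=lambda w: total_discrepancy(edge_probs, w))


edge_probs = {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "c"): 0.2}
best = brute_force_representative(edge_probs)
print(sorted(best), total_discrepancy(edge_probs, best),
      world_probability(edge_probs, best))
```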
What if we could work on a deterministic graph instead? How do we benefit?
▪ Computational complexity would be much lower!
▪ Traditional data mining algorithms could be applied
▪ Which characteristics should this certain graph maintain from the uncertain one?
• same number of vertices
• which edges should be included?
Finding representatives in probabilistic graphs [2]
▪ A representative $G$ of a probabilistic graph $\mathcal{G}$ is a deterministic graph whose vertices have the least possible discrepancy
▪ More formally
• Given an undirected uncertain graph $\mathcal{G} = (V, E, W, p)$, the representative is an exact instance $G$ of $\mathcal{G}$ (a possible world) such that each vertex degree has the minimum deviation from its expected value
[2] The Pursuit of a Good Possible World: Extracting Representative Instances of Uncertain Graphs, Panos Parchas et al., ACM SIGMOD 2014
Introduced algorithms
▪ Baseline 1: Greedy probability
• each edge $e = (u, v)$ belongs to $G$ if it decreases the total discrepancy
▪ Baseline 2: Most probable
• each edge $e = (u, v)$ belongs to $G$ if $p_e \geq 0.5$ holds
▪ ADR (average degree rewiring)
• aims at preserving the expected average degree of $\mathcal{G}$
▪ ABM (approximate b-matching)
• preserves the expected vertex degrees
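The two baselines are simple enough to sketch directly. The code below is an illustration under assumptions, not the reference implementation from [2]: the graph is a dictionary from edges to probabilities, and the greedy baseline visits edges in decreasing order of probability (the slide only states the acceptance rule, so the visiting order is a choice made here). All names are hypothetical.

```python
def total_discrepancy(edge_probs, world):
    """Sum over all vertices of |degree in world - expected degree|."""
    dis = {}
    for (u, v), p in edge_probs.items():
        present = 1.0 if (u, v) in world else 0.0
        dis[u] = dis.get(u, 0.0) + present - p
        dis[v] = dis.get(v, 0.0) + present - p
    return sum(abs(d) for d in dis.values())


def most_probable(edge_probs):
    """Baseline 2: keep every edge with probability at least 0.5."""
    return {e for e, p in edge_probs.items() if p >= 0.5}


def greedy_probability(edge_probs):
    """Baseline 1: add an edge only if it lowers the total discrepancy.

    Assumption: edges are visited in decreasing order of probability.
    """
    world = set()
    best = total_discrepancy(edge_probs, world)
    for e in sorted(edge_probs, key=edge_probs.get, reverse=True):
        candidate = total_discrepancy(edge_probs, world | {e})
        if candidate < best:
            world.add(e)
            best = candidate
    return world


# Hypothetical star graph: four low-probability edges around the hub "a".
star = {("a", "b"): 0.3, ("a", "c"): 0.3, ("a", "d"): 0.3, ("a", "e"): 0.3}
print(most_probable(star))
print(greedy_probability(star))
```

On this star, the threshold rule discards every edge even though the hub has an expected degree of 1.2, while the greedy rule keeps one edge. This is the kind of degree distortion that motivates ADR and ABM.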
ADR: average degree rewiring
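▪ What is the expected average degree?
• $\deg_{avg}(\mathcal{G}) = 2P / |V|$, where $P$ is the sum of all edge probabilities in $\mathcal{G}$
▪ In order to preserve it, $G$ should contain exactly $P$ edges (rounded to an integer, since $P$ is generally fractional)
▪ Main steps of ADR
• Construct a seed set $E_0$ from the edges in $\mathcal{G}$
• For a given number of times $k$
⁃ swap edges in $E_0$ with edges in $E \setminus E_0$, so that the overall discrepancy of the representative decreases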
Pseudocode
▪ Initialization: compute P, sort E in decreasing order of edge probability
▪ For each e in E
• draw a random x in [0, 1]; if x <= p_e: insert e into E0, update G
▪ C = E \ E0
▪ For k times
• For each node u in G
⁃ I = incident edges of u
⁃ choose randomly e1 in I and e2 in C to swap
⁃ compute the overall discrepancy before and after the potential swap
⁃ if improvement: swap e1 with e2 in E0 and C respectively, update discrepancies
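The pseudocode above can be turned into a short runnable sketch. This is an illustration that follows the slide's steps, not the implementation from [2]: it recomputes the total discrepancy from scratch for every candidate swap (simple but slow), it does not force the seed set to contain exactly P edges, and the function names, the fixed random seed and the toy graph are all assumptions.

```python
import random


def expected_degrees(probs):
    """Expected degree of each vertex: sum of incident edge probabilities."""
    exp = {}
    for (u, v), p in probs.items():
        exp[u] = exp.get(u, 0.0) + p
        exp[v] = exp.get(v, 0.0) + p
    return exp


def total_discrepancy(edges, exp):
    """Sum over all vertices of |degree in edges - expected degree|."""
    deg = dict.fromkeys(exp, 0)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sum(abs(deg[u] - exp[u]) for u in exp)


def adr(probs, k=10, seed=0):
    """Sketch of average degree rewiring following the slide's pseudocode."""
    rng = random.Random(seed)
    exp = expected_degrees(probs)
    ordered = sorted(probs, key=probs.get, reverse=True)
    # Seed set E0: keep each edge with probability p_e.
    e0 = {e for e in ordered if rng.random() <= probs[e]}
    c = set(probs) - e0
    for _ in range(k):
        for u in sorted(exp):
            incident = sorted(e for e in e0 if u in e)
            if not incident or not c:
                continue
            e1 = rng.choice(incident)
            e2 = rng.choice(sorted(c))
            before = total_discrepancy(e0, exp)
            after = total_discrepancy((e0 - {e1}) | {e2}, exp)
            if after < before:  # accept only improving swaps
                e0.remove(e1)
                c.add(e1)
                e0.add(e2)
                c.remove(e2)
    return e0


# Hypothetical toy graph.
probs = {("a", "b"): 0.9, ("a", "c"): 0.6, ("b", "c"): 0.3,
         ("c", "d"): 0.8, ("b", "d"): 0.4}
print(sorted(adr(probs, k=20, seed=1)))
```

Because a swap removes one edge and adds one, the number of edges in the representative stays fixed after the seed step; only their placement changes.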
ADR example: edge probabilities
ADR: a possible world and the discrepancies
ADR: first iterations
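d1+d2 < 0 explanation
▪ For replacing $(u, v)$ with $(x, y)$:
▪ $d_1 = |\mathrm{dis}_u(G) - 1| + |\mathrm{dis}_v(G) - 1| - (|\mathrm{dis}_u(G)| + |\mathrm{dis}_v(G)|)$
▪ $d_2 = |\mathrm{dis}_x(G) + 1| + |\mathrm{dis}_y(G) + 1| - (|\mathrm{dis}_x(G)| + |\mathrm{dis}_y(G)|)$
▪ $\mathrm{Sum}^{uv}_{bef} = |\mathrm{dis}_u(G)| + |\mathrm{dis}_v(G)|$
▪ $\mathrm{Sum}^{uv}_{after} = |\mathrm{dis}_u(G) - 1| + |\mathrm{dis}_v(G) - 1|$
▪ $\mathrm{Sum}^{xy}_{bef} = |\mathrm{dis}_x(G)| + |\mathrm{dis}_y(G)|$
▪ $\mathrm{Sum}^{xy}_{after} = |\mathrm{dis}_x(G) + 1| + |\mathrm{dis}_y(G) + 1|$
▪ $d_1 = \mathrm{Sum}^{uv}_{after} - \mathrm{Sum}^{uv}_{bef}$
▪ $d_2 = \mathrm{Sum}^{xy}_{after} - \mathrm{Sum}^{xy}_{bef}$
▪ For four distinct endpoints, $d_1 + d_2$ is the change in the total discrepancy caused by the swap, so the swap is an improvement only if $d_1 + d_2 < 0$
▪ If $d_1$ and $d_2$ are positive, then
• $\mathrm{Sum}^{uv}_{after} > \mathrm{Sum}^{uv}_{bef}$
• $\mathrm{Sum}^{xy}_{after} > \mathrm{Sum}^{xy}_{bef}$
⁃ none of the underlying nodes benefits from the swap...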
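A worked instance of the two quantities may help. The sketch below evaluates d1 and d2 for one candidate swap from a table of current node discrepancies; it assumes the four endpoints u, v, x, y are distinct, and the function name and discrepancy values are made up for illustration.

```python
def swap_deltas(dis, removed_edge, added_edge):
    """Return (d1, d2) for replacing removed_edge = (u, v) by added_edge = (x, y).

    Assumption: u, v, x and y are four distinct nodes.
    Removing (u, v) lowers the degrees of u and v by one;
    adding (x, y) raises the degrees of x and y by one.
    """
    u, v = removed_edge
    x, y = added_edge
    d1 = (abs(dis[u] - 1) + abs(dis[v] - 1)) - (abs(dis[u]) + abs(dis[v]))
    d2 = (abs(dis[x] + 1) + abs(dis[y] + 1)) - (abs(dis[x]) + abs(dis[y]))
    return d1, d2


# Hypothetical discrepancies: u and v have too many edges, x and y too few.
dis = {"u": 0.6, "v": 0.7, "x": -0.8, "y": -0.5}
d1, d2 = swap_deltas(dis, ("u", "v"), ("x", "y"))
print(d1, d2, d1 + d2 < 0)
```

Here both deltas are negative, so the swap lowers the total discrepancy and would be accepted.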
References
▪ Uncertain data
• On the representation and querying of sets of possible worlds
• A survey of uncertain data algorithms and applications
▪ Uncertain graphs
• The pursuit of a good possible world: extracting representative instances of uncertain graphs
• Uncertain graph sparsification
• Uncertain graph processing through representative instances
• Triangle-based representative possible worlds of uncertain graphs
• Clustering large probabilistic graphs
• Algorithms for mining uncertain graph data
• K-nearest neighbors in uncertain graphs
What is a signed network?
▪ It is a graph G = (V, E), where each edge is mapped to a sign
▪ A sign can be positive or negative
▪ The sign of a path is the product of the signs of its edges
▪ Typically a signed network is denoted by:
• Σ = G(V, E, σ), where σ, the signature of the graph, is the function σ: E → {+, −}
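The sign of a path is easy to compute once signs are encoded as +1 and −1. A minimal sketch, assuming an undirected signed graph stored as a dictionary from `frozenset` node pairs to ±1; the encoding, the function name and the example signs are choices made for this illustration.

```python
from math import prod


def path_sign(signs, path):
    """Product of the edge signs along a path given as a list of nodes."""
    return prod(signs[frozenset(edge)] for edge in zip(path, path[1:]))


# Hypothetical signed triangle on the nodes u, v, k.
signs = {
    frozenset({"u", "v"}): +1,
    frozenset({"v", "k"}): -1,
    frozenset({"u", "k"}): -1,
}

print(path_sign(signs, ["u", "v", "k"]))       # path u - v - k
print(path_sign(signs, ["u", "k"]))            # direct edge u - k
print(path_sign(signs, ["u", "v", "k", "u"]))  # the whole cycle
```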
[Figure: a triangle on the nodes u, v and k; each of its three edges is labeled +/-]
What is balance?
▪ History
• Fritz Heider (psychologist) and Frank Harary (mathematician) laid the foundations of signed graphs and balance theory
• Original idea of the P-O-X model
⁃ how are social relations modeled? are they balanced?
“The enemy of my enemy is my friend”!
[Figure: the P-O-X triangle on the nodes P, O and X, with edge signs +, + and −]
Example of the P-O-X model
▪ Imagine that you are person P and that O is someone whom you think highly of. Now imagine that X is a presidential candidate you dislike, but X vehemently endorses O.
▪ What do you suspect would happen?
[Figure: the P-O-X triangle with edge signs +, + and −]
the situation is unbalanced...
P needs to agree with his friend O, or needs to unfriend O!
Balance theory
▪ Theorem 1: G is balanced if and only if, for every pair of nodes u, v, all paths between u and v have the same sign
▪ Theorem 2: A signed graph is balanced if and only if V can be bipartitioned such that each edge between the parts is negative and each edge within a part is positive
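Theorem 2 suggests a direct test for balance: try to assign every node to one of two parts so that positive edges stay inside a part and negative edges cross between parts. The sketch below does this with a breadth-first search; the function name, the ±1 sign encoding and the two P-O-X examples are assumptions made for illustration.

```python
from collections import deque


def is_balanced(nodes, signed_edges):
    """Check balance by searching for the bipartition of Theorem 2."""
    adj = {u: [] for u in nodes}
    for (u, v), s in signed_edges.items():
        adj[u].append((v, s))
        adj[v].append((u, s))
    side = {}
    for start in nodes:  # handle every connected component
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                # positive edge: same part; negative edge: opposite part
                expected = side[u] if s > 0 else 1 - side[u]
                if v not in side:
                    side[v] = expected
                    queue.append(v)
                elif side[v] != expected:
                    return False
    return True


nodes = ["P", "O", "X"]
# Two positive edges and one negative edge: unbalanced.
unbalanced = {("P", "O"): +1, ("O", "X"): +1, ("P", "X"): -1}
# "The enemy of my enemy is my friend": balanced.
balanced = {("P", "O"): -1, ("O", "X"): -1, ("P", "X"): +1}

print(is_balanced(nodes, unbalanced))
print(is_balanced(nodes, balanced))
```

The search visits each edge a constant number of times, so the check runs in time linear in the size of the graph.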
Status theory [3]
▪ The signs in balance theory are perceived as likes/dislikes
▪ Can they also indicate another relation?
• in the context of directed social networks, the intention of the user creating the link matters
[Figure: nodes P, O and X connected by directed links labeled + and −]
• positive link from P to O: “I think O has a higher status than I do”
• negative link from P to O: “I think O has a lower status than I do”
[3] Signed Networks in Social Media, Jure Leskovec et al., SIGCHI 2010
Some possible applications
▪ Modelling interactions in chemical/biological networks
▪ Social network analysis
▪ Political and economic relations
Graph Algorithms, Applications and Implementations, Charles Phillips
References
▪ More material
• Signed graphs, Matthias Beck
• Graph Algorithms, Applications and Implementations, Charles Phillips
• Harary: On the notion of balance of a signed graph
• Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Chapter 5: Positive and Negative Relationships
▪ Research problems on signed graphs
• Signed graphs in social media
• Community Mining in Signed Social Networks: An Automated Approach
• Polarity Related Influence Maximization in Signed Social Networks
• Node Classification in Signed Social Networks
• Predicting Positive and Negative Links in Online Social Networks
In the next episodes …
3rd presentation date
Course evaluation
Exams and maybe more…!
Questions?