Limits of Learning in Incomplete Networks
Timothy LaRock ([email protected])
In collaboration with
Timothy Sakharov, Sahely Bhadra, Tina Eliassi-Rad
Supported by NSF CNS-1314603 & NSF IIS-1741197
Background: Incomplete Networks
- Network data is often incomplete
- Acquiring more data is often expensive and/or hard
- Research question: Given a networked dataset and limited resources to collect more data, how can you get the most bang for your buck?
Two general approaches to network completion

Don't collect more data:
- Assume a network model
- Combine the network model with the incomplete data to get a model of the network structure
- Infer missing data from this model
- [Kim et al. 2011], [Chen et al. 2018]

Collect more data:
- Estimate statistics from the partially observed network: [Soundarajan et al. 2015], [Soundarajan et al. 2016]
- Utilize an explore-exploit approach: [Pfeiffer III et al. 2014], [Soundarajan et al. 2017], [Murai et al. 2018], [Madhawa et al. 2018], this work!
Our solution: Network Online Learning (NOL)
- Learn to grow an incomplete network through sequential, optimal queries (to some API)
- Agnostic to both data distributions and sampling method
- Interpretable features that are computable online
- Research question: Given a networked dataset and limited resources to collect more data, how can you get the most bang for your buck?
Assumptions

We assume:
- We know the API access model (complete vs. incomplete queries).
- The underlying network is static (probing the same node twice gives no new information).

We do not assume:
- A model of the underlying graph.
- How the initial sample was collected.
Network Online Learning (NOL)

Inputs:
- Incomplete network
- b: probing budget
- r: reward function (example reward function: number of new nodes observed)

Probe b times to maximize cumulative reward.

Output: the network after b probes.
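The example reward fits in one line; `reward_new_nodes` and its arguments are illustrative names for this sketch, not identifiers from the talk:

```python
def reward_new_nodes(observed_before, observed_after):
    """Example reward r: number of nodes newly observed by a probe."""
    return len(observed_after - observed_before)
```

For instance, a probe that reveals two previously unseen neighbors yields a reward of 2.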
NOL algorithm
1. Observe the current state (feature vector of each node i, regression weights)
2. Choose the next action
3. Take action, update network, collect reward
4. Update parameters (online linear regression following Strehl et al., NIPS 2008)
5. Repeat until budget depleted
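The five steps above can be sketched as a single loop. This is a minimal illustration, not the authors' implementation: the ε-greedy action choice, the learning rate, and all names (`nol_sketch`, `probe`, `frontier`, `features`) are assumptions, and the parameter update is a plain squared-error gradient step in the spirit of Strehl et al.'s online linear regression.

```python
import random

def nol_sketch(probe, frontier, features, budget, theta, lr=0.01, eps=0.3):
    """Illustrative NOL-style probing loop.

    probe(node)    -> set of newly observed nodes (the API query)
    frontier       -> set of observed-but-unprobed nodes
    features(node) -> list of floats describing the node in the sample
    theta          -> regression weights, updated online
    """
    total_reward = 0
    for _ in range(budget):
        if not frontier:
            break
        # 1. Observe the current state: predicted reward theta . x_i per node.
        scored = {n: sum(w * x for w, x in zip(theta, features(n)))
                  for n in frontier}
        # 2. Choose the next action: epsilon-greedy over predicted reward.
        if random.random() < eps:
            node = random.choice(sorted(scored))
        else:
            node = max(scored, key=scored.get)
        x = features(node)
        # 3. Take action, update network, collect reward
        #    (reward = number of newly observed nodes).
        new_nodes = probe(node)
        reward = len(new_nodes)
        total_reward += reward
        frontier.discard(node)
        frontier.update(new_nodes)
        # 4. Update parameters: one online squared-error gradient step.
        err = reward - scored[node]
        theta = [w + lr * err * xi for w, xi in zip(theta, x)]
        # 5. Repeat until budget depleted.
    return theta, total_reward
```

Because the weights are updated after every probe, the predicted reward adapts online to whatever structure the growing sample reveals.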
Features
- In-sample degree
- In-sample clustering coefficient
- Normalized size of connected component
- Fraction of probed neighbors
- Lost reward

Feature: Lost reward
- Key idea: the order in which we probe nodes can impact the reward they yield.
- Example: an unobserved node w is potential reward for partially observed nodes u and v; whichever of u and v is probed first collects it.
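The first four features can be computed directly on the observed sample. The sketch below assumes the sample is an adjacency dict over observed nodes; `node_features` and its argument names are illustrative, and the exact definitions in the paper may differ in normalization.

```python
def node_features(adj, node, probed):
    """Per-node features over the observed sample (illustrative).

    adj    : dict mapping node -> set of observed neighbors
    probed : set of nodes already probed
    Returns [in-sample degree, clustering coefficient,
             normalized connected-component size, fraction of probed neighbors].
    """
    nbrs = adj[node]
    deg = len(nbrs)
    # In-sample clustering coefficient: fraction of neighbor pairs connected.
    links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
    cc = 2 * links / (deg * (deg - 1)) if deg > 1 else 0.0
    # Size of node's connected component (BFS), normalized by sample size.
    seen, queue = {node}, [node]
    while queue:
        u = queue.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    comp = len(seen) / len(adj)
    frac_probed = sum(1 for u in nbrs if u in probed) / deg if deg else 0.0
    return [deg, cc, comp, frac_probed]
```

All four quantities are updatable online as probes add nodes and edges, which is what makes them usable inside the probing loop.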
Research question: Given a networked dataset and limited resources to collect more data, how can you get the most bang for your buck?
- Potential bang for your buck depends on network structure!
Heterogeneity in network properties creates a learning "spectrum"
- Homogeneous degree dist. → learning not useful
- Heterogeneous degree dist. → heuristics optimal [1]
- Heterogeneous degree dist. with rich structure → potential for learning

[1] Avrachenkov et al., INFOCOM WKSHPS (2014)
Experiments
Heuristic Baselines
- High degree: probe the unprobed node with maximum degree
- High degree w/ jump: probe the unprobed node with maximum degree; randomly jump with probability p
- Low degree: probe the unprobed node with minimum degree
- Random: probe a node chosen uniformly at random from the unprobed nodes
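The baselines above can be sketched as one selection function over the current frontier; `pick_next` and its parameter names are illustrative, and in-sample degree is taken as given:

```python
import random

def pick_next(frontier, degree, strategy, p_jump=0.1):
    """Illustrative heuristic baselines for choosing the next probe.

    frontier : set of observed-but-unprobed nodes
    degree   : dict mapping node -> in-sample degree
    """
    nodes = sorted(frontier)
    if strategy == "high":           # probe the max-degree unprobed node
        return max(nodes, key=degree.get)
    if strategy == "high-jump":      # max degree, random jump w.p. p_jump
        if random.random() < p_jump:
            return random.choice(nodes)
        return max(nodes, key=degree.get)
    if strategy == "low":            # probe the min-degree unprobed node
        return min(nodes, key=degree.get)
    return random.choice(nodes)      # uniform random
```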
kNN-UCB [Madhawa et al., arXiv preprint, 2018]
- k-Nearest Neighbors Upper Confidence Bound: a nonparametric multi-armed bandit approach
- Choose the node to probe by combining nearest-neighbor reward information (based on Euclidean distance between feature vectors) with the extent of previous exploration of similar actions
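The flavor of this baseline can be sketched as a score per candidate node. This is a simplified stand-in, not Madhawa et al.'s exact formula: here the exploration bonus is approximated by the mean distance to the k nearest past probes, and `knn_ucb_score`, `alpha`, and `history` are illustrative names.

```python
import math

def knn_ucb_score(x, history, k=3, alpha=1.0):
    """Simplified kNN-UCB-style score for a candidate feature vector x.

    history : list of (feature_vector, observed_reward) from past probes.
    Returns the mean reward of the k nearest neighbors (Euclidean distance)
    plus an uncertainty bonus that grows with distance to those neighbors.
    """
    nearest = sorted((math.dist(x, fx), r) for fx, r in history)[:k]
    mean_reward = sum(r for _, r in nearest) / len(nearest)
    mean_dist = sum(d for d, _ in nearest) / len(nearest)
    return mean_reward + alpha * mean_dist  # exploit + explore
```

Probing the argmax of this score favors nodes that look like previously rewarding probes, while the distance bonus keeps poorly explored regions of feature space in play.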
Heterogeneity in network properties creates a learning "spectrum"
- Homogeneous degree dist. (Erdos-Renyi model) → learning not useful
- Heterogeneous degree dist. (Barabasi-Albert model) → heuristics optimal
- Heterogeneous degree dist. with rich structure (BTER model [1]) → potential for learning

[1] C. Seshadhri et al., Physical Review E (2012)
Results - Learning Spectrum
- Erdos-Renyi model (learning not useful): all methods are indistinguishable; learning doesn't help or hurt!
- Barabasi-Albert model (heuristics optimal): the high-degree heuristic is optimal in the BA model, but NOL learns to probe like the heuristic!
- BTER model (potential for learning)
Heterogeneity in network properties creates a learning "spectrum"
- DBLP: coauthorship network (N = 6.7k, E = 17k, △s = 21.6k)
- Cora: citation network (N = 23k, E = 89k, △s = 78.7k)
- Enron: email communication network (N = 36.7k, E = 184k, △s = 727k)

DBLP ✓, Cora ✓, Enron ✓: all three real-world networks fall toward the "potential for learning" end of the spectrum.
Summary
- Network Online Learning (NOL) can learn to probe online with minimal assumptions
- Success is tied to properties of the underlying network:
  - Spectrum based on the objective function being maximized (degree distribution in these experiments)
- NOL can learn to behave like the optimal heuristic
- Preliminary experiments suggest some real-world complex networks fall in the "learnable" category
References
Kim et al. (2011). The Network Completion Problem: Inferring Missing Nodes and Edges in Networks. SIAM International Conference on Data Mining.
Seshadhri et al. (2012). Community structure and scale-free collections of Erdos-Renyi graphs. Physical Review E.
Avrachenkov et al. (2014). Pay few, influence most: Online myopic network covering. Proceedings - IEEE INFOCOM.
Pfeiffer III et al. (2014). Active Exploration in Networks: Using Probabilistic Relationships for Learning and Inference. Proceedings of the 23rd ACM International Conference on Information and Knowledge Management.
Soundarajan et al. (2015). MaxOutProbe: An Algorithm for Increasing the Size of Partially Observed Networks. The 2015 NIPS Workshop on Networks in the Social and Information Sciences.
Soundarajan et al. (2016). MaxReach: Reducing network incompleteness through node probes. 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).
Soundarajan et al. (2017). ε-WGX: Adaptive Edge Probing for Enhancing Incomplete Networks. Proceedings of the 2017 ACM on Web Science Conference.
Murai et al. (2018). Selective Harvesting over Networks. Data Mining and Knowledge Discovery.
Madhawa et al. (2018). Exploring Partially Observed Networks with Nonparametric Bandits. arXiv:1804.07059.
Chen et al. (2018). Flexible Model Selection for Mechanistic Network Models via Super Learner. arXiv:1804.00237.
Thanks!
Slides at http://eliassi.org/larock_netsci18.pdf