A discussion on sampling graphs to approximate network classification functions
(work in progress)
Gemma C Garriga
22.09.2011
Outline
Starting point
Classification in networks
Samples of graphs
Some first experiments
Network classification problem
Learn a classification function f : X → Y for nodes x ∈ G
Relaxing: f : X → R means to infer a probability Pr(y | {x}_n, G)
Aka collective classification or within-network prediction: nodes with the same label tend to be clustered together
Network classification problem
Challenges
Sparsely labeled: few labeled nodes but many unlabeled nodes
Heterogeneous types of contents, multiple types of links
Network structure (what edges are in the graph) affects the accuracy of the models
Networks are large
Related to
Semi-supervised learning based on graphs
Semi-supervised learning
Goal: Build a learner f that can label input instances x into different classes or categories y
Notation
input instance x, label y
learner f : X → Y
labeled data (X_l, Y_l) = {(x_{1:l}, y_{1:l})}
unlabeled data X_u = {x_{l+1:n}}, available during training
usually l ≪ n
Semi-supervised learning
Use both labeled and unlabeled data to build better learners
Semi-supervised graph-based methods
Transform vectorial data into a graph
Nodes: labeled and unlabeled X_l ∪ X_u
Edges: weighted edges (x_i, x_j) computed from features
Weights represent similarity, e.g. w_ij = exp(−γ ||x_i − x_j||^2)
Sparsify with: k-nearest-neighbor graph, threshold graph (ε-distance graph), …
The general idea is that similarity is implied via all paths in the graph (a construction sketch follows below)
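As a rough illustration of this construction (the function name and the k and γ defaults are mine, not from the talk), in Python with numpy:

    import numpy as np

    def build_knn_graph(X, k=10, gamma=1.0):
        # X: (n, d) vectorial data; k and gamma are illustrative defaults
        n = X.shape[0]
        sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        W = np.exp(-gamma * sq_dist)              # w_ij = exp(-gamma ||x_i - x_j||^2)
        np.fill_diagonal(W, 0.0)                  # no self-loops
        nearest = np.argsort(W, axis=1)[:, -k:]   # k heaviest edges per node
        mask = np.zeros_like(W, dtype=bool)
        mask[np.arange(n)[:, None], nearest] = True
        return np.where(mask | mask.T, W, 0.0)    # symmetrize the kNN graph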
Semi-supervised graph-based methods
Smoothness assumption
In a weighted graph, nodes that are similar are connected by heavy edges (high-density regions) and therefore tend to have the same label. Density is not uniform
[From Zhu et al. ICML 2003]
The harmonic function
Relaxing discrete labels to real values with f : X → R that satisfies:
1. f(x_i) = y_i for i = 1 … l
2. f minimizes the energy function ∑_{ij} w_ij (f(x_i) − f(x_j))^2
3. it is the mean of the associated Gaussian random field
4. the harmonic property means f(x_i) = ∑_{j∼i} w_ij f(x_j) / ∑_{j∼i} w_ij
Harmonic solution with iterative method
An iterative method as in self-training:
1. Set f(x_i) = y_i for i = 1 … l and f(x_j) arbitrary for x_j ∈ X_u
2. Repeat until convergence:
. Set f(x_i) = ∑_{j∼i} w_ij f(x_j) / ∑_{j∼i} w_ij
. Always keep f(X_l) fixed (see the sketch below)
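A minimal numpy sketch of this iterative scheme (the names, the fixed iteration count in place of a convergence test, and the guard against isolated nodes are my own choices):

    import numpy as np

    def harmonic_iterative(W, y, labeled, n_iter=100):
        # W: (n, n) symmetric non-negative weight matrix
        # y: (n,) labels; only entries where `labeled` is True are used
        # labeled: (n,) boolean mask of the labeled nodes
        f = np.zeros(len(y), dtype=float)
        f[labeled] = y[labeled]                   # step 1: clamp labeled nodes
        d = np.maximum(W.sum(axis=1), 1e-12)      # degrees, guarded against isolated nodes
        for _ in range(n_iter):                   # step 2: propagate
            f = (W @ f) / d                       # f(x_i) = sum_j w_ij f(x_j) / sum_j w_ij
            f[labeled] = y[labeled]               # keep f(X_l) fixed
        return f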
A random walk interpretation on directed graphs
Randomly walk from node i to j with probability w_ij / ∑_k w_ik
The harmonic function gives Pr(hit label 1 | start from i)
[From Zhu’s tutorial at ICML 2007]
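This interpretation can be checked empirically with a small Monte Carlo sketch (illustrative code, not from the talk; it assumes binary labels in {0, 1} and that every walk eventually reaches a labeled node):

    import numpy as np

    def hit_label_prob(W, y, labeled, start, n_walks=1000):
        # Estimates Pr(hit label 1 | start) by walking until a labeled
        # (absorbing) node is reached; assumes no isolated nodes on the way.
        rng = np.random.default_rng(0)
        n = W.shape[0]
        hits = 0
        for _ in range(n_walks):
            i = start
            while not labeled[i]:
                p = W[i] / W[i].sum()             # move i -> j with prob w_ij / sum_k w_ik
                i = rng.choice(n, p=p)
            hits += int(y[i] == 1)
        return hits / n_walks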
Harmonic solution with graph Laplacian
Let W be the n × n weight matrix on X_l ∪ X_u
. Symmetric and non-negative
Let D be the diagonal degree matrix: D_ii = ∑_{j=1}^{n} w_ij
The graph Laplacian is ∆ = D − W
The energy function can be rewritten:
min_f ∑_{ij} w_ij (f(x_i) − f(x_j))^2 = min_f f^T ∆ f
The harmonic solution solves f_u = −∆_uu^{-1} ∆_ul Y_l
Complexity of O(n^3)
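A direct numpy translation of this closed form (a sketch; `labeled` is assumed to be a boolean mask, as in the earlier snippets):

    import numpy as np

    def harmonic_closed_form(W, y, labeled):
        # Solves f_u = -Delta_uu^{-1} Delta_ul Y_l via one linear system
        D = np.diag(W.sum(axis=1))
        L = D - W                                  # graph Laplacian Delta = D - W
        u = ~labeled
        L_uu = L[np.ix_(u, u)]                     # unlabeled/unlabeled block
        L_ul = L[np.ix_(u, labeled)]               # unlabeled/labeled block
        f = y.astype(float)
        f[u] = -np.linalg.solve(L_uu, L_ul @ y[labeled])   # O(n^3) worst case
        return f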
Outline
Starting point
Classification in networks
Samples of graphs
Some first experiments
Characteristics of network data
So, can one use graph-based semi-supervised learning for networks? Some reflections:
+ The smoothness assumption can be seen as a clustering assumption, or a community-structure assumption
Groups of nodes that are similar tend to be more densely connected among themselves than with the rest of the network
+ The Laplacian matrix could help to integrate both the vectorial data and the structure of the network
− However, networks have scale-free degree distributions
The structure of the links influences the iterative propagation
− Networks can be very large
How to use graph samples
First idea:
1. For i = 1 … |samples| do:
. Extract graph sample G_i ≺ G from the full graph
. Apply the harmonic iterative algorithm to G_i to get f(u), u ∈ G_i
2. Average f(u) for u ∈ {G_i} selected in several samples (steps 1-2 are sketched in code below)
3. For all nodes v that did not appear in any sample do:
. Make random walks to k nodes touched by the samples
. Compute the weighted average of the k labels found:
f(v) = (1 / ∑_{j=1…k} d(v, u_j)) ∑_{j=1…k} d(v, u_j) f(u_j)
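A sketch of steps 1-2 (the `sampler` callback and the reuse of `harmonic_iterative` from the earlier snippet are assumptions for illustration; step 3 is only indicated):

    import numpy as np

    def sample_and_average(W, y, labeled, sampler, n_samples=20):
        # `sampler(W)` returns the node indices of one graph sample;
        # `harmonic_iterative` is the iterative sketch given earlier.
        n = W.shape[0]
        f_sum = np.zeros(n)
        counts = np.zeros(n)
        for _ in range(n_samples):
            idx = sampler(W)                       # extract G_i from the full graph
            Wi = W[np.ix_(idx, idx)]               # weights restricted to the sample
            f_sum[idx] += harmonic_iterative(Wi, y[idx], labeled[idx])
            counts[idx] += 1
        f = np.full(n, np.nan)                     # NaN marks nodes never sampled
        seen = counts > 0
        f[seen] = f_sum[seen] / counts[seen]       # step 2: average over samples
        # step 3 (not shown): label unseen nodes via random walks to k
        # sampled nodes and the weighted average of their predictions
        return f, seen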
How can samples help?
Samples have fewer edges than the full graph, so diffusion differs from diffusion on the full graph
Subgraphs are random, so the behavior may be good on average
The iterative algorithm (or the Laplacian harmonic solution) is applied only on the samples, so complexity is reduced
Nodes not contained in any sample are labeled following the assumptions of the random walk interpretation of the iterative harmonic solution
[From Zhu’s tutorial at ICML 2007]
How can samples fail to help?
It depends on how the samples are extracted from the graph. Things to take into account:
Including some labeled points from all classes in the sampled graph
Extracting a connected subgraph
Sampling on the vectorial data, on the structural edges, or integrating both in the sampling process (like random walk sampling)
It is just an approximation: how good is it? Can we say something theoretically? Ensemble approaches based on samples?
Going further: sparsify the samples, finding some sort of "backbone"
Second idea:
1. For i = 1 … |samples| do:
. Extract graph sample G_i ≺ G from the full graph
. Apply the harmonic iterative algorithm to G_i to obtain f(u), u ∈ G_i
2. From S = {G_i} find nodes (or a subgraph) U ≺ S with |U| = l s.t.
f(U′) = g(f(U))
where U′ = S \ U and g is some defined (linear) transformation
3. Label any other node v by k random walks to the central nodes (or subgraph) U found in the previous step
Outline
Starting point
Classification in networks
Samples of graphs
Some first experiments
Induced subgraph sampling
[From "Statistical analysis of network data", Kolaczyk]
Sample n vertices without replacement to form V∗ = {i_1, …, i_n}
Edges are observed for vertex pairs i, j ∈ V∗ for which {i, j} ∈ E, yielding E∗ (sketched below)
Selected nodes in yellow, observed edges in orange
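A possible networkx rendering of this scheme (the function name is mine):

    import random
    import networkx as nx

    def induced_subgraph_sample(G, n):
        # Sample n vertices without replacement; an edge is observed iff
        # both of its endpoints fall in the sampled vertex set V*.
        V_star = random.sample(list(G.nodes()), n)
        return G.subgraph(V_star).copy()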
Incident subgraph sampling
[From "Statistical analysis of network data", Kolaczyk]
Select n edges by random sampling without replacement, yielding E∗
All vertices incident to E∗ are then observed, providing V∗ (sketched below)
Selected edges in yellow, observed nodes in orange
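The incident variant, again as a networkx sketch:

    import random
    import networkx as nx

    def incident_subgraph_sample(G, n):
        # Sample n edges without replacement; all vertices incident to
        # the selected edge set E* are then observed.
        E_star = random.sample(list(G.edges()), n)
        return G.edge_subgraph(E_star).copy()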
Star and snowball sampling
[From "Statistical analysis of network data", Kolaczyk]
Take an initial vertex sample V∗_0 of size n, without replacement
Observe all edges incident to i ∈ V∗_0, yielding E∗
For labeled star sampling we also observe the vertices i ∈ V \ V∗_0 to which edges in E∗ are incident
For snowball sampling we iterate the process of labeled star sampling over neighbors up to the k-th wave (sketched below)
1-wave: yellow, 2-wave: orange, 3-wave: red
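A sketch of snowball sampling under these definitions (the seed set and wave count are parameters I introduce):

    import networkx as nx

    def snowball_sample(G, seeds, k=2):
        # Iterate labeled star sampling from the seed set up to the k-th
        # wave: each wave adds every neighbor of the current frontier.
        V_star = set(seeds)
        frontier = set(seeds)
        for _ in range(k):
            frontier = {j for i in frontier for j in G.neighbors(i)} - V_star
            V_star |= frontier
        return G.subgraph(V_star).copy()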
Link tracing sampling
[From "Statistical analysis of network data", Kolaczyk]
A sample S = {s_1, …, s_{n_s}} of "sources" is selected from V
A sample T = {t_1, …, t_{n_t}} of "targets" is selected from V \ S
A path is sampled between pairs (s_i, t_i), and all vertices and edges on the paths are observed, yielding G∗ = (V∗, E∗) (sketched below)
Sources {s1, s2} to targets {t1, t2}
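One way this could look in networkx (pairing every source with every target and tracing shortest paths is my concrete choice; other path-sampling rules fit the same skeleton):

    import random
    from itertools import product
    import networkx as nx

    def link_tracing_sample(G, n_s, n_t):
        # Select sources S from V and targets T from V \ S, then observe
        # every vertex and edge on a path between the pairs.
        nodes = list(G.nodes())
        S = random.sample(nodes, n_s)
        T = random.sample([v for v in nodes if v not in S], n_t)
        E_star = set()
        for s, t in product(S, T):
            try:
                path = nx.shortest_path(G, s, t)   # one concrete path-sampling rule
            except nx.NetworkXNoPath:
                continue
            E_star.update(zip(path, path[1:]))
        return G.edge_subgraph(E_star).copy()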
Some other sampling algorithms
Other possible ideas of sampling algorithms for graphs:
Random node selection, random edge selection
Selecting nodes with probability proportional to their "PageRank" weight
Random node neighbor
Random walk sampling
Random jump sampling
Forest fire sampling
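As an example from this list, a random walk sampling sketch (it assumes the component containing `start` has at least n_nodes vertices):

    import random
    import networkx as nx

    def random_walk_sample(G, start, n_nodes):
        # Walk at random from `start`, keeping every visited vertex
        # until n_nodes distinct vertices have been collected.
        V_star = {start}
        current = start
        while len(V_star) < n_nodes:
            neighbors = list(G.neighbors(current))
            if not neighbors:                      # dead end: restart the walk
                current = start
                continue
            current = random.choice(neighbors)
            V_star.add(current)
        return G.subgraph(V_star).copy()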
Some challenges of sampling with labels
Including labels in the samples
Size of the samples
Isolated nodes
Edges from structure or from content
Outline
Starting point
Classification in networks
Samples of graphs
Some first experiments
Experimental set-up
Classification algorithm
In the samples, compute the harmonic function f in an iterative fashion for ≈ 10 iterations
Final classification: for every node u, assign the label with maximum value (probability) f(u) (sketched after this list)
Keep 1/3 of the labels
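A multi-class variant of the earlier iterative sketch matching this set-up (one indicator column per class; names are illustrative):

    import numpy as np

    def classify_multiclass(W, labels, labeled, n_classes, n_iter=10):
        # ~10 harmonic iterations as in the experiments, then each node
        # takes the label with maximum score f(u).
        n = W.shape[0]
        F = np.zeros((n, n_classes))
        F[labeled, labels[labeled]] = 1.0          # one-hot clamp on labeled nodes
        d = np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
        for _ in range(n_iter):
            F = (W @ F) / d                        # propagate each class column
            F[labeled] = 0.0
            F[labeled, labels[labeled]] = 1.0      # keep labeled rows fixed
        return F.argmax(axis=1)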
Datasets
Graph-generated data: (1) cluster generator and (2) community-guided attachment generator
Other: WebKB, IMDB, Cora
What happens in one sample?
Incident (left) & induced (right), WebKB (Cornell), 867 nodes
Blue: error of the harmonic iterative method on the full graph
Green: error on one single sample of increasing size
What happens in one sample?
Link tracing, IMDB, 1169 nodes
Blue: error of the harmonic iterative method on the full graph
Green: error on one single sample of increasing size
What happens in one sample?
Random node-edge selection, IMDB, 1169 nodes
Blue: error of the harmonic iterative method on the full graph
Green: error on one single sample of increasing size
Full classification vs sampling classification
Induced & incident, Cora, 1878 nodes
Blue: error of the harmonic iterative method on the full graph
Green: error of sampling classification for an increasing number of samples
Full classification vs sampling classification
Induced & incident, WebKB (Wisconsin), 1263 nodes
Blue: error of the harmonic iterative method on the full graph
Green: error of sampling classification for an increasing number of samples
Full classification vs sampling classification
Link tracing, CGA generator, 1000 nodes
Blue: error of the harmonic iterative method on the full graph
Green: error of sampling classification for an increasing number of samples
Some discussion
Samples of graphs can serve to avoid the high complexity (O(n^3)) of applying the learning algorithm to the full graph
The choice of sampling method matters (e.g. snowball sampling is bad for highly connected graphs, link tracing is useful in highly clustered graphs)
The approximation of the accuracy is already reasonable with a small number of samples
The question of I/O operations on the graph
Samples of the graph to estimate a distribution?
Ensemble approaches?
Approximation in terms of shortest paths?