Estimating Diffusion Network Structures
A. Proof of Lemma 9

Lemma 9 Given log-concave survival functions and concave hazard functions in the parameter(s) of the pairwise transmission likelihoods, a sufficient condition for the Hessian matrix $Q^n$ to be positive definite is that the hazard matrix $X^n(\alpha)$ is non-singular.

Proof Using Eq. 5, the Hessian matrix can be expressed as the sum of two matrices, $D^n(\alpha)$ and $X^n(\alpha)X^n(\alpha)^\top$. The matrix $D^n(\alpha)$ is trivially positive semidefinite by log-concavity of the survival functions and concavity of the hazard functions. The matrix $X^n(\alpha)X^n(\alpha)^\top$ is positive definite since $X^n(\alpha)$ is full rank by assumption. Then, the Hessian matrix is positive definite since it is the sum of a positive semidefinite matrix and a positive definite matrix.
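The algebraic core of the lemma, that the sum of a positive semidefinite matrix and a positive definite matrix is positive definite, can be sanity-checked numerically. The sketch below uses random matrices as stand-ins for $D^n(\alpha)$ and $X^n(\alpha)$ (illustrative values only, not the actual hazard model):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 20

# D: positive semidefinite (deliberately rank-deficient), playing the role of D^n(alpha).
A = rng.standard_normal((p, p - 2))
D = A @ A.T                      # PSD with rank <= p - 2

# X: full-rank p x n matrix, playing the role of the hazard matrix X^n(alpha).
X = rng.standard_normal((p, n))
assert np.linalg.matrix_rank(X) == p

Q = D + X @ X.T                  # sum of a PSD and a PD matrix
eigvals = np.linalg.eigvalsh(Q)
assert eigvals.min() > 0         # Q is positive definite
```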
B. Proof of Lemma 10

Lemma 10 If the source probability $P(s)$ is strictly positive for all $s \in R$, then, for an arbitrarily large number of cascades $n \to \infty$, there exists an ordering of the nodes and cascades within the cascade set such that the hazard matrix $X^n(\alpha)$ is non-singular.

Proof In this proof, we find a labeling of the nodes (row indices in $X^n(\alpha)$) and an ordering of the cascades (column indices in $X^n(\alpha)$) such that, for an arbitrarily large number of cascades, we can express the matrix $X^n(\alpha)$ as $[T\;B]$, where $T \in \mathbb{R}^{p \times p}$ is upper triangular with nonzero diagonal elements and $B \in \mathbb{R}^{p \times (n-p)}$. Therefore, $X^n(\alpha)$ has full rank (rank $p$). We proceed first by sorting the nodes in $R$ and then continue by sorting the nodes in $U$:

• Nodes in $R$: For each node $u \in R$, consider the set of cascades $C_u$ in which $u$ was a source and $i$ got infected. Then, rank each node $u$ according to the earliest position in which node $i$ got infected across all cascades in $C_u$, in decreasing order, breaking ties at random. For example, if a node $u$ was, at least once, the source of a cascade in which node $i$ got infected just after the source, but in contrast, node $v$ was never the source of a cascade in which node $i$ got infected second, then node $u$ will have a lower index than node $v$. Then, assign row $k$ in the matrix $X^n(\alpha)$ to the node in position $k$, and assign the first $d$ columns to the corresponding cascades in which node $i$ got infected earliest. In this ordering, $X^n(\alpha)_{mk} = 0$ for all $m < k$ and $X^n(\alpha)_{kk} \neq 0$.

• Nodes in $U$: Proceed similarly as in the first step, and assign them the rows $d+1$ to $p$. Moreover, we assign the columns $d+1$ to $p$ to the corresponding cascades in which node $i$ got infected earliest. Again, this ordering satisfies $X^n(\alpha)_{mk} = 0$ for all $m < k$ and $X^n(\alpha)_{kk} \neq 0$. Finally, the remaining $n - p$ columns can be assigned to the remaining cascades at random.

This ordering leads to the desired structure $[T\;B]$, and thus $X^n(\alpha)$ is non-singular.
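The ordering argument reduces to a simple linear-algebra fact: any matrix of the form $[T\;B]$ with $T$ upper triangular and nonzero diagonal has full row rank $p$. A minimal numeric check, with synthetic entries standing in for the hazard values produced by the ordering:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 4, 10

# T: upper triangular with nonzero diagonal, as produced by the node/cascade ordering.
T = np.triu(rng.standard_normal((p, p)))
np.fill_diagonal(T, 1.0 + np.abs(T.diagonal()))   # force a nonzero diagonal

# B: the remaining n - p cascade columns, assigned at random.
B = rng.standard_normal((p, n - p))

X = np.hstack([T, B])                             # X^n(alpha) = [T B]
assert np.linalg.matrix_rank(X) == p              # full row rank, hence non-singular
```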
C. Proof of Eq. 7

If the hazard vector $X(t^c;\alpha)$ is Lipschitz continuous in the domain $\{\alpha : \alpha_S \geq \frac{\alpha^*_{\min}}{2}\}$,

\[
\|X(t^c;\beta) - X(t^c;\alpha)\|_2 \leq k_1 \|\beta - \alpha\|_2,
\]

where $k_1$ is some positive constant, then we can bound the spectral norm of the difference $\frac{1}{\sqrt{n}}(X^n(\beta) - X^n(\alpha))$ in the domain $\{\alpha : \alpha_S \geq \frac{\alpha^*_{\min}}{2}\}$ as follows:

\[
\begin{aligned}
\Big\| \tfrac{1}{\sqrt{n}}\big(X^n(\beta) - X^n(\alpha)\big) \Big\|_2
&= \max_{\|u\|_2 = 1} \tfrac{1}{\sqrt{n}} \big\| u^\top \big(X^n(\beta) - X^n(\alpha)\big) \big\|_2 \\
&= \max_{\|u\|_2 = 1} \tfrac{1}{\sqrt{n}} \sqrt{\sum_{c=1}^{n} \big\langle u,\, X(t^c;\beta) - X(t^c;\alpha) \big\rangle^2} \\
&\leq \tfrac{1}{\sqrt{n}} \sqrt{k_1^2\, n\, \|u\|_2^2\, \|\beta - \alpha\|_2^2}
= k_1 \|\beta - \alpha\|_2.
\end{aligned}
\]
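As a sanity check of this bound, consider a hypothetical hazard vector that is linear in the parameters, $X(t^c;\alpha) = A_c\,\alpha$, which is Lipschitz with $k_1 = \max_c \|A_c\|_2$ (an assumed stand-in, not the paper's transmission model):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 6, 50

# One linear map per cascade; k1 is the largest spectral norm among them.
A = [rng.standard_normal((p, p)) for _ in range(n)]
k1 = max(np.linalg.norm(Ac, 2) for Ac in A)

alpha = rng.random(p)
beta = rng.random(p)

# Columns of X^n(.) are the per-cascade hazard vectors X(t^c; .) = A_c @ parameter.
Xa = np.column_stack([Ac @ alpha for Ac in A])
Xb = np.column_stack([Ac @ beta for Ac in A])

lhs = np.linalg.norm(Xb - Xa, 2) / np.sqrt(n)     # spectral norm of the scaled difference
rhs = k1 * np.linalg.norm(beta - alpha)
assert lhs <= rhs + 1e-12                         # the bound of Eq. 7 holds
```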
D. Proof of Lemma 3

By Lagrangian duality, the regularized network inference problem defined in Eq. 4 is equivalent to the following constrained optimization problem:

\[
\begin{aligned}
\underset{\alpha_i}{\text{minimize}} \quad & \ell^n(\alpha_i) \\
\text{subject to} \quad & \alpha_{ji} \geq 0, \quad j = 1, \ldots, N, \; i \neq j, \\
& \|\alpha_i\|_1 \leq C(\lambda_n),
\end{aligned}
\tag{20}
\]

where $C(\lambda_n) < \infty$ is a positive constant. In this alternative formulation, $\lambda_n$ is the Lagrange multiplier for the second constraint. Since $\lambda_n$ is strictly positive, the constraint is active at any optimal solution, and thus $\|\alpha_i\|_1$ is constant across all optimal solutions.

Using that $\ell^n(\alpha_i)$ is a differentiable convex function by assumption and that $\{\alpha : \alpha_{ji} \geq 0, \|\alpha_i\|_1 \leq C(\lambda_n)\}$ is a convex set, we have that $\nabla \ell^n(\alpha_i)$ is constant across optimal primal solutions (Mangasarian, 1988). Moreover, any optimal primal-dual solution in the original problem must satisfy the KKT conditions in the alternative formulation defined by Eq. 20, in particular,

\[
\nabla \ell^n(\alpha_i) = -\lambda_n z + \mu,
\]

where $\mu \geq 0$ are the Lagrange multipliers associated with the non-negativity constraints and $z$ denotes the subgradient of the $\ell_1$-norm.

Consider a solution $\hat{\alpha}$ such that $\|\hat{z}_{S^c}\|_\infty < 1$, and thus $\nabla_{\alpha_{S^c}} \ell^n(\hat{\alpha}_i) = -\lambda_n \hat{z}_{S^c} + \hat{\mu}_{S^c}$. Now, assume there is an optimal primal solution $\tilde{\alpha}$ such that $\tilde{\alpha}_{ji} > 0$ for some $j \in S^c$. Then, using that the gradient must be constant across optimal solutions, it should hold that $-\lambda_n \hat{z}_j + \hat{\mu}_j = -\lambda_n$, where $\tilde{\mu}_{ji} = 0$ by complementary slackness, which implies $\hat{\mu}_j = -\lambda_n(1 - \hat{z}_j) < 0$. Since $\hat{\mu}_j \geq 0$ by assumption, this leads to a contradiction. Then, any primal solution $\alpha$ must satisfy $\alpha_{S^c} = 0$ for the gradient to be constant across optimal solutions.

Finally, since $\alpha_{S^c} = 0$ for all optimal solutions, we can consider the restricted optimization problem defined in Eq. 17. If the Hessian sub-matrix $[\nabla^2 L(\alpha)]_{SS}$ is strictly positive definite, then this restricted optimization problem is strictly convex and the optimal solution must be unique.
E. Proof of Lemma 4

To prove this lemma, we will first construct a function

\[
G(u_S) := \ell^n(\alpha^*_S + u_S) - \ell^n(\alpha^*_S) + \lambda_n\big(\|\alpha^*_S + u_S\|_1 - \|\alpha^*_S\|_1\big),
\]

whose domain is restricted to the convex set $\mathcal{U} = \{u_S : \alpha^*_S + u_S \geq 0\}$. By construction, $G(u_S)$ has the following properties:

1. It is convex with respect to $u_S$.
2. Its minimum is attained at $\hat{u}_S := \hat{\alpha}_S - \alpha^*_S$; that is, $G(\hat{u}_S) \leq G(u_S)$ for all $u_S \neq \hat{u}_S$.
3. $G(\hat{u}_S) \leq G(0) = 0$.

Based on properties 1 and 3, we deduce that any point in the segment $L := \{u_S : u_S = t\hat{u}_S + (1-t)0,\; t \in [0,1]\}$ connecting $\hat{u}_S$ and $0$ has $G(u_S) \leq 0$. That is,

\[
G(u_S) = G(t\hat{u}_S + (1-t)0) \leq t\,G(\hat{u}_S) + (1-t)\,G(0) \leq 0.
\]

Next, we will find a sphere centered at $0$ with strictly positive radius $B$, $S(B) := \{u_S : \|u_S\|_2 = B\}$, such that $G(u_S) > 0$ (strictly positive) on $S(B)$. We note that this sphere $S(B)$ cannot intersect the segment $L$, since the two sets have strictly different function values. Furthermore, the only possible configuration is that the segment is contained entirely inside the sphere, leading us to conclude that the end point $\hat{u}_S = \hat{\alpha}_S - \alpha^*_S$ is also within the sphere. That is, $\|\hat{\alpha}_S - \alpha^*_S\|_2 \leq B$.
In the following, we provide details on finding such a suitable $B$, which will be a function of the regularization parameter $\lambda_n$ and the neighborhood size $d$. More specifically, we start by applying a Taylor series expansion and the mean value theorem,

\[
G(u_S) = \nabla_S \ell^n(\alpha^*_S)^\top u_S + u_S^\top\, \nabla^2_{SS} \ell^n(\alpha^*_S + b u_S)\, u_S + \lambda_n\big(\|\alpha^*_S + u_S\|_1 - \|\alpha^*_S\|_1\big),
\tag{21}
\]

where $b \in [0,1]$. We will show that $G(u_S) > 0$ by bounding each term of the above equation separately.

We bound the absolute value of the first term using the assumption on the gradient $\nabla_S \ell^n(\cdot)$,

\[
|\nabla_S \ell^n(\alpha^*_S)^\top u_S| \leq \|\nabla_S \ell^n\|_\infty \|u_S\|_1 \leq \|\nabla_S \ell^n\|_\infty \sqrt{d}\, \|u_S\|_2 \leq 4^{-1} \lambda_n B \sqrt{d}.
\tag{22}
\]

We bound the absolute value of the last term using the reverse triangle inequality,

\[
\lambda_n \big| \|\alpha^*_S + u_S\|_1 - \|\alpha^*_S\|_1 \big| \leq \lambda_n \|u_S\|_1 \leq \lambda_n \sqrt{d}\, \|u_S\|_2.
\tag{23}
\]
Bounding the remaining middle term is more challenging. We start by rewriting the Hessian as a sum of two matrices, using Eq. 5,

\[
\begin{aligned}
q &= \min_{u_S}\; u_S^\top D^n_{SS}(\alpha^*_S + b u_S)\, u_S + n^{-1}\, u_S^\top X^n_S(\alpha^*_S + b u_S)\, X^n_S(\alpha^*_S + b u_S)^\top u_S \\
&= \min_{u_S}\; u_S^\top D^n_{SS}(\alpha^*_S + b u_S)\, u_S + n^{-1} \big\|u_S^\top X^n_S(\alpha^*_S + b u_S)\big\|_2^2.
\end{aligned}
\]
Now, we introduce two additional quantities,

\[
\Delta D^n_{SS} = D^n_{SS}(\alpha^*_S + b u_S) - D^n_{SS}(\alpha^*_S), \qquad
\Delta X^n_S = X^n_S(\alpha^*_S + b u_S) - X^n_S(\alpha^*_S),
\]

and rewrite $q$ as

\[
q = \min_{u_S} \Big[\, u_S^\top D^n_{SS}(\alpha^*_S)\, u_S + n^{-1}\big\|u_S^\top X^n_S(\alpha^*_S)\big\|_2^2 + n^{-1}\big\|u_S^\top \Delta X^n_S\big\|_2^2 + u_S^\top \Delta D^n_{SS}\, u_S + 2 n^{-1}\big\langle u_S^\top X^n_S(\alpha^*_S),\, u_S^\top \Delta X^n_S \big\rangle \,\Big].
\]
Next, we use the dependency condition,

\[
q \geq C_{\min} B^2
- \underbrace{\max_{u_S} \big| u_S^\top \Delta D^n_{SS}\, u_S \big|}_{T_1}
- \underbrace{\max_{u_S} 2\big| n^{-1}\big\langle u_S^\top X^n_S(\alpha^*_S),\, u_S^\top \Delta X^n_S \big\rangle \big|}_{T_2},
\]

and proceed to bound $T_1$ and $T_2$ separately. First, we bound $T_1$ using the Lipschitz condition,

\[
|T_1| = \Big| \sum_{k \in S} u_k^2 \big[ D^n_k(\alpha^*_S + b u_S) - D^n_k(\alpha^*_S) \big] \Big|
\leq \sum_{k \in S} u_k^2\, k_2 \|b u_S\|_2 \leq k_2 B^3.
\]
Then, we use the dependency condition, the Lipschitz condition, and the Cauchy-Schwarz inequality to bound $T_2$,

\[
\begin{aligned}
T_2 &\leq \tfrac{1}{\sqrt{n}}\big\|u_S^\top X^n_S(\alpha^*_S)\big\|_2 \cdot \tfrac{1}{\sqrt{n}}\big\|u_S^\top \Delta X^n_S\big\|_2
\leq \sqrt{C_{\max}}\, B\, \tfrac{1}{\sqrt{n}}\big\|u_S^\top \Delta X^n_S\big\|_2 \\
&\leq \sqrt{C_{\max}}\, B\, \|u_S\|_2\, \tfrac{1}{\sqrt{n}}\big\|\Delta X^n_S\big\|_2
\leq \sqrt{C_{\max}}\, B^2\, k_1 \|b u_S\|_2
\leq k_1 \sqrt{C_{\max}}\, B^3,
\end{aligned}
\]

where we note that applying the Lipschitz condition implies assuming $B < \frac{\alpha^*_{\min}}{2}$. Next, we incorporate the bounds on $T_1$ and $T_2$ to lower bound $q$,

\[
q \geq C_{\min} B^2 - \big(k_2 + 2 k_1 \sqrt{C_{\max}}\big) B^3.
\tag{24}
\]
Now, we set $B = K \lambda_n \sqrt{d}$, where $K$ is a constant that we will set later in the proof, and select the regularization parameter $\lambda_n$ to satisfy $\lambda_n \sqrt{d} \leq 0.5\, C_{\min} / \big(K (k_2 + 2 k_1 \sqrt{C_{\max}})\big)$. Then,

\[
\begin{aligned}
G(u_S) &\geq -4^{-1} \lambda_n \sqrt{d}\, B + 0.5\, C_{\min} B^2 - \lambda_n \sqrt{d}\, B \\
&\geq B\big(0.5\, C_{\min} B - 1.25\, \lambda_n \sqrt{d}\big) \\
&\geq B\big(0.5\, C_{\min} K \lambda_n \sqrt{d} - 1.25\, \lambda_n \sqrt{d}\big).
\end{aligned}
\]

In the last step, we set the constant $K = 3 C_{\min}^{-1}$, and we have

\[
G(u_S) \geq 0.25\, \lambda_n \sqrt{d}\, B > 0,
\]

as long as

\[
\sqrt{d}\, \lambda_n \leq \frac{C_{\min}^2}{6\big(k_2 + 2 k_1 \sqrt{C_{\max}}\big)}, \qquad
\alpha^*_{\min} \geq \frac{6 \lambda_n \sqrt{d}}{C_{\min}}.
\]

Finally, convexity of $G(u_S)$ yields

\[
\|\hat{\alpha}_S - \alpha^*_S\|_2 \leq 3 \lambda_n \sqrt{d} / C_{\min} \leq \frac{\alpha^*_{\min}}{2}.
\]
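The constant bookkeeping in the last steps (the choice $K = 3 C_{\min}^{-1}$ and the condition on $\lambda_n \sqrt{d}$) can be verified with a few lines of arithmetic; the numeric values below are arbitrary illustrative choices, not constants from the paper:

```python
import math

# Illustrative constants (assumed values only).
C_min, C_max, k1, k2 = 0.8, 2.0, 0.5, 0.7
c = k2 + 2 * k1 * math.sqrt(C_max)

K = 3.0 / C_min                       # the choice made in the last step of the proof
lam_sqrt_d = 0.5 * C_min / (K * c)    # largest lambda_n * sqrt(d) allowed
B = K * lam_sqrt_d                    # radius of the sphere S(B)

# Middle term (Eq. 24): q >= C_min B^2 - c B^3 >= 0.5 C_min B^2 under the constraint.
q_lower = C_min * B**2 - c * B**3
assert q_lower >= 0.5 * C_min * B**2 - 1e-12

# G(u_S) >= -0.25 lam sqrt(d) B + q_lower - lam sqrt(d) B >= 0.25 lam sqrt(d) B > 0.
G_lower = -0.25 * lam_sqrt_d * B + q_lower - lam_sqrt_d * B
assert G_lower >= 0.25 * lam_sqrt_d * B - 1e-12
assert G_lower > 0
```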
F. Proof of Lemma 5

Define $z^c_j = [\nabla g(t^c; \alpha^*)]_j$ and $z_j = \frac{1}{n} \sum_c z^c_j$. Now, using the KKT conditions and condition 4 (Boundedness), we have that $\mu^*_j = \mathbb{E}_c\{z^c_j\}$ and $|z^c_j| \leq k_3$, respectively. Thus, Hoeffding's inequality yields

\[
P\left( |z_j - \mu^*_j| > \frac{\lambda_n \varepsilon}{4(2-\varepsilon)} \right)
\leq 2 \exp\left( -\frac{n \lambda_n^2 \varepsilon^2}{32 k_3^2 (2-\varepsilon)^2} \right),
\]

and then,

\[
P\left( \|z - \mu^*\|_\infty > \frac{\lambda_n \varepsilon}{4(2-\varepsilon)} \right)
\leq 2 \exp\left( -\frac{n \lambda_n^2 \varepsilon^2}{32 k_3^2 (2-\varepsilon)^2} + \log p \right).
\]
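The displayed tail bound is Hoeffding's inequality for variables bounded in $[-k_3, k_3]$, evaluated at the threshold $t = \lambda_n \varepsilon / (4(2-\varepsilon))$. A quick Monte Carlo check with synthetic bounded variables (uniform stand-ins, not actual gradient terms):

```python
import numpy as np

rng = np.random.default_rng(3)
k3, n, trials, t = 1.0, 200, 20000, 0.15   # |z^c| <= k3; t is the deviation threshold

# z^c are iid, bounded in [-k3, k3], mean 0; z_j is their sample average.
z = rng.uniform(-k3, k3, size=(trials, n)).mean(axis=1)

empirical = np.mean(np.abs(z) > t)
# Hoeffding for variables with range 2*k3: P(|z_j - mu| > t) <= 2 exp(-n t^2 / (2 k3^2)).
hoeffding = 2 * np.exp(-n * t**2 / (2 * k3**2))
assert empirical <= hoeffding
```

Substituting $t = \lambda_n \varepsilon / (4(2-\varepsilon))$ into the exponent $-nt^2/(2k_3^2)$ recovers the displayed constant $32 k_3^2 (2-\varepsilon)^2$.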
G. Proof of Lemma 6

We start by factorizing the Hessian matrix, using Eq. 5,

\[
R^n_j = \big[ \nabla^2 \ell^n(\bar{\alpha}^j) - \nabla^2 \ell^n(\alpha^*) \big]_j^\top (\hat{\alpha} - \alpha^*) = \omega^n_j + \delta^n_j,
\]

where

\[
\omega^n_j = \big[ D^n(\bar{\alpha}^j) - D^n(\alpha^*) \big]_j^\top (\hat{\alpha} - \alpha^*), \qquad
\delta^n_j = \frac{1}{n} V^n_j (\hat{\alpha} - \alpha^*), \qquad
V^n_j = [X^n(\bar{\alpha}^j)]_j\, X^n(\bar{\alpha}^j)^\top - [X^n(\alpha^*)]_j\, X^n(\alpha^*)^\top.
\]

Next, we proceed to bound each term separately. Since $[\bar{\alpha}^j]_S = \theta_j \hat{\alpha}_S + (1 - \theta_j) \alpha^*_S$ where $\theta_j \in [0,1]$, and $\|\hat{\alpha}_S - \alpha^*_S\|_\infty \leq \frac{\alpha^*_{\min}}{2}$ (Lemma 4), it holds that $[\bar{\alpha}^j]_S \geq \frac{\alpha^*_{\min}}{2}$. Then, we can use condition 3 (Lipschitz Continuity) to bound $\omega^n_j$,

\[
|\omega^n_j| \leq k_1 \|\bar{\alpha}^j - \alpha^*\|_2 \|\hat{\alpha} - \alpha^*\|_2
\leq k_1 \theta_j \|\hat{\alpha} - \alpha^*\|_2^2
\leq k_1 \|\hat{\alpha} - \alpha^*\|_2^2.
\tag{25}
\]
However, bounding the term $\delta^n_j$ is more difficult. Let us start by rewriting $\delta^n_j$ as follows:

\[
\delta^n_j = \frac{1}{n}\big( \Lambda_1 + \Lambda_2 + \Lambda_3 \big)(\hat{\alpha} - \alpha^*),
\]

where

\[
\begin{aligned}
\Lambda_1 &= [X^n(\alpha^*)]_j \big( X^n(\bar{\alpha}^j)^\top - X^n(\alpha^*)^\top \big), \\
\Lambda_2 &= \big\{ [X^n(\bar{\alpha}^j)]_j - [X^n(\alpha^*)]_j \big\} \big( X^n(\bar{\alpha}^j)^\top - X^n(\alpha^*)^\top \big), \\
\Lambda_3 &= \big( [X^n(\bar{\alpha}^j)]_j - [X^n(\alpha^*)]_j \big)\, X^n(\alpha^*)^\top.
\end{aligned}
\]
Next, we bound each term separately. For the first term, we first apply the Cauchy-Schwarz inequality,

\[
|\Lambda_1 (\hat{\alpha} - \alpha^*)| \leq \big\|[X^n(\alpha^*)]_j\big\|_2 \cdot \big\| X^n(\bar{\alpha}^j)^\top - X^n(\alpha^*)^\top \big\|_2\, \|\hat{\alpha} - \alpha^*\|_2,
\]

and then use condition 3 (Lipschitz Continuity) and condition 4 (Boundedness),

\[
|\Lambda_1 (\hat{\alpha} - \alpha^*)| \leq n k_4 k_1 \|\bar{\alpha}^j - \alpha^*\|_2 \|\hat{\alpha} - \alpha^*\|_2 \leq n k_4 k_1 \|\hat{\alpha} - \alpha^*\|_2^2.
\]
[Figure 5. Success probability vs. # of cascades. Different super-neighborhood sizes $p_i$: (a) Chain ($d_i = 1$), $p \in \{16, 32, 64, 128, 256\}$; (b) Stars with different # of leaves ($d_i = 1$), $p \in \{31, 63, 127\}$; (c) Tree ($d_i = 3$), $p \in \{39, 120, 363\}$.]

For the second term, we also start by applying the Cauchy-Schwarz inequality,
|⇤2
(↵�↵⇤)| k[Xn
(↵j
)]
j
� [X
n
(↵⇤)]
j
k2
⇥ |kXn
(↵j
)
> �X
n
(↵⇤)
>k|2
k↵�↵⇤k2
,
and then use condition 3 (Lipschtiz Continuity),
|⇤2
(↵�↵⇤)| nk2
1
k↵�↵⇤k22
.
Last, for the third term, once more we start by applying the Cauchy-Schwarz inequality,

\[
|\Lambda_3 (\hat{\alpha} - \alpha^*)| \leq \big\|[X^n(\bar{\alpha}^j)]_j - [X^n(\alpha^*)]_j\big\|_2 \cdot \big\| X^n(\alpha^*)^\top \big\|_2\, \|\hat{\alpha} - \alpha^*\|_2,
\]

and then apply condition 1 (Dependency Condition) and condition 3 (Lipschitz Continuity),

\[
|\Lambda_3 (\hat{\alpha} - \alpha^*)| \leq n k_1 \sqrt{C_{\max}}\, \|\hat{\alpha} - \alpha^*\|_2^2.
\]
Now, we combine the bounds,

\[
\|R^n\|_\infty \leq K \|\hat{\alpha} - \alpha^*\|_2^2,
\]

where $K = k_1 + k_4 k_1 + k_1^2 + k_1 \sqrt{C_{\max}}$.

Finally, using Lemma 4 and selecting the regularization parameter $\lambda_n$ to satisfy $\lambda_n d \leq \frac{C_{\min}^2\, \varepsilon}{36 K (2-\varepsilon)}$ yields

\[
\|R^n\|_\infty / \lambda_n \leq 9 K \lambda_n d / C_{\min}^2 \leq \frac{\varepsilon}{4(2-\varepsilon)}.
\]
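The final inequality chains Lemma 4's bound $\|\hat{\alpha} - \alpha^*\|_2 \leq 3\lambda_n\sqrt{d}/C_{\min}$ through $\|R^n\|_\infty \leq K\|\hat{\alpha} - \alpha^*\|_2^2$; the arithmetic check below (with illustrative, assumed constants) confirms that the constants line up:

```python
import math

# Illustrative constants (assumed values only).
k1, k4, C_max, C_min, eps = 0.5, 1.2, 2.0, 0.8, 0.3
K = k1 + k4 * k1 + k1**2 + k1 * math.sqrt(C_max)
d = 4

# Largest lambda_n * d allowed by the condition in the proof.
lam_d = C_min**2 * eps / (36 * K * (2 - eps))
lam = lam_d / d

# Lemma 4: ||alpha_hat - alpha*||_2 <= 3 lam sqrt(d) / C_min, so
# ||R^n||_inf / lambda_n <= K err^2 / lam = 9 K lam d / C_min^2.
err = 3 * lam * math.sqrt(d) / C_min
Rn_over_lam = K * err**2 / lam
assert abs(Rn_over_lam - 9 * K * lam_d / C_min**2) < 1e-12
assert Rn_over_lam <= eps / (4 * (2 - eps)) + 1e-12
```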
H. Proof of Lemma 7

We will first bound the difference, in spectral norm, between the population Fisher information matrix $Q_{SS}$ and the Hessian of the sample mean cascade log-likelihood $Q^n_{SS}$. Define $z^c_{jk} = [\nabla^2 g(t^c;\alpha^*) - \nabla^2 \ell^n(\alpha^*)]_{jk}$ and $z_{jk} = \frac{1}{n} \sum_{c=1}^n z^c_{jk}$. Then, we can express the difference between the two matrices as

\[
\big\| Q^n_{SS}(\alpha^*) - Q^*_{SS}(\alpha^*) \big\|_2
\leq \big\| Q^n_{SS}(\alpha^*) - Q^*_{SS}(\alpha^*) \big\|_F
= \sqrt{ \sum_{j=1}^d \sum_{k=1}^d (z_{jk})^2 }.
\]

Since $|z^c_{jk}| \leq 2 k_5$ by condition 4, we can apply Hoeffding's inequality to each $z_{jk}$,

\[
P(|z_{jk}| \geq \delta) \leq 2 \exp\left( -\frac{\delta^2 n}{8 k_5^2} \right),
\tag{26}
\]

and further,

\[
P\Big( \big\| Q^n_{SS}(\alpha^*) - Q^*_{SS}(\alpha^*) \big\|_2 \geq \delta \Big)
\leq 2 \exp\left( -\frac{K \delta^2 n}{d^2} + 2 \log d \right),
\tag{27}
\]

where we applied Eq. 26 with threshold $\bar{\delta} = \delta/d$ together with a union bound over the $d^2$ entries. Now, we bound the maximum eigenvalue of $Q^n_{SS}$ as follows:

\[
\Lambda_{\max}(Q^n_{SS}) = \max_{\|x\|_2 = 1} x^\top Q^n_{SS}\, x
= \max_{\|x\|_2 = 1} \big\{ x^\top Q^*_{SS}\, x + x^\top (Q^n_{SS} - Q^*_{SS})\, x \big\}
= y^\top Q^*_{SS}\, y + y^\top (Q^n_{SS} - Q^*_{SS})\, y,
\]

where $y$ is the unit-norm maximal eigenvector of $Q^n_{SS}$. Therefore,

\[
\Lambda_{\max}(Q^n_{SS}) \leq \Lambda_{\max}(Q^*_{SS}) + \big\| Q^n_{SS} - Q^*_{SS} \big\|_2,
\]

and thus,

\[
P\big( \Lambda_{\max}(Q^n_{SS}) \geq C_{\max} + \delta \big)
\leq \exp\left( -\frac{K \delta^2 n}{d^2} + 2 \log d \right).
\]
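The eigenvalue step is an instance of Weyl's inequality: the eigenvalues of $Q^n_{SS}$ can move by at most the spectral norm of the perturbation $Q^n_{SS} - Q^*_{SS}$. A numeric check with random symmetric matrices standing in for the population and sample matrices:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 6
S = rng.standard_normal((d, d)); Q_star = (S + S.T) / 2    # population stand-in
E = 0.1 * rng.standard_normal((d, d)); E = (E + E.T) / 2   # sampling perturbation
Q_n = Q_star + E

gap = np.linalg.norm(E, 2)                                 # ||Q^n - Q*||_2
max_ok = np.linalg.eigvalsh(Q_n).max() <= np.linalg.eigvalsh(Q_star).max() + gap + 1e-10
min_ok = np.linalg.eigvalsh(Q_n).min() >= np.linalg.eigvalsh(Q_star).min() - gap - 1e-10
assert max_ok and min_ok
```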
Reasoning in a similar way, we bound the minimum eigenvalue of $Q^n_{SS}$:

\[
P\big( \Lambda_{\min}(Q^n_{SS}) \leq C_{\min} - \delta \big)
\leq \exp\left( -\frac{K \delta^2 n}{d^2} + 2 \log d \right).
\]

[Figure 6. Success probability vs. # of cascades. Different in-degrees $d_i$: $d = 1, 2, 3$.]
I. Proof of Lemma 8

We start by decomposing $Q^n_{S^c S}(\alpha^*)\big(Q^n_{SS}(\alpha^*)\big)^{-1}$ as follows:

\[
Q^n_{S^c S}(\alpha^*)\big(Q^n_{SS}(\alpha^*)\big)^{-1} = A_1 + A_2 + A_3 + A_4,
\]

where

\[
\begin{aligned}
A_1 &= Q^*_{S^c S} \big[ (Q^n_{SS})^{-1} - (Q^*_{SS})^{-1} \big], \\
A_2 &= \big[ Q^n_{S^c S} - Q^*_{S^c S} \big] \big[ (Q^n_{SS})^{-1} - (Q^*_{SS})^{-1} \big], \\
A_3 &= \big[ Q^n_{S^c S} - Q^*_{S^c S} \big] (Q^*_{SS})^{-1}, \\
A_4 &= Q^*_{S^c S} (Q^*_{SS})^{-1},
\end{aligned}
\]

and $Q^* = Q^*(\alpha^*)$, $Q^n = Q^n(\alpha^*)$. Now, we bound each term separately. The fourth term, $A_4$, is the easiest to bound, using simply the incoherence condition:

\[
\|A_4\|_\infty \leq 1 - \varepsilon.
\]
To bound the other terms, we need the following lemma:

Lemma 11 For any $\delta \geq 0$ and constants $K$ and $K'$, the following bounds hold:

\[
P\big[ \| Q^n_{S^c S} - Q^*_{S^c S} \|_\infty \geq \delta \big]
\leq 2 \exp\left( -\frac{K \delta^2 n}{d^2} + \log d + \log(p-d) \right)
\tag{28}
\]
\[
P\big[ \| Q^n_{SS} - Q^*_{SS} \|_\infty \geq \delta \big]
\leq 2 \exp\left( -\frac{K \delta^2 n}{d^2} + 2 \log d \right)
\tag{29}
\]
\[
P\big[ \| (Q^n_{SS})^{-1} - (Q^*_{SS})^{-1} \|_\infty \geq \delta \big]
\leq 4 \exp\left( -\frac{K \delta^2 n}{d^3} + K' \log d \right)
\tag{30}
\]
Proof We start by proving the first confidence bound. By definition of the matrix infinity norm, we have

\[
P\big[ \| Q^n_{S^c S} - Q^*_{S^c S} \|_\infty \geq \delta \big]
= P\Big[ \max_{j \in S^c} \sum_{k \in S} |z_{jk}| \geq \delta \Big]
\leq (p - d)\, P\Big[ \sum_{k \in S} |z_{jk}| \geq \delta \Big],
\]

where $z_{jk} = [Q^n - Q^*]_{jk}$ and, for the last inequality, we used the union bound and the fact that $|S^c| \leq p - d$. Furthermore,

\[
P\Big[ \sum_{k \in S} |z_{jk}| \geq \delta \Big]
\leq P\big[ \exists\, k \in S : |z_{jk}| \geq \delta/d \big]
\leq d\, P\big[ |z_{jk}| \geq \delta/d \big].
\]

Thus,

\[
P\big[ \| Q^n_{S^c S} - Q^*_{S^c S} \|_\infty \geq \delta \big]
\leq (p - d)\, d\, P\big[ |z_{jk}| \geq \delta/d \big].
\]
At this point, we can obtain the first confidence bound by using Eq. 26 with threshold $\delta/d$ in the above equation. The proof of the second confidence bound is very similar, and we omit it for brevity. To prove the last confidence bound, we proceed as follows:

\[
\begin{aligned}
\big\| (Q^n_{SS})^{-1} - (Q^*_{SS})^{-1} \big\|_\infty
&= \big\| (Q^n_{SS})^{-1} \big[ Q^n_{SS} - Q^*_{SS} \big] (Q^*_{SS})^{-1} \big\|_\infty \\
&\leq \sqrt{d}\, \big\| (Q^n_{SS})^{-1} \big[ Q^n_{SS} - Q^*_{SS} \big] (Q^*_{SS})^{-1} \big\|_2 \\
&\leq \sqrt{d}\, \big\| (Q^n_{SS})^{-1} \big\|_2\, \big\| Q^n_{SS} - Q^*_{SS} \big\|_2\, \big\| (Q^*_{SS})^{-1} \big\|_2 \\
&\leq \frac{\sqrt{d}}{C_{\min}}\, \big\| Q^n_{SS} - Q^*_{SS} \big\|_2\, \big\| (Q^n_{SS})^{-1} \big\|_2.
\end{aligned}
\]
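Two generic facts drive this chain: the inverse-difference identity $A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}$ (signs are immaterial once norms are taken) and the bound $\|M\|_\infty \leq \sqrt{d}\,\|M\|_2$ for a $d \times d$ matrix. Both can be checked numerically with SPD stand-ins for $Q^n_{SS}$ and $Q^*_{SS}$:

```python
import numpy as np

rng = np.random.default_rng(5)
d = 5
A0 = rng.standard_normal((d, d)); Qn = A0 @ A0.T + d * np.eye(d)  # SPD stand-in for Q^n_SS
B0 = rng.standard_normal((d, d)); Qs = B0 @ B0.T + d * np.eye(d)  # SPD stand-in for Q*_SS

# Identity used in the proof: (Qn)^-1 - (Qs)^-1 = (Qn)^-1 (Qs - Qn) (Qs)^-1.
lhs = np.linalg.inv(Qn) - np.linalg.inv(Qs)
rhs = np.linalg.inv(Qn) @ (Qs - Qn) @ np.linalg.inv(Qs)
assert np.allclose(lhs, rhs)

# Norm comparison: ||M||_inf (max absolute row sum) <= sqrt(d) * ||M||_2.
M = lhs
inf_norm = np.abs(M).sum(axis=1).max()
assert inf_norm <= np.sqrt(d) * np.linalg.norm(M, 2) + 1e-12
```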
Next, we bound each term of the final expression in the above equation separately. The first term can be bounded using Eq. 27:

\[
P\Big[ \big\| Q^n_{SS} - Q^*_{SS} \big\|_2 \geq C_{\min}^2 \delta / (2\sqrt{d}) \Big]
\leq 2 \exp\left( -\frac{K n \delta^2}{d^3} + 2 \log d \right).
\]

The second term can be bounded using Lemma 6:

\[
P\Big[ \big\| (Q^n_{SS})^{-1} \big\|_2 \geq \frac{2}{C_{\min}} \Big]
= P\Big[ \Lambda_{\min}(Q^n_{SS}) \leq \frac{C_{\min}}{2} \Big]
\leq \exp\left( -\frac{K n}{d^2} + B \log d \right).
\]

Then, the third confidence bound follows.
Control of $A_1$. We start by rewriting the term $A_1$ as

\[
A_1 = Q^*_{S^c S} (Q^*_{SS})^{-1} \big[ (Q^*_{SS}) - (Q^n_{SS}) \big] (Q^n_{SS})^{-1},
\]
[Figure 7. F1-score vs. # of cascades, comparing NetRate, our method, and First-Edge: (a) Kronecker hierarchical, EXP; (b) Kronecker hierarchical, RAY; (c) Forest Fire, POW; (d) Forest Fire, RAY.]
and further,

\[
\|A_1\|_\infty \leq \big\| Q^*_{S^c S} (Q^*_{SS})^{-1} \big\|_\infty \cdot \big\| (Q^*_{SS}) - (Q^n_{SS}) \big\|_\infty\, \big\| (Q^n_{SS})^{-1} \big\|_\infty.
\]

Next, using the incoherence condition easily yields

\[
\|A_1\|_\infty \leq (1 - \varepsilon)\, \big\| (Q^*_{SS}) - (Q^n_{SS}) \big\|_\infty \cdot \sqrt{d}\, \big\| (Q^n_{SS})^{-1} \big\|_2.
\]

Now, we apply Lemma 6 with $\delta = C_{\min}/2$ to obtain that $\big\| (Q^n_{SS})^{-1} \big\|_2 \leq \frac{2}{C_{\min}}$ with probability greater than $1 - \exp(-K n / d^2 + K' \log d)$, and then use Eq. 29 with $\delta = \frac{\varepsilon C_{\min}}{12 \sqrt{d}}$ to conclude that

\[
P\Big[ \|A_1\|_\infty \geq \frac{\varepsilon}{6} \Big]
\leq 2 \exp\left( -\frac{K n}{d^3} + K' \log d \right).
\]
Control of $A_2$. We bound the term $A_2$ as

\[
\|A_2\|_\infty \leq \big\| Q^n_{S^c S} - Q^*_{S^c S} \big\|_\infty\, \big\| (Q^n_{SS})^{-1} - (Q^*_{SS})^{-1} \big\|_\infty,
\]

and then use Eqs. 28 and 30 with $\delta = \sqrt{\varepsilon/6}$ to conclude that

\[
P\Big[ \|A_2\|_\infty \geq \frac{\varepsilon}{6} \Big]
\leq 4 \exp\left( -\frac{K n}{d^3} + \log(p-d) + K' \log p \right).
\]
Control of $A_3$. We bound the term $A_3$ as

\[
\|A_3\|_\infty \leq \sqrt{d}\, \big\| (Q^*_{SS})^{-1} \big\|_2\, \big\| Q^n_{S^c S} - Q^*_{S^c S} \big\|_\infty
\leq \frac{\sqrt{d}}{C_{\min}}\, \big\| Q^n_{S^c S} - Q^*_{S^c S} \big\|_\infty.
\]

We then apply Eq. 28 with $\delta = \frac{\varepsilon C_{\min}}{6 \sqrt{d}}$ to conclude that

\[
P\Big[ \|A_3\|_\infty \geq \frac{\varepsilon}{6} \Big]
\leq \exp\left( -\frac{K n}{d^3} + \log(p-d) \right),
\]

and thus,

\[
P\Big[ \big\| Q^n_{S^c S} (Q^n_{SS})^{-1} \big\|_\infty \geq 1 - \frac{\varepsilon}{2} \Big]
= O\left( \exp\left( -\frac{K n}{d^3} + \log p \right) \right).
\]
J. Additional experiments

Parameters (n, p, d). Figure 5 shows the success probability at inferring the incoming links of nodes on the same type of canonical networks as depicted in Fig. 2. We choose nodes with the same in-degree but different super-neighborhood set sizes $p_i$, and experiment with different scalings $\beta$ of the number of cascades, $n = 10\beta d \log p$. We set the regularization parameter $\lambda_n$ to a constant factor of $\sqrt{\log(p)/n}$, as suggested by Theorem 2, and, for each node, we used cascades which contained at least one node in the super-neighborhood of the node under study. We used an exponential transmission model and time window $T = 10$. As predicted by Theorem 2, very different $p$ values lead to curves that line up with each other quite well.

Figure 6 shows the success probability at inferring the incoming links of nodes of a hierarchical Kronecker network with equal super-neighborhood size ($p_i = 70$) but different in-degrees ($d_i$) under different scalings $\beta$ of the number of cascades, $n = 10\beta d \log p$, choosing the regularization parameter $\lambda_n$ as a constant factor of $\sqrt{\log(p)/n}$, as suggested by Theorem 2. We used an exponential transmission model and time window $T = 5$. As predicted by Theorem 2, in this case, different $d$ values lead to noticeably different curves.

Comparison with NETRATE and First-Edge. Figure 7 compares the accuracy of our algorithm, NETRATE, and First-Edge against the number of cascades for different types of networks and transmission models. Our method typically outperforms both competing methods. We find the competitive advantage with respect to First-Edge especially striking; however, this may be explained by comparing the sample complexity results for both methods: First-Edge needs $O(N d \log N)$ cascades to achieve a probability of success approaching 1 at a rate polynomial in the number of cascades, while our method needs $O(d^3 \log N)$ to achieve a probability of success approaching 1 at a rate exponential in the number of cascades.