Anonymity Properties of the Bitcoin P2P Network

Giulia Fanti
Coordinated Sciences Laboratory
University of Illinois, Urbana-Champaign, IL 61801
[email protected]

Pramod Viswanath
Coordinated Sciences Laboratory
University of Illinois, Urbana-Champaign, IL 61801
[email protected]

ABSTRACT
Bitcoin is a popular alternative to fiat money, widely used for its perceived anonymity properties. However, recent attacks on Bitcoin’s peer-to-peer (P2P) network demonstrated that its gossip-based flooding protocols, which are used to ensure global network consistency, may enable user deanonymization: the linkage of a user’s IP address with her pseudonym in the Bitcoin network. In 2015, the Bitcoin community responded to these attacks by changing the network’s flooding mechanism to a different protocol, known as diffusion. However, no systematic justification was provided for the change, and it is unclear if diffusion actually improves the system’s anonymity. In this paper, we model the Bitcoin networking stack and analyze its anonymity properties, both pre- and post-2015. In doing so, we consider new adversarial models and spreading mechanisms that have not been previously studied in the source-finding literature. We theoretically prove that Bitcoin’s networking protocols (both pre- and post-2015) offer poor anonymity properties on networks with a regular-tree topology. We validate this claim in simulation on a 2015 snapshot of the real Bitcoin P2P network topology.

Categories and Subject Descriptors
G.2.2 [Graph Theory]: Network problems, Graph algorithms

Keywords
Cryptocurrencies, Bitcoin, Peer-to-Peer Networks, Privacy

1. INTRODUCTION
The Bitcoin cryptocurrency has seen widespread adoption, due in part to its reputation as a privacy-preserving financial system [21, 26]. In practice, though, Bitcoin exhibits a number of serious privacy vulnerabilities [4, 23, 31, 32, 28]. Most of these vulnerabilities arise because of two key properties: (1) Bitcoin associates each user with a pseudonym, and (2) pseudonyms can be linked to financial transactions by way of a public transaction ledger, called the blockchain [27]. This means that if an attacker is able to associate a pseudonym with its human user, the attacker may learn the user’s entire transaction history. Such leakage represents a massive privacy violation, and would be deemed unacceptable in traditional banking systems.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$15.00.

In practice, there are several ways to link a user to her Bitcoin pseudonym. The most commonly-studied methods analyze transaction patterns in the public blockchain, and link those patterns using side information [28, 31]. In this paper, we are interested in a lower-layer vulnerability: the networking stack. Like most cryptocurrencies, Bitcoin nodes communicate over a P2P network [27]. The anonymity implications of this P2P network have been largely ignored until recently, when researchers demonstrated empirical deanonymization attacks that exploit the P2P network’s management protocols [8, 19]. These findings are particularly troublesome because the Bitcoin P2P stack is used in a number of other cryptocurrencies (or altcoins), some of which are designed with provable anonymity guarantees in mind [3, 33]. Hence vulnerabilities in the Bitcoin P2P network may extend to a host of other cryptocurrencies as well.

We are interested in one key aspect of the Bitcoin P2P network: the dissemination of transactions. Whenever a user (Alice) generates a transaction (i.e., she sends bitcoins to another user, Bob), she first creates a “transaction message” that contains her pseudonym, Bob’s pseudonym, and the transaction amount. Alice subsequently broadcasts this transaction message over the P2P network, which enables other users to validate her transaction and incorporate it into the global blockchain.

The broadcast of transactions is critical to maintaining blockchain consistency. Broadcasting proceeds by flooding transactions along links in the P2P network. The actual flooding protocol has been the subject of some discussion, and is central to our paper. In particular, we are interested in theoretically quantifying the anonymity properties of existing flooding protocols used by the Bitcoin networking stack.

Anonymity in the Bitcoin P2P network. Transaction broadcasting opens a new avenue for deanonymization attacks. If an attacker can infer the IP address that initiated a transaction broadcast, then the attacker can also link the IP address to the associated user’s Bitcoin pseudonym. Since IP addresses can sometimes be linked to human identities (e.g., with the help of an ISP), this deanonymization vector is a powerful one.

arXiv:1703.08761v1 [cs.CR] 26 Mar 2017

In recent years, security researchers demonstrated precisely such deanonymization attacks, which exploit Bitcoin’s transaction-flooding protocols. These attacks rely on a “supernode” that connects to active Bitcoin nodes and listens to the transaction traffic relayed by honest nodes [19, 8, 9]. Using this technique, researchers were able to link Bitcoin users’ pseudonyms to their IP addresses with an accuracy of up to 30% [8].

In 2015, the Bitcoin community responded to these attacks by changing its flooding protocols from a gossip-style protocol known as trickle spreading to a diffusion spreading protocol that spreads content with independent exponential delays [1]. We define these protocols precisely in Section 2. However, no systematic motivation was provided for this shift. Indeed, it is unclear whether the change actually defends against the deanonymization attacks in [8, 19].

1.1 Problem and Contributions
Our goal is to analyze the anonymity properties of the Bitcoin P2P network. The main point of our paper is to show that the Bitcoin network has poor anonymity properties, and the community’s shift from trickle spreading (pre-2015) to diffusion spreading (post-2015) did not help the situation. The optimal (maximum-likelihood) source-identification algorithms change between protocols; identifying such algorithms and quantifying their performance is the primary focus of this work. We find that despite having different maximum-likelihood estimators, trickle and diffusion exhibit roughly the same, poor anonymity properties. Our specific contributions are threefold:

(1) Modeling. We model the Bitcoin P2P network and an ‘eavesdropper adversary’, whose capabilities reflect recent practical attacks in [8, 19]. This task is complicated by the fact that most Bitcoin network protocols are not explicitly documented. Modeling the system therefore requires parsing a combination of documentation, papers, and code. Several of the resulting models are new to the rumor source analysis literature.

(2) Analysis of Trickle (Pre-2015). We analyze the probability of deanonymization by an eavesdropper adversary under trickle propagation, which was used until 2015. Our analysis is conducted over a regular tree-structured network. Although the Bitcoin network topology is not a regular tree, we will see in Section 2 that regular trees are a reasonable first-order model. We consider suboptimal, graph-independent estimators (e.g., the first-timestamp estimator), as well as maximum-likelihood estimators; both are defined precisely in Section 2. Our analysis suggests that although the first-timestamp estimator performs poorly on high-degree trees, maximum-likelihood estimators can achieve high probabilities of detection for trees of any degree d (Table 1).

(3) Analysis of Diffusion (Post-2015). We conduct a similar analysis of diffusion spreading, which was adopted in 2015 as a fix for the anonymity weaknesses observed under trickle propagation [8, 19]. Table 1 summarizes a subset of our results, which characterize the probability of detection asymptotically in tree degree d. We wish to highlight the fact that trickle and diffusion exhibit similar anonymity behavior. We revisit this table more carefully in Section 5, where we also empirically validate our findings on a snapshot of the real Bitcoin network from 2015.

Table 1: Summary of probability of detection results on regular trees. Asymptotics are in the tree degree d. Notice that results for trickle and diffusion are similar.

                      Trickle (pre-2015)                            Diffusion (post-2015)
First-Timestamp       log(d)/(d log(2)) + o(log(d)/d)  (Eq. 5)      log(d−1)/(d−2)  (Thm. 4.1)
Maximum-Likelihood    Θ(1)  (Thm. 3.3)                              Θ(1)  (Thm. 4.2)

Paper Structure. We begin by modeling Bitcoin’s P2P networking stack and the adversaries of interest in Section 2. We then analyze the performance of trickle propagation in Section 3. In Section 4, we analyze the performance of diffusion. We compare these results side-by-side in Section 5, which also includes empirical trials on a snapshot of the real Bitcoin network. We discuss the relation between our results and prior work in Section 6. Section 7 concludes by discussing the practical implications of these results and preliminary ideas on how to solve the problem.

2. MODEL AND PROBLEM STATEMENT
To characterize the anonymity of Bitcoin’s P2P network, we need to model three key aspects of the system: the network topology, the spreading protocol, and the adversary’s capabilities.

2.1 Network Model
The Bitcoin P2P network contains two classes of nodes: servers and clients. Clients are nodes that do not accept incoming TCP connections (e.g., nodes behind NAT), whereas servers do accept incoming connections. Clients and servers have different networking protocols and anonymity concerns. For instance, clients do not relay transactions. We focus in this work on servers.

We model the P2P network of servers as a graph G(V, E), where V is the set of all server nodes and E is the set of edges, or connections, between them. In practice, each server node is represented by an (IP address, port) tuple. Currently, there are about 5,000 active Bitcoin servers, and this number generally remains stable over the timescale of a single transaction broadcast [10]. Each server is allowed to establish up to eight outgoing connections to active Bitcoin nodes and maintain up to 125 total active connections [8, 2]. For a connection between Alice and Bob, an outgoing connection (from Alice’s perspective) is one that is initiated by Alice, whereas an incoming connection is one initiated by Bob. However, these TCP connections are bidirectional once established. Most of a server’s incoming connections are from Bitcoin clients, so they are irrelevant to our analysis. However, we will see momentarily that adversaries can use the asymmetry between incoming and outgoing connections to monitor the network for deanonymization attacks.

The resulting sparse random graph between servers can be modeled approximately as a 16-regular graph; in practice, the average degree is closer to 8 due to nonhomogeneities across nodes [24]. Critically, the graph is locally tree-like and (approximately) regular. For this reason, regular trees are a natural class of graphs to study. In our theoretical analysis, we model G as a d-regular tree. We validate this choice by running simulations on a snapshot of the true Bitcoin network [24] (Section 5).
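As a concrete illustration of the d-regular tree model, the following minimal Python sketch builds a finite piece of such a tree. The function name `regular_tree` and the `depth` truncation are assumptions made for illustration only; the analysis itself treats the tree as unbounded, so nodes on the last level of this truncated tree have degree 1 rather than d.

```python
from collections import deque


def regular_tree(d, depth):
    """Adjacency lists of a d-regular tree cut off `depth` hops from root 0.

    Hypothetical helper: the root gets d children, and every other
    interior node gets d - 1 children (its parent is its d-th neighbor).
    """
    adj = {0: []}
    frontier = deque([(0, 0)])  # (node, distance from root)
    next_id = 1
    while frontier:
        v, level = frontier.popleft()
        if level == depth:
            continue  # truncation: last-level nodes stay leaves
        n_children = d if v == 0 else d - 1
        for _ in range(n_children):
            u = next_id
            next_id += 1
            adj[u] = [v]
            adj[v].append(u)
            frontier.append((u, level + 1))
    return adj
```

For example, `regular_tree(3, 2)` yields a root with three neighbors, three interior nodes of degree 3, and six leaves.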

2.2 Spreading Protocols
Recall that every time a transaction (or a block) is completed, it is broadcast over the network. In this work, we analyze the spread of a single message originating from source node v∗ ∈ V. Without loss of generality, we will label v∗ as node ‘0’ whenever we are iterating over nodes.

The broadcasting protocol for messages has been under discussion in the Bitcoin community recently. Until 2015, the Bitcoin network used a gossip-like trickle broadcasting protocol. However, in the wake of various anonymity attacks [8, 9, 19], the reference Bitcoin implementation changed its networking stack to use a different broadcasting protocol known as diffusion [1]. In this paper, we evaluate both protocols and compare their performance.

Trickle spreading is a gossip-based flooding protocol. Each message source or relay randomly orders its neighbors who have not yet seen the message; we call these uninfected neighbors. It then transmits the message to its neighbors according to the ordering, with a constant delay of 200 ms between transmissions [8]. We model this spreading protocol by assuming a discrete-time system; each source or relay randomly orders its uninfected neighbors and transmits the message to one neighbor per subsequent time step. We assume a node begins relaying a message in the first timestep after it receives the message.
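The discrete-time trickle model can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the graph is a tree given as adjacency lists (so a node's uninfected neighbors are all neighbors except the one it received the message from), adversarial connections are ignored, and the name `trickle_timestamps` is hypothetical.

```python
import random


def trickle_timestamps(adj, source, rng):
    """Receive times under discrete-time trickle spreading on a tree.

    Assumption: `adj` maps each node to its neighbor list and encodes
    a tree. A node that receives the message at time x sends it to the
    k-th neighbor in its random ordering at time x + k.
    """
    X = {source: 0}
    stack = [(source, None)]  # (node, neighbor it received from)
    while stack:
        v, parent = stack.pop()
        uninfected = [u for u in adj[v] if u != parent]
        rng.shuffle(uninfected)  # uniformly random ordering
        for k, u in enumerate(uninfected, start=1):
            X[u] = X[v] + k
            stack.append((u, v))
    return X
```

On a star with center 0 and three leaves, the leaves receive the message at times 1, 2, and 3 in a random order.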

In diffusion spreading, each source or relay node transmits the message to each of its uninfected neighbors with an independent, exponential delay of rate λ. We assume a continuous-time system, in which a node starts the exponential clocks as soon as it receives (or creates) a message.
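A matching sketch for diffusion follows, again assuming a tree given as adjacency lists and ignoring adversarial connections; `diffusion_timestamps` is a hypothetical name. Each edge to an uninfected neighbor draws its own exponential delay using the standard library's `random.expovariate`.

```python
import random


def diffusion_timestamps(adj, source, lam, rng):
    """Receive times under continuous-time diffusion on a tree.

    Assumption: `adj` encodes a tree, so each node is reached through
    exactly one edge. A node that receives the message at time x
    delivers it to each uninfected neighbor at x + Exp(lam),
    independently per neighbor.
    """
    X = {source: 0.0}
    stack = [(source, None)]  # (node, neighbor it received from)
    while stack:
        v, parent = stack.pop()
        for u in adj[v]:
            if u == parent:
                continue
            X[u] = X[v] + rng.expovariate(lam)
            stack.append((u, v))
    return X
```

Unlike trickle, a node's neighbors here do not wait on one another; with many neighbors, the earliest delivery from a node arrives after roughly 1/(λ · number of uninfected neighbors) on average.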

For both protocols, we let X_v denote the timestamp at which honest node v ∈ V receives a given message. Note that server nodes cannot be infected more than once. We assume the message originates at time t = 0, so X_{v∗} = X_0 = 0. Moreover, we let G_t(V_t, E_t) denote the infected subgraph of G at time t, or the subgraph of nodes who have received the message (but not necessarily reported it to the adversary) by time t.

2.3 Adversarial Model
We consider an adversary whose goal is to link a message with the (IP address, port) that originated it. In our setup, this translates to identifying the source node v∗ ∈ V.

To this end, we introduce an eavesdropper adversary, whose capabilities are modeled on the practical deanonymization attacks in [8, 19]. These attacks are cheap, scalable, and simple, so they represent realistic threats to the network. We begin by describing the attacks at a practical level, and then extract models for the actual analysis.

The attacks in [8, 19] use a supernode that connects to most of the servers in the Bitcoin network. The supernode can make multiple connections to each honest server, with each connection coming from a different (IP address, port). Hence, the honest server does not realize that the supernode’s connections are all from the same entity. The supernode can compromise arbitrarily many of a server’s unused connections, up to the hard limit of 125 total connections. We model this setup by assuming that the eavesdropper adversary makes a fixed number θ of connections to each server, where θ ≥ 1. We do not include these adversarial connections in the original server graph G, so G remains a d-regular graph. This setup is illustrated in Figure 1.

Figure 1: The eavesdropper adversary establishes θ links (shown in red) to each server. Honest servers are connected in a d-regular tree topology (edges shown in black). [Figure labels: v∗, Eavesdropper, d = 2, θ = 3.]

Once the supernode in [8, 19] is established, it simply listens to all relayed messages on the network, without relaying or transmitting any content; hence the name ‘eavesdropper adversary.’ Over time, due to peer address propagation protocols (which are not discussed here), the adversary learns the network structure between servers. Therefore, we assume that G(V, E) is known to the eavesdropper adversary.

The supernode in [8, 19] also observes the timestamps at which messages are relayed from each honest server. Since the adversary maintains multiple active connections to each server, it receives the message multiple times from each server. For ease of analysis, we assume that the eavesdropper adversary only stores the first such timestamp. We also assume that the adversary observes all timestamps relative to time t = 0, i.e., it knows when the message started spreading. Given this, we let τ_v denote the time at which the adversary first observes the message from node v ∈ V. We let τ denote the set of all observed timestamps.
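To make the relation between X_v and τ_v concrete, here is a small Python sketch for the diffusion case. It assumes that each of the θ adversarial connections behaves like any other neighbor link, so the adversary hears from node v after the minimum of θ independent Exp(λ) delays; that reading, and the name `eavesdropper_timestamps`, are assumptions for illustration. Under trickle the delay would instead depend on where the adversarial links fall in each node's random ordering.

```python
import random


def eavesdropper_timestamps(X, theta, lam, rng):
    """First observation times tau_v under diffusion (illustrative).

    X maps each honest node v to its receive time X_v. Assumption:
    each of the theta adversarial links fires after an independent
    Exp(lam) delay, and only the earliest report is stored.
    """
    tau = {}
    for v, x in X.items():
        first_report_delay = min(rng.expovariate(lam) for _ in range(theta))
        tau[v] = x + first_report_delay
    return tau
```

Because the minimum of θ independent Exp(λ) variables is Exp(θλ), adding connections shrinks the expected reporting delay to 1/(θλ), which is why larger θ gives the adversary less noisy timestamps.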

Relation to existing adversarial models. We wish to briefly highlight the difference between the eavesdropper adversary and two commonly-studied adversarial models. The snapshot adversary is the most common adversary in this space [36, 35, 37, 14, 20]. A snapshot adversary observes the set of infected nodes at a single time T; in our notation, the adversary learns the set {v ∈ V : X_v ≤ T} (no timestamps), along with graph G. The eavesdropper adversary differs in that it eventually observes a noisy timestamp τ_v for every node, regardless of when the node is infected.

Another common adversarial model is the spy-based adversary, which observes exact timestamps for a corrupted set of nodes that does not include the source [29, 40]. In our notation, for a set of spies S ⊆ V, the spy-based adversary observes {(s, X_s) : s ∈ S}. This is different from the eavesdropper adversary because the eavesdropper only observes delayed timestamps, and it does so for all nodes, including the source. Precise analysis of the spy-based adversary has not appeared in the literature.

Neither of these prior adversarial models adequately captures the recent supernode-based deanonymization attacks on the Bitcoin network [8, 19]. Indeed, we shall see that the eavesdropper adversary requires different analytical techniques from traditional adversarial models. On the other hand, some of the techniques we will use to analyze the eavesdropper adversary can be applied to traditional adversaries (Sec. 5).

Source Estimation. The adversary’s goal is as follows: given the observed noisy timestamps τ (up to estimation time t) and the server graph G, find an estimator M(τ, G) that correctly identifies the true source. Our metric of success for the adversary is probability of detection. Given an estimator M, the adversary’s probability of detection is P(M(τ, G) = v∗). The probability is taken over the random spreading realization (captured by τ).

In [8, 19], the adversary’s estimator is a variant of the so-called first-timestamp estimator. The first-timestamp estimator M_FT(τ, G) outputs the first node (prior to estimation time t) to report the message to the adversary:

\[ M_{\mathrm{FT}}(\boldsymbol{\tau}, G) = \arg\min_{v \in V_t} \tau_v. \]

The first-timestamp estimator is popular because of its simplicity: it requires no knowledge of the graph, and it is computationally easy to implement. Despite its simplicity, the first-timestamp estimator achieves high accuracy rates in practice [8, 19]. We begin by analyzing this estimator for both trickle and diffusion propagation.
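The first-timestamp estimator is simple enough to state directly in code. The sketch below is illustrative only: the name `first_timestamp_estimate`, the dictionary representation of τ, and the tie-breaking rule (first entry in dictionary order) are assumptions, not part of the model above.

```python
def first_timestamp_estimate(tau, t=float("inf")):
    """Return the node with the smallest observed timestamp up to time t.

    tau maps node -> time at which the adversary first heard from it.
    Returns None if nothing was observed by time t. Ties go to the
    first tied node in dictionary order (an arbitrary choice).
    """
    observed = {v: ts for v, ts in tau.items() if ts <= t}
    if not observed:
        return None
    return min(observed, key=observed.get)
```

For instance, `first_timestamp_estimate({"a": 0.7, "b": 0.2, "c": 1.5})` returns `"b"`. Note that the graph G never appears in the computation.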

In principle, the adversary could implement more complicated estimators that utilize the underlying graph structure. We are particularly interested in the maximum-likelihood (ML) estimator, which maximizes the adversary’s probability of detection:

\[ M_{\mathrm{ML}}(\boldsymbol{\tau}, G) = \arg\max_{v \in V} \mathbb{P}(\boldsymbol{\tau} \mid G, v^* = v). \]

The ML estimator depends on the time of estimation t to the extent that τ only contains timestamps up to time t. Unlike the first-timestamp estimator, the ML estimator differs across spreading protocols, depends on the graph, and may be computationally intractable. We nonetheless bound its performance for both trickle and diffusion spreading over regular trees.

2.4 Problem Statement
Our primary goal is to understand whether the Bitcoin community’s move from trickle spreading to diffusion actually improved the system’s anonymity guarantees. As such, our main problem is to characterize the maximum-likelihood (ML) probability of detection of the eavesdropper adversary for both trickle and diffusion processes on d-regular trees, as a function of degree d, number of corrupted connections θ, and detection time t. We meet this goal by computing lower bounds derived from the analysis of suboptimal estimators (e.g., first-timestamp estimator and centrality-based estimators), and upper bounds derived from fundamental limits on detection.

3. ANALYSIS OF TRICKLE (PRE-2015)
We begin by analyzing the probability of detection of trickle spreading. We first consider the first-timestamp estimator, followed by the ML estimator.

3.1 First-Timestamp Estimator
The analysis of trickle propagation is complicated by its combinatorial, time-dependent nature. As such, we begin by lower-bounding the first-timestamp estimator’s probability of detection. We do so by computing the probability that the true source reports the message strictly before any other node. Let

\[ \tau_m \triangleq \min(\tau_1, \tau_2, \ldots) \]

denote the minimum observed timestamp among nodes that are not the source. Then we compute P(τ_0 < τ_m), i.e., the probability that the true source reports the message to the adversary strictly before any of the other nodes. This event (which causes the source to be detected with probability 1) does not include cases where the true source is one of k nodes (k > 1) that report the message to the adversary simultaneously, and before any other node in the system. Since P(τ_0 < τ_m) does not account for such ‘simultaneous reporting’ events, it is a lower bound. Nonetheless, for large d, the ‘simultaneous reporting’ event is rare, so our lower bound is close to the empirical probability of detection of the first-timestamp estimator.
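The event τ_0 < τ_m can be estimated by Monte Carlo. The following Python sketch is a hedged illustration rather than the authors' simulator: it assumes each node's random ordering ranges over its uninfected honest neighbors together with its θ adversarial links, and it exploits the tree structure so that only the count of honest neighbors matters (d for the source, d − 1 for every relay). The function names are hypothetical.

```python
import random


def trickle_first_report(d, theta, rng):
    """Simulate trickle until the adversary first hears the message.

    Returns (source_reported, n_reporters) for the first time step in
    which any node transmits over an adversarial link.
    """
    def make_queue(n_honest):
        # True marks an adversarial link, False an honest neighbor.
        queue = [False] * n_honest + [True] * theta
        rng.shuffle(queue)
        return queue

    nodes = [(True, make_queue(d))]  # (is_source, remaining targets)
    while True:
        reporters, newly_infected = [], []
        for is_source, queue in nodes:
            if not queue:
                continue
            if queue.pop():  # this node's next target is the adversary
                reporters.append(is_source)
            else:  # a new honest node is infected; it relays next step
                newly_infected.append((False, make_queue(d - 1)))
        if reporters:
            return any(reporters), len(reporters)
        nodes.extend(newly_infected)


def estimate_strict_detection(d, theta, trials, seed=0):
    """Fraction of trials in which only the source reports first."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        source_reported, n_reporters = trickle_first_report(d, theta, rng)
        hits += source_reported and n_reporters == 1
    return hits / trials
```

The number of infected nodes can double every step, so this sketch is only practical for moderate d. Ties are counted as misses, matching the strict inequality in P(τ_0 < τ_m).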

Theorem 3.1. Consider a message that propagates according to trickle spreading over a d-regular tree of honest servers, where each node additionally has θ connections to an eavesdropping adversary. The first-timestamp estimator’s probability of detection at time t = ∞ satisfies

\[ \mathbb{P}(M_{\mathrm{FT}}(\boldsymbol{\tau}, G) = v^*) \ge \frac{\theta}{d \log 2} \left[ \mathrm{Ei}(2^{d} \log \rho) - \mathrm{Ei}(\log \rho) \right] \tag{1} \]

where ρ = (d − 1)/(d − 1 + θ), Ei(·) denotes the exponential integral, defined as

\[ \mathrm{Ei}(x) \triangleq -\int_{-x}^{\infty} \frac{e^{-t}}{t} \, dt, \]

and all logarithms are natural logs.

(Proof in Section B.1)

We prove this bound by conditioning on the time at which the source reports to the adversary, and computing the conditional probability that all other nodes report later. The proof then becomes a combinatorial counting problem.

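Implications. We can approximate the asymptotic behavior of equation (1) for large d by using the exponential integral’s Taylor expansion. First, we note that when d is large, Ei(2^d log ρ) ≈ 0, so we have

\begin{align}
\frac{\theta}{d \log 2} \left[ \mathrm{Ei}(2^{d} \log \rho) - \mathrm{Ei}(\log \rho) \right]
&\approx \frac{\theta}{d \log 2} \left( -\gamma - \log\left|\log \rho\right| - \sum_{\nu=1}^{\infty} \frac{(\log \rho)^{\nu}}{\nu \cdot \nu!} \right) \tag{2} \\
&\approx \frac{\theta}{d \log 2} \left( -\gamma - \log \log\left(1 + \frac{\theta}{d}\right) + \log\left(1 + \frac{\theta}{d}\right) - \frac{\log^{2}\left(1 + \frac{\theta}{d}\right)}{4} + \ldots \right) \tag{3} \\
&\approx \frac{\theta}{d \log 2} \left( -\gamma - \log \frac{\theta}{d} + \frac{\theta}{d} - \frac{\theta^{2}}{4 d^{2}} + \ldots \right) \tag{4}
\end{align}

where γ ≈ 0.577 is the Euler-Mascheroni constant [39], and (2) comes from substituting the exponential integral by its Taylor expansion for real arguments [6]. Line (3) holds because as d → ∞, ρ ≈ 1/(1 + θ/d), and (4) holds because as d → ∞, log(1 + θ/d) ≈ θ/d.

In particular, for the special case of θ = 1 where the adversary establishes only one connection per server, line (4) simplifies to

\[ \mathbb{P}(M_{\mathrm{FT}}(\boldsymbol{\tau}, G) = v^*) \approx \frac{\log d}{d \cdot \log 2} + o\left( \frac{\log d}{d} \right). \tag{5} \]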
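The bound in (1) and the approximation in (5) are easy to evaluate numerically. The sketch below uses only the Python standard library and is hedged in two ways: it reads the first argument of Ei in (1) as 2^d log ρ (the form under which that term vanishes for large d, as stated above), and it computes Ei for negative arguments with the Taylor series from (2) plus the leading asymptotic term e^x/x for very negative x. The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def ei(x):
    """Exponential integral Ei(x) for x < 0 (sketch, abs. error ~1e-9)."""
    if x >= 0:
        raise ValueError("this sketch only handles x < 0")
    if x < -20.0:
        # Leading asymptotic term; |Ei(x)| is below 1e-10 here.
        return math.exp(x) / x
    total, term = 0.0, 1.0
    for k in range(1, 121):
        term *= x / k      # term == x**k / k!
        total += term / k  # adds x**k / (k * k!)
    return EULER_GAMMA + math.log(-x) + total


def trickle_ft_lower_bound(d, theta=1):
    """Right-hand side of (1), assuming the exponent reading 2**d; d >= 2."""
    log_rho = math.log((d - 1) / (d - 1 + theta))
    upper = (2.0 ** min(d, 1000)) * log_rho  # cap avoids float overflow
    return theta / (d * math.log(2)) * (ei(upper) - ei(log_rho))


def trickle_ft_approx(d):
    """Leading term of (5), valid for theta = 1."""
    return math.log(d) / (d * math.log(2))
```

Evaluating both for growing d shows them decaying together; the gap between them is of order 1/d, consistent with the −γ term that (5) absorbs into o(log d / d).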

Figure 2: First-timestamp estimator accuracy on d-regular trees when the adversary has one link per server (θ = 1). [Plot: probability of detection versus tree degree d; curves: theoretical lower bound, log(d) / (d log(2)), and simulation.]

This suggests that the first-timestamp estimator has a probability of detection that decays to zero asymptotically as log(d)/d. Intuitively, the probability of detection should decay to zero, because the higher the degree of the tree, the higher the likelihood that a node other than the source reports to the adversary before the source does. Nonetheless, (5) is only a lower bound on the first-timestamp estimator’s probability of detection, so we wish to understand how tight the bound is.

Simulation Results. To evaluate the lower bound in Theorem 3.1 and the approximation in (5), we simulate the first-timestamp estimator on regular trees. Figure 2 illustrates the simulation results for θ = 1 compared to the approximation in (5). Each data point is averaged over 5,000 trials. In practice, the lower bound appears to be tight, especially as d grows.

Figure 2 and approximation (5) suggest a natural solution to the Bitcoin network’s anonymity problems: increase the degree of each node to reduce the adversary’s probability of detection. However, we shall see in the next section that stronger estimators (e.g., the ML estimator) may achieve high probabilities of detection, even for large d. Therefore, the trickle protocol is unlikely to adequately protect users’ anonymity.

3.2 Maximum-Likelihood Estimator
In this section, we identify an ML source estimator for finite detection time t, which uses the graph structure and timestamps to infer the source. We analyze its limiting probability of detection as t → ∞, and show that it achieves a probability of detection lower-bounded by 1/2, regardless of the degree of the tree. This highlights the weakness of trickle propagation when the adversary knows the graph. We begin with a discussion of the ML estimator.

As the message spreads, it reaches some nodes before others. At any time t, if one knew the ground truth timestamps (i.e., the X_v’s), one could arrange the nodes of the infected subgraph G_t in the order they received the message. We call such an arrangement an ordering of the nodes. Since trickle propagation is a discrete-time system, multiple nodes may receive the message simultaneously, in which case they are lumped together in the ordering. Of course, the true ordering is not observed by the adversary, but the observed timestamps (i.e., τ) restrict the set of possible orderings. A feasible ordering is an ordering that respects the rules of trickle propagation over graph G, as well as the observed timestamps τ. In this subsection only, we will abuse notation by using τ to refer to all timestamps observed by the adversary, not just the first timestamp from each server. So if the adversary has θ connections to each server, τ would include θ timestamps per honest server.

We propose an estimator called timestamp rumor centrality, which counts the number of feasible orderings originating from each candidate source. The candidate with the most feasible orderings is chosen as the estimator output. This estimator is similar to rumor centrality, an estimator devised for snapshot adversaries in [34]. However, the presence of timestamps and the lack of knowledge of the infected subgraph complicate matters. We first motivate timestamp rumor centrality, then show that it is the ML source estimator for trickle spreading under an eavesdropper adversary.

Proposition 3.2. Consider a trickle process over a d-regular graph, where each node has θ connections to the eavesdropper adversary. Any feasible orderings o_1 and o_2 with respect to observed timestamps τ and graph G have the same likelihood.

(Proof in Section B.2)

This claim can be proved with an inductive argument, which shows that at any given time, the number of nodes with a given uninfected degree (i.e., number of uninfected neighbors) is deterministic; that is, it does not depend on the underlying ordering. Moreover, the likelihood of a given ordering is strictly a function of the nodes’ uninfected degrees at each time step, so all feasible orderings have the same likelihood.

Proposition 3.2 implies that at any fixed time, the likelihood of observing τ given a candidate source is proportional to the number of feasible orderings originating from that candidate source. Therefore, an ML estimator, which we call timestamp rumor centrality, counts the number of feasible orderings at finite estimation time t.

Timestamp rumor centrality is a message-passing algorithm that proceeds as follows: for each candidate source, recursively determine the set of feasible times when each node could have been infected, given the observed timestamps. This is achieved by passing a set of “feasible times of receipt” from the candidate source to the leaves of the largest feasible subtree rooted at the candidate source. In each step, nodes prune any times of receipt that conflict with their observed timestamps. Next, given each node’s set of feasible receipt times, they count the number of feasible orderings that obey the rules of trickle propagation. This is achieved by passing sets of partial orderings from the leaves back to the candidate source, and pruning out infeasible orderings.

In practice, we do not pass the entire partial ordering; we can instead store the number of distinct partial orderings for the subtree rooted at each child, indexed by each feasible time of receipt for the child (there are O(d) feasible times of receipt). This reduces the overall computational complexity to O((2d)^d |V|): each node passes a message of size O(d) to its parent, and each parent takes a Cartesian product of its children’s messages. The whole procedure is run for each candidate source, of which there are O(2^d), since the true source has a timestamp of at most d + 1. Due to the additional notation needed to describe timestamp rumor centrality in detail, we defer a full-fledged pseudocode description to Appendix B (Figure 9).

Figure 3: Example of ball centrality on a line with one link to the adversary per server (these links are not shown). The estimator is run at time t = 4.

In [37], precise analysis of standard rumor centrality was possible because rumor centrality can be reduced to a simple counting problem. Such an analysis is more challenging for timestamp rumor centrality, because timestamps prevent us from using the same counting argument. However, we identify a suboptimal, simplified version of timestamp rumor centrality that approaches optimal probabilities of detection as t grows. We call this estimator ball centrality.

Unlike timestamp rumor centrality, ball centrality simply checks whether a candidate source v could have generated each of the observed timestamps, independently. We explain this in the context of an example. Figure 3 contains a sample spread on a line graph, where the adversary has one connection per server (not shown). Therefore, d = 2 and θ = 1. The ground truth infection time is written as X_v below each node, and the observed timestamps are written above the node. In this figure, the estimator is run at time t = 4, so the adversary only sees three timestamps. For each observed timestamp τ_v, the estimator creates a ball of radius τ_v − 1, centered at v. For example, in our figure, the green node (node 1) has τ_1 = 2. Therefore, the adversary would make a ball of radius 1 centered at node 1; this ball is depicted by the green bubble in our figure. The ball represents the set of nodes that are close enough to node 1 to feasibly report to the adversary from node 1 at time τ_1 = 2. After constructing an analogous ball for every observed timestamp in τ, the protocol outputs a source selected uniformly from the intersection of these balls. In our example, there are exactly two nodes in this intersection. We outline ball centrality precisely in Protocol 1.

Although ball centrality is not ML for a fixed time t, the following theorem lower bounds the ML probability of detection by analyzing ball centrality and showing that its probability of detection approaches a fundamental upper bound exponentially fast in detection time t.

Protocol 1 Ball Centrality. Returns a source estimate whose location is consistent with timestamps τ on tree G. h(v, w) denotes the hop distance between v and w.

Input: Timestamps τ, graph G(V, E)
Output: Source estimate v̂ ∈ V
  W ← V
  for v ∈ V with an observed timestamp τ_v do    ▷ Find the intersection of feasible balls
    W ← W ∩ {w ∈ V : h(w, v) ≤ τ_v − 1}
  v̂ ∼ Unif(W)
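The ball-intersection step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the adjacency-dictionary representation, the function names, and the concrete timestamps (our reading of the Figure 3 example, with the source at node 0) are assumptions made for the sketch.

```python
import random
from collections import deque


def hop_distances(adj, start):
    """Breadth-first search: hop distance from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def feasible_sources(adj, timestamps):
    """Intersect the balls of radius tau_v - 1 centered at each reporting node v."""
    candidates = set(adj)
    for v, tau_v in timestamps.items():
        dist = hop_distances(adj, v)
        candidates &= {w for w, h in dist.items() if h <= tau_v - 1}
    return candidates


def ball_centrality(adj, timestamps, rng=random):
    """Return a source estimate drawn uniformly from the feasible set (None if empty)."""
    candidates = sorted(feasible_sources(adj, timestamps))
    return rng.choice(candidates) if candidates else None


# Line graph on nodes -3..3 (assumed layout of the Figure 3 example).
nodes = range(-3, 4)
line = {v: [u for u in (v - 1, v + 1) if u in nodes] for v in nodes}
observed = {0: 2, 1: 2, -1: 4}  # assumed timestamps seen by time t = 4
print(sorted(feasible_sources(line, observed)))  # the two remaining candidates: [0, 1]
```

With these inputs the intersection contains exactly two nodes, the source and one honest neighbor, which is the situation the lower-bound argument below relies on.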

Figure 4: Timestamp rumor centrality (ML estimator) accuracy on d-regular trees when the adversary has one link per server (θ = 1). The estimator is run at time t = d + 1. [Plot: probability of detection versus tree degree d; curves: theoretical upper bound, timestamp rumor centrality.]

Theorem 3.3. Consider a trickle spreading process over a d-regular graph of honest servers. In addition, each server has θ independent connections to an eavesdropper adversary. The ML probability of detection at time t satisfies the following two expressions:

\[ \text{(1)} \quad \mathbb{P}(M_{\mathrm{ML}}(\tau, G) = v^*) \le 1 - \frac{d}{2(\theta + d)} \tag{6} \]

\[ \text{(2)} \quad \mathbb{P}(M_{\mathrm{ML}}(\tau, G) = v^*) \ge 1 - \frac{d}{2(\theta + d)} - \left(\frac{d}{\theta + d}\right)^{t} \tag{7} \]

(Proof in Section B.3)

To prove the upper bound, the key idea is as follows: in the first time step, if a source infects an honest neighbor instead of the adversary, then the true source and the honest neighbor become indistinguishable to the adversary. This is because the spreading processes from each node after that first time step become identically distributed. This gives an upper bound on the probability of detection. For the lower bound, the key idea is that if we wait long enough, the ball estimator will eventually return an intersection of balls containing at most two nodes: the source and the first honest neighbor to whom it passed the message. We can characterize the probability of this happening as a function of time, which lower bounds the ML probability of detection.

We make a few additional remarks about Theorem 3.3:

(1) The right-hand side of equation (6) is always greater than 1/2, regardless of θ. As such, increasing the degree of the graph would not significantly reduce the probability of detection: the adversary can still identify the source with probability at least 1/2, given enough time.

(2) The ML probability of detection approaches its upper bound exponentially fast in time t. This suggests that the adversary can, in practice, achieve high probabilities of detection at small times t. For example, Figure 4 shows the probability of detection of timestamp rumor centrality, as a function of d, for θ = 1. The estimator is run at time t = d + 1. Even at such small timestamps, the probability of detection is empirically close to the upper bound in (6).
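To get a feel for how quickly the gap between bounds (6) and (7) closes, the two expressions can be evaluated directly. The sketch below only transcribes the formulas of Theorem 3.3; the function names and the choice d = 8, θ = 1 are illustrative assumptions.

```python
def ml_upper_bound(d, theta):
    """Upper bound (6): 1 - d / (2 (theta + d))."""
    return 1.0 - d / (2.0 * (theta + d))


def ml_lower_bound(d, theta, t):
    """Lower bound (7): the upper bound minus (d / (theta + d)) ** t."""
    return ml_upper_bound(d, theta) - (d / (theta + d)) ** t


# The gap (d / (theta + d)) ** t shrinks geometrically in the detection time t.
for t in (5, 10, 20, 40):
    print(t, round(ml_lower_bound(8, 1, t), 4), round(ml_upper_bound(8, 1), 4))
```

Note that for small t the lower bound can be negative and therefore vacuous; it only becomes informative once (d/(θ + d))^t is small.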

These results highlight a simple but important point: exploiting graph structure can significantly increase an estimator’s accuracy, up to order-level gains.


4. ANALYSIS OF DIFFUSION (POST-2015)

Having studied the anonymity properties of trickle spreading, we now move to diffusion, a spreading mechanism adopted by the Bitcoin community in 2015 in response to the attacks demonstrated on the trickle spreading mechanism [1]. We wish to understand whether diffusion has better anonymity properties than trickle propagation. However, the main challenge is that even on simple graphs like a line, computing the exact likelihood of a candidate diffusion source is intractable as the set of infected nodes grows. This is because a node’s likelihood of being the source depends on the unobserved times at which each node is infected, a state space that grows exponentially in the graph size. As such, it is challenging to characterize the ML probability of detection in closed form. Instead, we will compute lower bounds on the ML probability of detection by analyzing two achievable estimation schemes: the first-timestamp estimator and a heuristic estimator of our own creation, which we call the reporting centrality estimator. We will see that these two estimators are complementary, in the sense that they detect the source well for different regimes of degree d. Moreover, they give detection probabilities of the same order as trickle spreading.

4.1 First-timestamp estimator

Although the first-timestamp estimator does not use knowledge of the underlying graph, its performance depends heavily on the underlying graph structure. The following theorem exactly characterizes its eventual probability of detection on a regular tree.

Theorem 4.1. Consider a diffusion process of rate λ = 1 over a d-regular tree, d > 2. Suppose an adversary observes each node’s infection time with an independent, exponential delay of rate λ_2 = θ, θ ≥ 1. Then the following expression describes the probability of detection for the first-timestamp estimator at time t = ∞:

\[ \mathbb{P}(M_{\mathrm{FT}}(\tau, G) = v^*) = \frac{\theta}{d - 2} \log\left(\frac{d + \theta - 2}{\theta}\right) \tag{8} \]

(Proof in Section B.4)

The proof follows from writing out P(τ_0 < τ_m), conditioned on all the unobserved infection times. This expression can be written recursively, which admits a nonlinear differential equation that can be solved exactly. The expression highlights a few points:

(1) For a fixed degree d, the probability of detection is strictly positive as t → ∞. This observation is straightforward in our case (i.e., under the eavesdropper adversary), but under different adversarial models (e.g., snapshot adversaries) it is not trivial to see that the probability of detection is positive as t → ∞. Indeed, several papers are dedicated to making exactly that point [36, 37].

(2) There is a law of diminishing returns with respect to θ: for a fixed degree d, as θ increases, the rate of growth of (8) decreases. Since θ represents the number of adversarial connections per honest node, the adversary reaps the largest gains from the first few connections it establishes per node.

(3) When θ = 1, i.e., the adversary has only one connection per node, the probability of detection approaches log(d)/d asymptotically in d. Note that this quantity tends to 0 as d → ∞, and it is order-equal to the probability of detection of the first-timestamp adversary on the trickle protocol when θ = 1 (cf. equation (5)).
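Remarks (2) and (3) can be checked numerically by evaluating (8). The following is a small sketch that transcribes the closed form; the function name and the sample values of d and θ are chosen for illustration only.

```python
import math


def first_timestamp_detection(d, theta=1.0):
    """Equation (8): theta / (d - 2) * log((d + theta - 2) / theta), for d > 2."""
    if d <= 2:
        raise ValueError("Theorem 4.1 assumes d > 2")
    return theta / (d - 2) * math.log((d + theta - 2) / theta)


# Remark (2): gains from extra adversarial connections shrink as theta grows.
for theta in (1, 2, 4, 8, 16):
    print(theta, round(first_timestamp_detection(8, theta), 4))

# Remark (3): with theta = 1 the value is log(d - 1) / (d - 2), of order log(d) / d.
for d in (4, 16, 64, 256):
    print(d, round(first_timestamp_detection(d), 4), round(math.log(d) / d, 4))
```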

Theorem 4.1 suggests that the Bitcoin community’s transition from trickle spreading to diffusion spreading does not provide any order-level anonymity gains (asymptotically in the degree of the graph), at least for the first-timestamp adversary. Next, we would like to see whether the same is true for estimators that use the graph structure.

4.2 Centrality-Based Estimators

We compute a different lower bound on the ML probability of detection by analyzing a centrality-based estimator. Unlike the first-timestamp estimator, this reporting centrality estimator uses the structure of the infected subgraph by selecting a candidate source that is close to the center (on the graph) of the observed timestamps. However, it does not explicitly use the observed timestamps. Also unlike the first-timestamp estimator, this centrality-based estimator improves as the degree d of the underlying tree increases. Indeed, it has a strictly positive probability of detection as d → ∞, implying that the eavesdropper adversary has an ML probability of detection that scales as Θ(1) in d. We start by presenting the reporting centrality estimator, after which we analyze its probability of detection.

Reporting centrality estimator. At a high level, the reporting centrality estimator works as follows: for each candidate source v, the estimator counts the number of nodes that have reported to the adversary from each of node v’s adjacent subtrees. It picks a candidate source for which the number of reporting nodes is approximately equal in each subtree.

To make this precise, we introduce some notation. First, suppose the infected subtree G_t is rooted at w; we use $T_v^w$ to denote the subtree of G_t that contains v and all of v’s descendants, with respect to root node w. Consider a random variable Y_v(t), which is 1 if node v ∈ V has reported to the adversary by time t, and 0 otherwise. We let $Y_{T_v^w}(t) = \sum_{u \in T_v^w} Y_u(t)$ denote the number of nodes in $T_v^w$ that have reported to the adversary by time t. We use $Y(t) = \sum_{v \in V_t} Y_v(t)$ to denote the total number of reporting nodes in G_t at time t. Similarly, we use $N_{T_v^w}(t)$ to denote the number of infected nodes in $T_v^w$ (so $N_{T_v^w}(t) \ge Y_{T_v^w}(t)$), and we let N(t) denote the total number of infected nodes at time t (N(t) ≥ Y(t)).

For each candidate source v, we consider its d neighbors, which comprise the set $\mathcal{N}(v)$. We define a node v’s reporting centrality at time t, denoted R_v(t), as follows:

\[ R_v(t) = \begin{cases} 1 & \text{if } \max_{u \in \mathcal{N}(v)} Y_{T_u^v}(t) < \frac{Y(t)}{2} \\ 0 & \text{otherwise.} \end{cases} \tag{9} \]

That is, a node’s reporting centrality is 1 if and only if each of its adjacent subtrees has strictly fewer than Y(t)/2 reporting nodes. We say a node is a reporting center iff it has a reporting centrality of 1. The estimator outputs a node v̂ chosen uniformly from the set of reporting centers. For example, in Figure 5, there is only one reporting center, v∗.
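Definition (9) translates into a short routine over a tree given as an adjacency dictionary. The sketch below is illustrative rather than the authors' code: the data representation, the function names, and the seven-node example tree (chosen so that N(t) = 7 and Y(t) = 5, as in Figure 5, but otherwise hypothetical) are assumptions.

```python
import random


def reporting_count(adj, reported, root, blocked):
    """Count reporting nodes in the subtree containing `root` once `blocked` is removed."""
    count, stack, seen = 0, [root], {root, blocked}
    while stack:
        u = stack.pop()
        count += u in reported
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return count


def reporting_centers(adj, reported):
    """Nodes v with R_v(t) = 1: every adjacent subtree holds fewer than Y(t)/2 reporters."""
    total = len(reported)  # Y(t)
    centers = []
    for v in adj:
        worst = max((reporting_count(adj, reported, u, v) for u in adj[v]), default=0)
        if 2 * worst < total:
            centers.append(v)
    return centers


def reporting_centrality_estimate(adj, reported, rng=random):
    """Pick uniformly among the reporting centers (None if there are none)."""
    centers = reporting_centers(adj, reported)
    return rng.choice(centers) if centers else None


# Hypothetical infected subtree with N(t) = 7 nodes, of which Y(t) = 5 have reported.
tree = {0: [1, 2, 3], 1: [0, 4, 5], 2: [0, 6], 3: [0], 4: [1], 5: [1], 6: [2]}
reported = {0, 1, 2, 3, 4}
print(reporting_centers(tree, reported))  # only node 0 qualifies: [0]
```

Recomputing each subtree count from scratch costs O(|V|) per neighbor; a single pass that accumulates subtree counts would be faster, but the direct version stays closer to the definition.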

Notice that reporting centrality does not use the adversary’s observed timestamps; it only counts the number of reporting nodes in each of a node’s adjacent subtrees. This estimator is inspired by rumor centrality [36], an ML estimator for the source of a diffusion process under a snapshot adversary. Recall that a snapshot adversary sees the infected subgraph G_t at time t, but it does not learn any timestamp information.

Figure 5: Reporting centrality example. Yellow nodes are infected; a red outline means the node has reported. In this example, v∗ has R_{v∗}(t) = 1 because each of its adjacent subtrees has fewer than Y(t)/2 = 2.5 reporting nodes. [Diagram labels: R_{v∗}(t) = 1, R_w(t) = 0, Y(t) = 5, N(t) = 7.]

Rumor centrality exhibits a key property on trees: a node v is a rumor center of a tree-structured infected subgraph G_t if and only if each of its d adjacent subtrees has no more than N(t)/2 nodes in it [36]:

\[ \max_{u \in \mathcal{N}(v)} N_{T_u^v}(t) \le \frac{N(t)}{2}. \tag{10} \]

Moreover, there exists at least one and at most two rumor centers in a tree, and the true source has a strictly positive probability of being a rumor center on regular trees and geometric random trees [36]. In our case, we cannot use rumor centrality directly because the true infected nodes are not all observed at any point in time t. We construct the reporting centrality estimator by applying a condition like (10) to the reporting nodes, rather than the infected nodes.

Analysis. We show that for trees with high degree d, reporting centrality has a strictly higher (in an order sense) probability of detection than the first-timestamp estimator; its probability of detection is strictly positive as d → ∞.

Theorem 4.2. Consider a diffusion process of rate λ = 1 over a d-regular tree. Suppose this process is observed by an eavesdropper adversary, which sees each node’s timestamp with an independent exponential delay of rate λ_2 = θ, θ ≥ 1. Then the reporting centrality estimator has a (time-dependent) probability of detection P(M_RC(τ, G) = v∗) that satisfies

\[ \liminf_{t \to \infty} \mathbb{P}(M_{\mathrm{RC}}(\tau, G) = v^*) \ge C_d > 0, \tag{11} \]

where

\[ C_d = 1 - d\left(1 - I_{1/2}\left(\frac{1}{d - 2},\, 1 + \frac{1}{d - 2}\right)\right) \]

is a constant that depends only on degree d, and I_{1/2}(a, b) is the regularized incomplete Beta function, i.e., the probability that a Beta random variable with parameters a and b takes a value in [0, 1/2).
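The constant C_d is easy to evaluate numerically. The sketch below avoids external libraries by using the identity 1 − I_{1/2}(a, b) = I_{1/2}(b, a) and integrating with Simpson's rule after the substitution s = v^(d−2), which removes the endpoint singularity; the function name, the substitution, and the step count are our own choices, not part of the paper.

```python
import math


def reporting_center_constant(d, steps=20000):
    """C_d = 1 - d * (1 - I_{1/2}(a, 1 + a)) with a = 1 / (d - 2), for integer d >= 3."""
    if d < 3:
        raise ValueError("C_d is only evaluated here for d >= 3")
    m = d - 2
    a = 1.0 / m

    # 1 - I_{1/2}(a, 1 + a) = (1 / B(a, 1 + a)) * integral_0^{1/2} s^a (1 - s)^(a - 1) ds.
    # Substituting s = v^m gives the smooth integrand m * v^m * (1 - v^m)^(a - 1)
    # on the interval [0, 2^(-a)].
    def integrand(v):
        return m * v**m * (1.0 - v**m) ** (a - 1.0)

    upper = 0.5**a
    h = upper / steps  # composite Simpson's rule; `steps` must be even
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3.0
    beta = math.gamma(a) * math.gamma(1.0 + a) / math.gamma(1.0 + 2.0 * a)
    return 1.0 - d * integral / beta


for d in (3, 4, 8, 50):
    print(d, round(reporting_center_constant(d), 4))
```

As a sanity check, for d = 3 the parameters are a = 1 and b = 2, so I_{1/2}(1, 2) = 3/4 by direct integration and C_3 = 1/4.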

Proof Sketch: (Full proof in Section B.5) The proof begins by conditioning on the event that the true source is a reporting center. We demonstrate that this occurs with a probability that is lower-bounded by a constant. Conditioned on this event, the probability of detection is 1, since there can exist at most one reporting center. Combining all these observations gives the lower bound in the claim.

The key step in this proof is demonstrating that the source is a reporting center with probability lower-bounded by a constant. To show this, we relate two Polya urn processes: one that represents the diffusion process over the regular tree of honest nodes, and one that describes the full spreading process, which includes both diffusion over the regular tree and random reporting to the adversary. The first urn can be posed as a classic Polya urn [12], which has been studied in the context of diffusion [37, 18]. However, the second urn can be described by an unbalanced generalized Polya urn (GPU) with negative coefficients, a class of urns that does not typically appear in the study of diffusion (to the best of our knowledge). In GPUs, upon drawing a ball of a given color, one can add balls of a different color or even remove balls; ‘unbalanced’ means that the number of balls added in each timestep is not necessarily equal. Prior results by Athreya and Ney [5] and later Janson [16] characterize the limiting distribution of unbalanced GPUs with negative coefficients, which allows us to relate the reporting centrality of our process to the rumor centrality of the underlying (unobserved) diffusion process.

Notice that the constant C_d in Theorem 4.2 does not depend on θ. This is because the reporting centrality estimator makes no use of timestamp information, so the noisy delays in the observed timestamps do not affect the estimator’s asymptotic behavior. The delay does affect the convergence rate of the probability of detection. Characterizing this dependency is theoretically interesting, but beyond the scope of this paper.

Simulation results. To evaluate the tightness of the lower bound in Theorem 4.2, we simulate reporting centrality on diffusion processes over regular trees. Figure 6 illustrates the empirical performance of reporting centrality averaged over 4,000 trials, compared to the theoretical lower bound on the liminf. The estimator is run at time t = d + 2. Our simulations are run up to degree d = 5 due to computational constraints, since the infected subgraph grows exponentially in the degree of the tree. We observe that by degree d = 5, reporting centrality reaches the theoretical lower bound on the limiting detection probability.

4.3 First-Timestamp vs. Centrality

Neither of the lower bounds from the first-timestamp or reporting centrality estimators strictly outperforms the other. The first-timestamp estimator performs better on graphs with low degree d, whereas reporting centrality performs better in the high-d regime. By taking the maximum of these two estimators, we obtain a lower bound on the ML probability of detection for diffusion processes across the full range of degrees. Fully characterizing this probability of detection remains an interesting open problem for future work.

Figure 6 compares the two estimators both in simulation and theoretically as a function of degree d. We observe that reporting centrality outstrips first-timestamp estimation for trees of degree 9 and higher; since our theoretical result is only a lower bound on the performance of reporting centrality, the transition may actually occur at even smaller d. Empirically, the true Bitcoin graph is approximately 8-regular [24], a regime in which we expect reporting centrality to perform similarly to the first-timestamp estimator. Since the practical attacks in [8, 19] use the suboptimal first-timestamp estimator, they may underestimate the ML probability of detection.

Figure 6: Comparison of first-timestamp and reporting centrality estimators on diffusion over regular trees, theoretically and simulated. Here θ = 1/d and t = d + 2. [Plot: probability of detection versus degree d; curves: first-timestamp (theoretical), first-timestamp (simulated), reporting centrality (theoretical), reporting centrality (simulated).]

5. DISCUSSION

In this section, we provide a comprehensive comparison of trickle and diffusion, both theoretically and in simulation. We also highlight some mathematical subtleties in our analyses, which give rise to a number of open problems of independent interest.

5.1 Comparison: Trickle vs. Diffusion

For direct comparison, Table 2 contains our theoretical results side-by-side, for the special case of θ = 1 and for general θ. As we have previously stated, the performance of trickle and diffusion is similar, particularly when θ = 1. Although the maximum-likelihood results are difficult to compare at first glance, the point is that they both approach a positive constant as d, t → ∞; for trickle propagation, that constant is 1/2, whereas for diffusion, it is approximately 0.307 (Corollary 1 from [37]).

Up to now, we have primarily considered the performance of trickle and diffusion as a function of degree d. However, in practice, the underlying Bitcoin graph is fixed; the only variable quantity is the adversary’s resources, which are captured by θ, the number of corrupted connections per honest node. It does not make sense to study the asymptotic probability of detection as θ → 0, since θ ≥ 1; on the other hand, as θ → ∞, the probability of detection tends to 1, regardless of spreading protocol. We therefore wish to numerically understand, for a fixed d, how diffusion and trickle compare as a function of θ. Figure 7 compares analytical expressions and simulation results for the first-timestamp estimator on 4-regular trees.

We use the first-timestamp estimator for a few reasons: first, we lack an efficient ML estimator for diffusion. Second, the transition from trickle to diffusion was motivated by the practical attacks in [8, 19], which used a version of the first-timestamp estimator. Third, even under the first-timestamp estimator, the probability of detection is unacceptably high.

Figure 7: Comparison of trickle and diffusion under the first-timestamp estimator on 4-regular trees. [Plot: probability of detection versus number of corrupted connections θ; curves: Trickle, theoretical (lower bound); Trickle, simulated (lower bound); Trickle, simulated (exact); Diffusion, theoretical; Diffusion, simulated.]

Note that our theoretical results suggest a higher probability of detection for diffusion than for trickle. This is an artifact of our lower bound on the trickle probability of detection. Recall that our lower bound computes the probability that v∗ reports strictly before any other node (i.e., simultaneous events are discarded). Since Figure 7 is plotted for a small degree d = 4, simultaneous reporting occurs frequently, so the lower bound is loose. In simulation, we find that trickle and diffusion actually exhibit nearly identical performance, which agrees with the theoretical probability of detection for diffusion. Moreover, we verified our trickle lower bound in simulation by discarding realizations where multiple nodes report simultaneously at the first timestamp; these simulation results (burgundy line in Figure 7) align closely with our theoretical prediction. We also find that as d increases, the gap between our trickle lower bound and the simulated probability of detection for trickle decreases. Meanwhile, the similarity between diffusion and trickle (in simulation) persists even for high d, at least on regular trees.

Real Bitcoin graph. The real Bitcoin network is not a regular tree. To validate our decision to analyze regular trees, we simulate trickle and diffusion propagation over a snapshot of the real Bitcoin network from 2015 [24]. Figure 8 compares these results as a function of θ, for the first-timestamp estimator. Since the Bitcoin graph is not tree-structured, we lack ML estimators for both diffusion and trickle; hence, the first-timestamp estimator is a reasonable choice. Unless specified otherwise, the theoretical curves are calculated for a regular tree with d = 8, since this is the mean degree of our dataset.

We first observe that the simulated performance of diffusion is close to our theoretical prediction. This occurs because with high probability, the first-timestamp estimator uses only a local neighborhood to estimate v∗. Since the Bitcoin graph can be approximated by a sparse, random, regular graph, the graph is locally tree-like with high probability, so our theoretical analysis of regular trees applies. However, our trickle lower bound remains loose. This is partially due to simultaneous reporting events, but the main contributing factor seems to be the irregularity of the underlying graph. It appears that trickle responds more acutely to graph irregularities than diffusion, presumably due to its more structured nature, which we exploited heavily in our analysis. Indeed, we find that the simulated performance of trickle is close to the theoretical predictions for regular trees of degree d = 2 (black line in Figure 8); d = 2 happens to be the mode of the dataset degree distribution. Understanding this effect more carefully is an interesting question for future work.

Table 2: Summary of probability of detection results on a network of honest servers in a d-regular tree topology. The adversary has θ connections to each honest server.

| Estimator | θ | Trickle | Diffusion |
|---|---|---|---|
| First-Timestamp | All θ | $\frac{\theta}{d \log 2}\left[\mathrm{Ei}(2^d \log \rho) - \mathrm{Ei}(\log \rho)\right]$ (Thm. 3.1) | $\frac{\theta}{d-2}\log\left(\frac{d+\theta-2}{\theta}\right)$ (Thm. 4.1) |
| First-Timestamp | θ = 1 | $\frac{\log(d)}{d \log(2)} + o\left(\frac{\log d}{d}\right)$ (Eq. 5) | $\frac{\log(d-1)}{d-2}$ (Thm. 4.1) |
| Maximum-Likelihood | All θ | $1 - \frac{d}{2(\theta+d)}$ (Thm. 3.3) | $1 - d\left(1 - I_{1/2}\left(\frac{1}{d-2},\, 1 + \frac{1}{d-2}\right)\right)$ (Thm. 4.2) |
| Maximum-Likelihood | θ = 1 | $1 - \frac{d}{2(d+1)}$ (Thm. 3.3) | same expression as for all θ (Thm. 4.2) |

Figure 8: Comparison of trickle and diffusion spreading under the first-timestamp estimator, simulated on a 2015 snapshot of the real Bitcoin network [24]. [Plot: probability of detection versus number of corrupted connections θ; curves: Trickle, theoretical lower bound; Trickle, simulated; Trickle, theoretical lower bound (d = 2); Diffusion, theoretical; Diffusion, simulation.]

Second, we observe that empirically, the probability of detection for diffusion is indeed lower than that of trickle spreading. This suggests that the Bitcoin developers’ intuition was correct: diffusion has slightly better anonymity properties than trickle. Still, the difference is small. Notice that in Figure 8, diffusion and trickle become increasingly similar as the number of corrupted connections increases. In practical attacks [8], the eavesdropper was able to establish as many as 50 connections to some nodes. In that regime, Figure 8 suggests that trickle and diffusion would have very similar (high) probabilities of detection, even for the suboptimal first-timestamp estimator. Thus, even in a numeric sense, diffusion and trickle are not substantially different under practical adversarial operating conditions.

In summary, we find that trickle and diffusion have probabilities of detection that are similar, both in an asymptotic-order sense and in a numeric sense, as seen in both simulation and theoretical analysis. We have evaluated this on the canonical class of d-regular trees and, through simulation, on a real Bitcoin graph topology.

5.2 Open Problems

Several of our analyses suggest open problems that are interesting in their own right. For instance, in our analysis of diffusion, the exact ML probability of detection remains unknown. Moreover, it is unclear if an efficient (polynomial-time in the number of nodes) ML estimator exists. Even our analysis of reporting centrality, a suboptimal estimator, gives only a lower bound on the limiting probability of detection, and its convergence rate as a function of θ and d is not known. These open problems are all directly related to the goals of this paper.

Relation to other adversarial models. Another subclass of open problems relates to connecting the eavesdropper adversary to more well-known, canonical adversarial models. That is, the eavesdropper adversary can be thought of as a generalization of certain adversarial models, like the spy-based adversary and the snapshot adversary.

Recall that in the spy-based adversary, each node other than the source is corrupt with probability p, and corrupt nodes observe the exact timestamp at which they receive the message. This setting can be represented by an eavesdropper adversary with random observation delays: either the node reports with an instantaneous delay, or it reports with an infinite delay.
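The correspondence can be made concrete by treating each adversary as a distribution over reporting delays. The sketch below is only an illustration of that mapping; the function names and the sample infection times are hypothetical, and the exponential delay is the one assumed for the eavesdropper in Section 4.

```python
import math
import random


def eavesdropper_delay(theta, rng=random):
    """Eavesdropper model: exponential reporting delay of rate theta."""
    return rng.expovariate(theta)


def spy_delay(p, rng=random):
    """Spy-based model: delay 0 with probability p (a spy), infinite delay otherwise."""
    return 0.0 if rng.random() < p else math.inf


def observed_timestamps(infection_times, delay_fn):
    """Map node -> report time, dropping nodes whose report never arrives."""
    reports = {v: x + delay_fn() for v, x in infection_times.items()}
    return {v: tau for v, tau in reports.items() if math.isfinite(tau)}


rng = random.Random(1)
infection_times = {"a": 0.7, "b": 1.3, "c": 2.1}  # hypothetical non-source nodes
print(observed_timestamps(infection_times, lambda: spy_delay(0.5, rng)))
print(observed_timestamps(infection_times, lambda: eavesdropper_delay(1.0, rng)))
```

Under the spy model every observed timestamp equals the true infection time, while under the eavesdropper model every node eventually reports but with a noisy timestamp.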

On the other hand, in the snapshot adversary, the adversary learns only which nodes are infected at observation time t, not their timestamps. This is similar to an eavesdropper adversary in which the random reporting delays have high variance. As such, the timestamp has little value beyond indicating that the reporting node is infected.

The similarities between the eavesdropper adversary and these canonical models may be useful to the extent that we can study one to gain intuition about the other. For example, we can use techniques developed for the eavesdropper adversary to study the spy-based adversary. The ML probability of detection for the spy-based adversary on regular trees has evaded exact analysis [13], though several heuristics have been found to work well [29, 40]. A simple lower bound on the ML probability of detection comes from analyzing the first-timestamp estimator for the spy-based adversary [13]. This gives

\[ \liminf_{t \to \infty} \mathbb{P}(M_{\mathrm{FT}}(\tau, G) = v^*) \ge p, \tag{12} \]

since with probability p, the first node to receive the message is a spy. Although reporting centrality is a suboptimal source estimator for the eavesdropper adversary, it is straightforward to analyze and can be used to obtain lower bounds on the probability of detection for the spy-based adversary.


Using similar proof techniques to Theorem 4.2, we can lower-bound the spy-based adversary's ML probability of detection as time t → ∞.

Corollary 5.1. Consider a diffusion process of rate λ = 1 over a d-regular tree. Suppose this process is observed by a spy-based adversary, which sees the exact timestamp of each node independently with probability p > 0 (otherwise it sees nothing). Then the reporting centrality estimator has a probability of detection $\mathbb{P}(M_{RC}(\boldsymbol{\tau}, G) = v^*)$ that satisfies

$$\liminf_{t\to\infty} \mathbb{P}(M_{RC}(\boldsymbol{\tau}, G) = v^*) \ge C_d > 0, \tag{13}$$

where

$$C_d = 1 - d\left(1 - I_{1/2}\left(\frac{1}{d-2},\; 1 + \frac{1}{d-2}\right)\right).$$

(Proof in Section B.6)

The key point to notice about Corollary 5.1 is that (13) remains positive for a fixed d even as p → 0; this is in contrast with the previously known bound (12). Keep in mind that we are first taking the limit as t → ∞ for a fixed p, then taking p → 0. It is not possible to exchange the order of these limits; doing so drives the probability of detection to zero. The reason is that for a finite t (and hence a finite number of infected nodes), as p → 0, the number of spies in the infected subgraph also tends to zero. Without any observations, it is impossible to detect the source. However, for any fixed p > 0, as t → ∞, eventually the number of spies in each subtree concentrates, so reporting centrality has a nonzero probability of detection. Indeed, our analysis makes critical use of the fact that as t → ∞, for a fixed positive p, the classical Pólya urn we use to represent the underlying diffusion process concentrates.
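To get a feel for the size of the constant in Corollary 5.1, the following minimal sketch evaluates C_d numerically. It assumes that I_x(a, b) denotes the regularized incomplete beta function (an assumption about notation not restated in this section); the function names and the Simpson-rule integration are illustrative choices, not part of the original analysis.

```python
import math


def regularized_incomplete_beta(x, a, b, steps=2000):
    """Approximate I_x(a, b) for a > 0, b >= 1 with Simpson's rule.

    The substitution u = s**a removes the s**(a-1) singularity at 0:
    int_0^x s^(a-1) (1-s)^(b-1) ds = (1/a) int_0^(x^a) (1 - u^(1/a))^(b-1) du.
    `steps` must be even.
    """
    upper = x ** a
    h = upper / steps

    def integrand(u):
        return (1.0 - u ** (1.0 / a)) ** (b - 1.0)

    total = integrand(0.0) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    integral = total * h / 3.0 / a
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta


def detection_constant(d):
    """C_d = 1 - d * (1 - I_{1/2}(1/(d-2), 1 + 1/(d-2))), for d >= 3."""
    a = 1.0 / (d - 2)
    return 1.0 - d * (1.0 - regularized_incomplete_beta(0.5, a, 1.0 + a))


if __name__ == "__main__":
    for degree in (3, 4, 5, 10):
        print(degree, round(detection_constant(degree), 4))
```

Under this reading, d = 3 gives I_{1/2}(1, 2) = 3/4 and hence C_3 = 1/4, which is a quick sanity check on the formula.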

Although the lower bound in (13) does not depend on p, the convergence rate does; again, understanding this convergence rate is of theoretical interest. More broadly, a deeper understanding of the similarities between the eavesdropper adversary and other canonical adversarial models is needed.

6. RELATED WORK

There are two primary categories of related work, which originate from the security and network-analysis communities, respectively. The first concerns the anonymity properties of Bitcoin (and related cryptocurrencies). The second studies rumor source detection under canonical adversarial and graph models.

Bitcoin and Anonymity. Bitcoin has long been known to exhibit poor anonymity properties. Many papers have explored the anonymity implications of having a public blockchain. In particular, pseudonyms can often be linked together (implying common ownership) with simple, heuristic clustering methods. For instance, Bitcoin transactions are allowed to have multiple inputs; in practice, the multiple input pseudonyms are often owned by the same entity. Using this and other simple heuristics, researchers have been able to cluster pseudonyms in the wild [4, 23, 31, 32, 28]. Since some nodes are already deanonymized (e.g., public vendors), transaction patterns can be used to learn the identities of other users. More recently, researchers have demonstrated attacks on the actual P2P network [8, 19], as discussed in Section 2.
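The multiple-input heuristic mentioned above can be sketched with a union-find structure. This is only an illustration under simplifying assumptions: each transaction is represented by a hypothetical list of input address strings, and the cited works combine this rule with other heuristics that are not modeled here.

```python
def cluster_addresses(transactions):
    """Group addresses that appear together as inputs of some transaction.

    `transactions` is an iterable of lists of input addresses (a hypothetical
    format chosen for this sketch). Returns a sorted list of sorted clusters.
    """
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    for inputs in transactions:
        if not inputs:
            continue
        root = find(inputs[0])
        for addr in inputs[1:]:
            # Merge the cluster of every co-spent input into the first one.
            parent[find(addr)] = root

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(cluster) for cluster in clusters.values())


if __name__ == "__main__":
    txs = [["A", "B"], ["B", "C"], ["D"]]
    print(cluster_addresses(txs))
```

Addresses A and C never appear in the same transaction here, yet they end up in one cluster because both are co-spent with B; this transitivity is what makes the heuristic effective on a public ledger.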

In response to blockchain-enabled deanonymization threats, several papers have proposed anonymous alternatives to Bitcoin; these alternatives typically rely on privacy-preserving cryptographic protocols. For example, a new cryptocurrency called Zcash [3] uses the Zerocash protocol [33], which cryptographically masks transactions in the blockchain and proves validity using non-interactive zero-knowledge proofs. Although Zcash obfuscates more information than Bitcoin, the masked transactions are still broadcast over the P2P network using the same protocols as Bitcoin. Hence, an eavesdropper adversary could still learn the IP address that originates each transaction.

Another recently proposed solution is TumbleBit, an untrusted payment hub that can be used for the anonymous transferral of bitcoins [15]. Unlike Zcash, part of the TumbleBit protocol occurs off-blockchain, which prevents tracing attacks by an eavesdropper adversary. However, the sender and receiver still conduct blockchain operations to escrow and retrieve bitcoins at the beginning and end of each transaction, respectively; these operations use the usual Bitcoin P2P networking protocol, and could be deanonymized (in principle) by a payment hub that also acts as an eavesdropper adversary. In summary, several anonymity-preserving altcoins are vulnerable to the kinds of attacks outlined in this paper, despite their strong cryptographic guarantees at the application layer.

Rumor Source Detection. The last five years have seen significant work on detecting the source of a diffusion process over a graph. This topic became popular following the 2010 results of Shah and Zaman, who showed that under a snapshot adversary on a regular tree, one can reliably infer the source of a diffusion process [34]. That is, the probability of detection is lower-bounded by a constant as t → ∞. The analysis for these results uses the same Pólya urn construction that we adopt in the proofs of Theorem 4.2 and Corollary 5.1 [36, 35]. Shah and Zaman later extended these results to random, irregular trees [37], and other authors studied heuristic source detection methods on general graphs [14, 30, 20] and related theoretical limits [38, 25, 18].

Follow-up research has considered several variants on the snapshot adversary. For example, Pinto et al. considered a spy-based adversary that observes a diffusion process where the delays are truncated Gaussian random variables [29], and Zhu and Ying considered a spy-based adversary with standard exponential delays [40]. These papers do not characterize the ML probability of detection, but they do propose efficient heuristics that perform well in practice. Our work instead analyzes the eavesdropper adversary, a new adversarial model that emerged from practical attacks on the Bitcoin network [8, 19]. As we saw in various theoretical results, the eavesdropper adversary requires completely new analytical tools compared to the spy-based and snapshot adversaries.

Similarly, there has been extensive work on alternative spreading models. In particular, researchers have studied various forms of diffusion in which nodes can "recover from the infection", i.e., they can delete the transaction before it reaches the entire network. The classic diffusion process is often called susceptible-infected (SI): nodes start as susceptible, and once they become infected, they remain so for the rest of time. Alternative models include susceptible-infected-susceptible (SIS) diffusion, in which nodes recover from the infection with a random delay and can then be re-infected, and susceptible-infected-recovered (SIR) diffusion, in which nodes recover with an exponential delay, upon which they are immune to future infection. These models


have been studied in the literature, both theoretically and empirically [41, 11, 42, 7]. In our work, we consider only classical SI diffusion processes, but we also consider the completely new trickle spreading protocol. As we have seen in Theorems 3.1 and 3.3, the theoretical machinery for analyzing trickle spreading differs significantly from that used to analyze diffusion. The differences arise from the combinatorial nature of trickle spreading.

7. CONCLUSION

In this paper, we analyze the anonymity properties of the Bitcoin P2P network, with particular attention to the shift from trickle to diffusion propagation in 2015. We find that trickle and diffusion have similar (poor) anonymity properties. On regular trees, they exhibit the same scaling properties asymptotically in d, and numerically similar probabilities of detection for a fixed d. In simulation over a real Bitcoin graph topology, diffusion and trickle also exhibit numerically similar performance, which agrees with our theoretical predictions. This leads us to conclude that the current flooding protocols used in the Bitcoin network do not sufficiently protect user anonymity.

An interesting question is how to modify the networking stack in order to provide robustness to source deanonymization attacks. Although the design of such solutions is beyond the scope of this paper, our analysis gives some intuition for how to prevent deanonymization attacks. A key reason that deanonymization is currently possible is the symmetry of current spreading protocols. That is, diffusion and trickle both propagate content over the underlying graph in all directions at roughly the same rate. This symmetry enables powerful centrality-based attacks. Thus, a natural solution is to break the symmetry of diffusion and trickle. Understanding how to break symmetry without hurting performance is of both theoretical and practical interest.

Acknowledgments

The authors would like to extend heartfelt thanks to Andrew Miller for pointing out the Bitcoin community's transition from trickle to diffusion, Varun Jog for valuable insights about the limiting behavior of Pólya urns, and Sewoong Oh for discussions on modeling the evolution of diffusion processes with an eavesdropper adversary.

8. REFERENCES

[1] Bitcoin core commit 5400ef6. https://github.com/bitcoin/bitcoin/commit/5400ef6bcb9d243b2b21697775aa6491115420f3.

[2] Bitcoin core integration/staging tree. https://github.com/bitcoin/bitcoin.

[3] Zcash, 2016. https://z.cash/.

[4] Elli Androulaki, Ghassan O Karame, Marc Roeschlin, Tobias Scherer, and Srdjan Capkun. Evaluating user privacy in bitcoin. In International Conference on Financial Cryptography and Data Security, pages 34–51. Springer, 2013.

[5] Krishna B Athreya and Peter E Ney. Branching processes, volume 196. Springer Science & Business Media, 2012.

[6] Carl M Bender and Steven A Orszag. Advanced mathematical methods for scientists and engineers I. Springer Science & Business Media, 1999.

[7] Edoardo Beretta and Yasuhiro Takeuchi. Global stability of an sir epidemic model with time delays. Journal of mathematical biology, 33(3):250–260, 1995.

[8] Alex Biryukov, Dmitry Khovratovich, and Ivan Pustogarov. Deanonymisation of clients in bitcoin p2p network. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 15–29. ACM, 2014.

[9] Alex Biryukov and Ivan Pustogarov. Bitcoin over tor isn’t a good idea. In 2015 IEEE Symposium on Security and Privacy, pages 122–134. IEEE, 2015.

[10] Bitnodes. Global bitcoin nodes distribution, 2016.

[11] Zhen Chen, Kai Zhu, and Lei Ying. Detecting multiple information sources in networks under the sir model. IEEE Transactions on Network Science and Engineering, 3(1):17–31, 2016.

[12] Florian Eggenberger and George Pólya. Über die Statistik verketteter Vorgänge. ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik, 3(4):279–289, 1923.

[13] G. Fanti, P. Kairouz, S. Oh, K. Ramchandran, and P. Viswanath. Metadata-aware anonymous messaging. In ICML, 2015.

[14] V. Fioriti and M. Chinnici. Predicting the sources of an outbreak with a spectral technique. arXiv:1211.2333, 2012.

[15] Ethan Heilman, Foteini Baldimtsi, Leen Alshenibr, Alessandra Scafuro, and Sharon Goldberg. Tumblebit: An untrusted tumbler for bitcoin-compatible anonymous payments. 2017.

[16] Svante Janson. Functional limit theorems for multitype branching processes and generalized polya urns. Stochastic Processes and their Applications, 110(2):177–245, 2004.

[17] Svante Janson. Functional limit theorems for multitype branching processes and generalized polya urns. Stochastic Processes and their Applications, 110(2):177–245, 2004.

[18] Justin Khim and Po-Ling Loh. Confidence sets for the source of a diffusion in regular trees. arXiv preprint arXiv:1510.05461, 2015.

[19] Philip Koshy, Diana Koshy, and Patrick McDaniel. An analysis of anonymity in bitcoin using p2p network traffic. In International Conference on Financial Cryptography and Data Security, pages 469–485. Springer, 2014.

[20] A. Y. Lokhov, M. Mezard, H. Ohta, and L. Zdeborova. Inferring the origin of an epidemic with dynamic message-passing algorithm. arXiv preprint arXiv:1303.5315, 2013.

[21] Paul Mah. Top 5 vpn services for personal privacy and security, 2016. http://www.cio.com/article/3152904/security/top-5-vpn-services-for-personal-privacy-and-security.html.

[22] Hosam Mahmoud. Polya urn models. CRC press, 2008.

[23] Sarah Meiklejohn, Marjori Pomarole, Grant Jordan, Kirill Levchenko, Damon McCoy, Geoffrey M Voelker, and Stefan Savage. A fistful of bitcoins: characterizing payments among men with no names. In Proceedings of the 2013 conference on Internet measurement conference, pages 127–140. ACM, 2013.

[24] Andrew Miller, James Litton, Andrew Pachulski, Neal Gupta, Dave Levin, Neil Spring, and Bobby Bhattacharjee. Discovering bitcoin's public topology and influential nodes, 2015.

[25] Chris Milling, Constantine Caramanis, Shie Mannor, and Sanjay Shakkottai. Network forensics: random infection vs spreading epidemic. ACM SIGMETRICS Performance Evaluation Review, 40(1):223–234, 2012.

[26] David Z. Morris. Legal sparring continues in bitcoin user's battle with irs tax sweep, 2017. http://fortune.com/2017/01/01/bitcoin-irs-tax-sweep-user-battle/.

[27] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system, 2008.

[28] Micha Ober, Stefan Katzenbeisser, and Kay Hamacher. Structure and anonymity of the bitcoin transaction graph. Future internet, 5(2):237–250, 2013.

[29] P. C. Pinto, P. Thiran, and M. Vetterli. Locating the source of diffusion in large-scale networks. Physical review letters, 109(6):068702, 2012.

[30] B. A. Prakash, J. Vreeken, and C. Faloutsos. Spotting culprits in epidemics: How many and which ones? In ICDM, volume 12, pages 11–20, 2012.

[31] Fergal Reid and Martin Harrigan. An analysis of anonymity in the bitcoin system. In Security and privacy in social networks, pages 197–223. Springer, 2013.

[32] Dorit Ron and Adi Shamir. Quantitative analysis of the full bitcoin transaction graph. In International Conference on Financial Cryptography and Data Security, pages 6–24. Springer, 2013.

[33] Eli Ben Sasson, Alessandro Chiesa, Christina Garman, Matthew Green, Ian Miers, Eran Tromer, and Madars Virza. Zerocash: Decentralized anonymous payments from bitcoin. In 2014 IEEE Symposium on Security and Privacy, pages 459–474. IEEE, 2014.

[34] D. Shah and T. Zaman. Detecting sources of computer viruses in networks: theory and experiment. In ACM SIGMETRICS Performance Evaluation Review, volume 38, pages 203–214. ACM, 2010.

[35] D. Shah and T. Zaman. Finding rumor sources on random graphs. arXiv preprint arXiv:1110.6230, 2011.

[36] D. Shah and T. Zaman. Rumors in a network: Who’s the culprit? Information Theory, IEEE Transactions on, 57:5163–5181, Aug 2011.

[37] D. Shah and T. Zaman. Rumor centrality: a universal source detector. In ACM SIGMETRICS Performance Evaluation Review, volume 40, pages 199–210. ACM, 2012.

[38] Z. Wang, W. Dong, W. Zhang, and C.W. Tan. Rumor source detection with multiple observations: Fundamental limits and algorithms. In ACM SIGMETRICS, 2014.

[39] Eric W Weisstein. Euler-mascheroni constant. 2002.

[40] K. Zhu and L. Ying. A robust information source estimator with sparse observations. arXiv preprint arXiv:1309.4846, 2013.

[41] Kai Zhu and Lei Ying. A robust information source estimator with sparse observations. Computational Social Networks, 1(1):1, 2014.

[42] Kai Zhu and Lei Ying. Information source detection in the sir model: a sample-path-based approach. IEEE/ACM Transactions on Networking, 24(1):408–421, 2016.


APPENDIX

A. PROOFS

B. ALGORITHMS

Timestamp rumor centrality is described in Protocol 9. Here we have assumed that the adversary has only one link per honest server, so θ = 1. This is easily extended to general θ.

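B.1 Proof of Theorem 3.1

We can write this lower bound explicitly. Recall that the system is discrete-time, and each node has d honest neighbors and θ connections to the adversary. $\tau_i$ denotes the first time node i reports to the adversary.

$$\mathbb{P}(\tau_0 < \tau_m) = \sum_{i=1}^{d+\theta} \mathbb{P}(\tau_0 = i)\,\mathbb{P}(\tau_m > i \mid \tau_0 = i) \tag{14}$$

Since the source has only d honest connections, for any $i > (d+1)$, it holds that $\mathbb{P}(\tau_0 = i) = 0$, and if $i = d+1$, then $\tau_m \le i$. So we can simplify this summation to

$$\mathbb{P}(\tau_0 < \tau_m) = \sum_{i=1}^{d} \mathbb{P}(\tau_0 = i)\,\mathbb{P}(\tau_m > i \mid \tau_0 = i). \tag{15}$$

This means we only need to consider the first d time steps of the message spread in order to lower-bound the first-timestamp adversary's probability of detection. Let $a_j \triangleq \mathbb{P}(\tau_m = j \mid \tau_0 \ge j) = \mathbb{P}(\tau_m = j \mid \tau_0 = i \text{ for any } i \ge j)$. Then $\mathbb{P}(\tau_m > i \mid \tau_0 = i) = 1 - \sum_{j=1}^{i} a_j$, so

$$\begin{aligned}
\mathbb{P}(\tau_0 < \tau_m) &= \sum_{i=1}^{d} \mathbb{P}(\tau_0 = i) - \sum_{i=1}^{d} \mathbb{P}(\tau_0 = i) \sum_{j=1}^{i} a_j \\
&= 1 - \mathbb{P}(\tau_0 = d+1) - \sum_{i=1}^{d} \mathbb{P}(\tau_0 = i) \sum_{j=1}^{i} a_j.
\end{aligned} \tag{16}$$

Letting $b_k \triangleq \mathbb{P}(\tau_m > k \mid \tau_0 \ge k,\ \tau_m > (k-1))$ gives

$$a_j = (1 - b_j) \prod_{k=1}^{j-1} b_k. \tag{17}$$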

We can write $b_k$ explicitly by noting that as long as no infected nodes have reported the message to the adversary, one can deterministically compute the number of infected nodes at each time step with a given number of infected (resp. uninfected) neighbors. This is because the underlying graph is a regular tree (see Lemma 3.2 for proof). Let $\mathcal{N}(k, t)$ denote the set of nodes with k infected, honest neighbors at time t. Then for a fixed time t, we can compute the probability that every infected node chooses to infect an honest node in the next time step, by indexing over the value of k:

$$\begin{aligned}
b_j &= \prod_{k=1}^{j-1} \left( \frac{|\{v \in V : u_{v,h}(j) = k\}|}{|\{v \in V : u_v(j) = k\}|} \right)^{|\mathcal{N}(k,j)|} \\
&= \prod_{k=1}^{j-1} \left( \frac{d-k}{d-k+\theta} \right)^{2^{j-1-k}}
\end{aligned}$$

where $j > 1$, and $b_1 = 1$. Here, the quantity $u_v(j)$ (resp. $u_{v,h}(j)$) denotes the uninfected degree (resp. uninfected honest degree) of node v at time j, that is, the number of total (resp. honest) uninfected neighbors of a node. So the ratio in the definition of $b_j$ is comparing the number of nodes with honest uninfected degree k to the number of nodes with uninfected degree k.

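Substituting this quantity into Equation (17), we get

$$\begin{aligned}
a_j &= (1 - b_j) \prod_{k=1}^{j-2} \prod_{m=1}^{k} \left( \frac{d-m}{d-m+\theta} \right)^{2^{k-m}} \\
&= (1 - b_j) \prod_{m=1}^{j-2} \left( \frac{d-m}{d-m+\theta} \right)^{\sum_{\ell=0}^{j-m-2} 2^{\ell}} \\
&= (1 - b_j) \underbrace{\prod_{m=1}^{j-2} \left( \frac{d-m}{d-m+\theta} \right)^{2^{j-m-1}-1}}_{M_j}
\end{aligned} \tag{18}$$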

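Rearranging, we get

$$\begin{aligned}
M_j &= \frac{\prod_{m=1}^{j-1} \left( \frac{d-m}{d-m+\theta} \right)^{2^{j-1-m}} \prod_{k=1}^{j-2} \left( \frac{d-k+\theta}{d-k} \right)}{\frac{d-j+1}{d+\theta-j+1}} \\
&= b_j \left( \frac{d+\theta-j+1}{d-j+1} \right) \underbrace{\prod_{k=1}^{j-2} \left( \frac{d-k+\theta}{d-k} \right)}_{W_j}
\end{aligned}$$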

Writing out the terms of $W_j$ explicitly, we get that

$$\begin{aligned}
W_j &= \frac{(d-1+\theta)(d-2+\theta) \cdots (d-1) \cdots (d-j+2+\theta)}{(d-1) \cdots (d-j+2+\theta) \cdots (d-j+2)} \\
&= \prod_{m=0}^{\theta-1} \frac{d+m}{d+m-j+2}.
\end{aligned}$$

Substituting all of this into Equation (18), we get that

$$a_j = (1 - b_j)\, b_j \left( \frac{d-j+1+\theta}{d-j+1} \right) \prod_{m=0}^{\theta-1} \frac{d+m}{d+m-j+2}.$$

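This expression for $a_j$ can be used to rewrite (16):

$$\begin{aligned}
\mathbb{P}(\tau_0 < \tau_m) &= 1 - \underbrace{\mathbb{P}(\tau_0 = d+1)}_{P_A} - \sum_{i=1}^{d} \mathbb{P}(\tau_0 = i) \sum_{j=1}^{i} a_j \\
&= 1 - P_A - \sum_{i=1}^{d} \mathbb{P}(\tau_0 = i) \sum_{j=1}^{i} (1 - b_j)\, b_j \times \frac{d-j+1+\theta}{d-j+1} \prod_{m=0}^{\theta-1} \frac{d+m}{d+m-j+2}.
\end{aligned} \tag{19}$$

Setting

$$q_j \triangleq \frac{d-j+1+\theta}{d-j+1} \prod_{m=0}^{\theta-1} \frac{d+m}{d+m-j+2},$$

we get that

$$\begin{aligned}
\mathbb{P}(\tau_0 < \tau_m) &= 1 - P_A - \sum_{i=1}^{d} \mathbb{P}(\tau_0 = i) \sum_{j=1}^{i} (1 - b_j)\, b_j q_j \\
&= 1 - P_A - \sum_{i=1}^{d} \mathbb{P}(\tau_0 = i) \left( \sum_{j=2}^{i} b_j q_j - \sum_{j=2}^{i} b_j^2 q_j \right)
\end{aligned} \tag{20}$$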


Figure 9: Timestamp Rumor Centrality. Returns the timestamp rumor centrality of candidate source v, given timestamps τ on tree G. φ(w) denotes the children of node w on a tree T rooted at v. ∂T denotes the leaves of tree T.

Input: Timestamps τ, graph G, candidate source v, estimation time t ≥ d + 1
Output: Number of feasible orderings from source v, f_v

    [d + 1] ≜ {1, . . . , d + 1}
    m_vv ← {0}
    ▷ Two global variables, T and C:
    T ← balanced tree of radius t, center v, over graph G
    C ← dictionary of number of feasible partial orderings for each feasible (w, X_w) pair
    c = PassToLeaves(v, v, {0}, T)
    return c    ▷ Timestamp rumor centrality for v

    function PassToLeaves(z, w, m, T)    ▷ Each node learns its possible receipt times, recursively
        m(w) ← m \ {t ∈ m : t ≥ τ_w}    ▷ Remove late times
        Node w saves m(w) outside function scope
        if w ∈ ∂T then
            for t′ ∈ m do
                C[w][t′] ← 1    ▷ Number of partial orderings with w infected at time t′
        else
            m′ ← ∪_{i ∈ m(w)} {i + j : j ∈ [d + 1]}
            m′ ← m′ \ {τ_w}
            for y ∈ φ(w) do
                PassToLeaves(w, y, m′, T)
            AggregateMessages(w, φ(w), T)
        if z = w = v then
            return C[v][0]

    function AggregateMessages(z, φ(z), T)    ▷ Counts the number of valid orderings by passing messages that represent the set of feasible orderings
        N ← [z, φ(z)]    ▷ Ordered list of parent and children
        M ← ∏_{u ∈ φ(z)} m(u)    ▷ ∏ denotes Cartesian set product, where A × B ≜ {(a, b) : a ∈ A, b ∈ B}
        M ← m(z) × M    ▷ Prepends the current node's feasible receipt times to the Cartesian product
        M ← {m ∈ M : [(m_1, . . . , m_d) distinct] ∧ [|m_i − m_j| ≤ d + 1 for all i, j] ∧ [m_1 < m_i for all i > 1]}
        ▷ Removes ordered tuples where neighbors receive the message at the same time, are too far apart to be feasible, or parent gets message after children. m_i denotes ith element of ordered tuple m
        for m ∈ M do
            q ← ∏_{i=2}^{d} C[N_i][m_i]    ▷ Compute the number of permutations by multiplying counts from each child node
            C[z][m_1] ← C[z][m_1] + q
        return

where the change of summation bounds in (20) occurs because $b_1 = 1$. We define $\gamma_k \triangleq \left( \frac{d-k}{d-k+\theta} \right)^{2^{-k}}$, which means that

$$b_j = \prod_{k=1}^{j-1} \gamma_k^{2^{j-1}}. \tag{21}$$

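Using this notation, we write out the two final summations in (20) explicitly:

$$\sum_{j=2}^{i} b_j q_j = q_2 \gamma_1^{2} + q_3 \gamma_1^{4} \gamma_2^{4} + q_4 \gamma_1^{8} \gamma_2^{8} \gamma_3^{8} + \ldots + q_i \gamma_1^{2^{i-1}} \cdots \gamma_{i-1}^{2^{i-1}} \tag{22}$$

$$\sum_{j=2}^{i} b_j^2 q_j = q_2 \gamma_1^{4} + q_3 \gamma_1^{8} \gamma_2^{8} + q_4 \gamma_1^{16} \gamma_2^{16} \gamma_3^{16} + \ldots + q_i \gamma_1^{2^{i}} \cdots \gamma_{i-1}^{2^{i}} \tag{23}$$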

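Subtracting (23) from (22) and collecting terms gives

$$\begin{aligned}
\sum_{j=2}^{i} q_j (b_j - b_j^2) &= q_2 \gamma_1^{2} + \gamma_1^{4} (q_3 \gamma_2^{4} - q_2) + \gamma_1^{8} \gamma_2^{8} (q_4 \gamma_3^{8} - q_3) + \ldots \\
&\quad + \gamma_1^{2^{i-1}} \cdots \gamma_{i-2}^{2^{i-1}} \left( q_i \gamma_{i-1}^{2^{i-1}} - q_{i-1} \right) - q_i \gamma_1^{2^{i}} \cdots \gamma_{i-1}^{2^{i}} \\
&= q_2 \gamma_1^{2} - q_i \gamma_1^{2^{i}} \cdots \gamma_{i-1}^{2^{i}} + \sum_{\ell=3}^{i} \gamma_1^{2^{\ell-1}} \cdots \gamma_{\ell-2}^{2^{\ell-1}} \left( q_\ell \gamma_{\ell-1}^{2^{\ell-1}} - q_{\ell-1} \right).
\end{aligned} \tag{24}$$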

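First, we show that the summation (last term) in (24) is equal to 0 by writing out $\left( q_\ell \gamma_{\ell-1}^{2^{\ell-1}} - q_{\ell-1} \right)$:

$$\begin{aligned}
q_\ell \gamma_{\ell-1}^{2^{\ell-1}} - q_{\ell-1} &= q_{\ell-1} \left( \frac{d-\ell+1+\theta}{d-\ell+1}\, \gamma_{\ell-1}^{2^{\ell-1}} - 1 \right) \\
&= q_{\ell-1} \left( \frac{d-\ell+1+\theta}{d-\ell+1} \cdot \frac{d-\ell+1}{d-\ell+1+\theta} - 1 \right) \\
&= 0.
\end{aligned}$$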

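Next, we show that the first term in (24) equals 1 by writing $q_2 \gamma_1^{2}$ explicitly:

$$q_2 \gamma_1^{2} = \frac{d-2+1+\theta}{d-2+1} \left( \prod_{m=0}^{\theta-1} \frac{d+m}{d+m-2+2} \right) \frac{d-1}{d-1+\theta} = 1,$$

which implies that

$$\sum_{j=1}^{i} q_j (b_j - b_j^2) = 1 - q_i \gamma_1^{2^{i}} \cdots \gamma_{i-1}^{2^{i}} = 1 - q_i b_i^2. \tag{25}$$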
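The telescoping argument behind (25) can be checked with exact rational arithmetic. The sketch below is only an illustrative verification, not part of the proof: it uses the product forms of b_j and q_j given earlier in this section and the fact that γ_1^{2^i} ⋯ γ_{i−1}^{2^i} = b_i^2 by (21); the function names are hypothetical.

```python
from fractions import Fraction


def b(j, d, theta):
    """b_j = prod_{k=1}^{j-1} ((d-k)/(d-k+theta))^(2^(j-1-k)), with b_1 = 1."""
    out = Fraction(1)
    for k in range(1, j):
        out *= Fraction(d - k, d - k + theta) ** (2 ** (j - 1 - k))
    return out


def q(j, d, theta):
    """q_j = (d-j+1+theta)/(d-j+1) * prod_{m=0}^{theta-1} (d+m)/(d+m-j+2)."""
    out = Fraction(d - j + 1 + theta, d - j + 1)
    for m in range(theta):
        out *= Fraction(d + m, d + m - j + 2)
    return out


def telescoped_sum(i, d, theta):
    """Left-hand side of (25); the j = 1 term vanishes because b_1 = 1."""
    return sum(
        q(j, d, theta) * (b(j, d, theta) - b(j, d, theta) ** 2)
        for j in range(1, i + 1)
    )


def closed_form(i, d, theta):
    """Right-hand side of (25): 1 - q_i * b_i^2."""
    return 1 - q(i, d, theta) * b(i, d, theta) ** 2


if __name__ == "__main__":
    print(telescoped_sum(3, 3, 1), closed_form(3, 3, 1))  # both equal 23/27
```

Using `Fraction` avoids any floating-point doubt about whether the two sides agree exactly; the exponents grow doubly exponentially in j, so the check is only practical for small d.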

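Now, we can substitute (25) into (20), getting

$$\begin{aligned}
\mathbb{P}(\tau_0 < \tau_m) &= 1 - P_A - \sum_{i=2}^{d} \mathbb{P}(\tau_0 = i) \left( 1 - q_i \gamma_1^{2^{i}} \cdots \gamma_{i-1}^{2^{i}} \right) \\
&= \frac{\theta}{\theta+d} + \sum_{i=2}^{d} \mathbb{P}(\tau_0 = i)\, q_i \gamma_1^{2^{i}} \cdots \gamma_{i-1}^{2^{i}},
\end{aligned} \tag{26}$$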


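where (26) is because

$$\begin{aligned}
\sum_{i=2}^{d} \mathbb{P}(\tau_0 = i) &= 1 - \mathbb{P}(\tau_0 = d+1) - \mathbb{P}(\tau_0 = 1) \\
&= 1 - P_A - \frac{\theta}{\theta+d}.
\end{aligned}$$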

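Note that $\mathbb{P}(\tau_0 = i) = \binom{N-i}{\theta-1} \Big/ \binom{N}{\theta}$, where $N \triangleq d + \theta$. We can also rewrite $q_i$ as

$$\begin{aligned}
q_i &= \frac{d-i+1+\theta}{d-i+1} \prod_{m=0}^{\theta-1} \frac{d+m}{d+m-i+2} \\
&= \frac{d-i+1+\theta}{d-i+1} \cdot \frac{(d+\theta-1)!}{(d-1)!} \cdot \frac{(d-i+1)!}{(d+\theta+1-i)!} \cdot \frac{\theta!}{\theta!} \\
&= \frac{\binom{N-1}{\theta}}{\binom{N-i}{\theta}}.
\end{aligned}$$

Thus, the product of these terms is

$$\mathbb{P}(\tau_0 = i)\, q_i = \frac{\theta}{N-i-\theta+1} \cdot \frac{N-\theta}{N} = \frac{\theta}{d-i+1} \cdot \frac{d}{\theta+d},$$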

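and the desired probability in (26) simplifies to

$$\begin{aligned}
\mathbb{P}(\tau_0 < \tau_m) &= \frac{\theta}{\theta+d} + \sum_{i=2}^{d} \frac{d}{d-i+1} \cdot \frac{\theta}{\theta+d}\, b_i^2 \\
&= \frac{\theta}{\theta+d} \left[ 1 + d \sum_{i=2}^{d} \frac{b_i^2}{d-i+1} \right] \\
&\ge \frac{\theta}{\theta+d} \left[ 1 + \sum_{i=2}^{d} b_i^2 \right] && (27) \\
&\ge \frac{\theta}{\theta+d} \left[ 1 + \sum_{i=1}^{d-1} \gamma_1^{\sum_{k=1}^{i} 2^{k}} \right] && (28) \\
&= \frac{\theta}{\theta+d} \left[ 1 + \sum_{i=1}^{d-1} \gamma_1^{2^{i+1}-2} \right] \\
&\ge \frac{\theta}{d} \sum_{i=0}^{d-1} \gamma_1^{2^{i+1}} && (29)
\end{aligned}$$

where (27) comes from replacing $d - i + 1$ with d, (28) comes from replacing $\gamma_k$ with $\gamma_1$ since $\gamma_k \ge \gamma_1$, and (29) holds because $\gamma_1^2 \le \frac{d}{d+\theta}$. We lower-bound the doubly exponential sum in (29) by integrating. Letting $\rho = \gamma_1^2$, we have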

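$$\begin{aligned}
\frac{\theta}{d} \sum_{i=0}^{d-1} (\gamma_1^2)^{2^{i}} &\ge \frac{\theta}{d} \int_0^{d} \rho^{2^{x}}\, dx \\
&= \frac{\theta}{d \log 2} \left( \mathrm{Ei}(2^{d} \log \rho) - \mathrm{Ei}(\log \rho) \right)
\end{aligned}$$

where Ei(·) denotes the exponential integral. This gives the desired result.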
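As a numerical illustration of the first steps of this chain, the sketch below evaluates the exact simplified form of (26), namely θ/(θ+d) · [1 + d Σ_{i=2}^{d} b_i²/(d−i+1)], together with the relaxation (27). It is a sanity check under the definitions of b_i used in this proof, with hypothetical function names; the later steps (28), (29) and the exponential-integral bound are not evaluated here.

```python
from fractions import Fraction


def b_squared(i, d, theta):
    """b_i^2, where b_i = prod_{k=1}^{i-1} ((d-k)/(d-k+theta))^(2^(i-1-k))."""
    out = Fraction(1)
    for k in range(1, i):
        out *= Fraction(d - k, d - k + theta) ** (2 ** (i - k))
    return out


def exact_expression(d, theta):
    """theta/(theta+d) * [1 + d * sum_{i=2}^{d} b_i^2 / (d-i+1)]."""
    total = sum(b_squared(i, d, theta) / (d - i + 1) for i in range(2, d + 1))
    return Fraction(theta, theta + d) * (1 + d * total)


def relaxed_bound(d, theta):
    """Right-hand side of (27): theta/(theta+d) * [1 + sum_{i=2}^{d} b_i^2]."""
    total = sum(b_squared(i, d, theta) for i in range(2, d + 1))
    return Fraction(theta, theta + d) * (1 + total)


if __name__ == "__main__":
    for d, theta in [(3, 1), (4, 1), (4, 2)]:
        print(d, theta, float(exact_expression(d, theta)), float(relaxed_bound(d, theta)))
```

For d = 3 and θ = 1 the exact expression is 49/108 (about 0.454), compared with θ/(θ+d) = 1/4 for the event that the source reports in the first time step.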

B.2 Proof of Proposition 3.2

The proposition can be seen through a simple counting argument. Let $A_t$ denote the set of active nodes at time t, or the set of all infected, honest nodes with at least one uninfected neighbor (honest or adversarial). We also define the uninfected degree of a node $u_v(t)$ as the number of uninfected neighbors of node v at time t. We define $A_t(i) = |\{v \in V : u_v(t) = i\}|$, $i > 0$, as the number of active nodes at time t with uninfected degree i.

We claim that for a given regular tree G and set of observed timestamps τ, and for any set of feasible orderings, the number of nodes with uninfected degree i (i.e., $A_t(i)$) for a given $i > 0$ does not depend on the underlying ordering. This holds because the graph is regular; we can show it formally by induction. At time t = 1, there are two options: either the source reports directly to the adversary, or it spreads the message to an honest neighbor. If it reports to the adversary, then in every feasible ordering, the source must report to the adversary at t = 1. Therefore at the end of time step t = 1, we have

$$A_t(i) = \begin{cases} 1 & \text{if } i = d + \theta - 1 \\ 0 & \text{otherwise.} \end{cases}$$

If the source instead passes the message to an honest neighbor (it doesn't matter which one, and the identity of the neighbor could vary across feasible orderings), then

$$A_t(i) = \begin{cases} 2 & \text{if } i = d + \theta - 1 \\ 0 & \text{otherwise,} \end{cases}$$

because now we have two active nodes, each of which has d + θ − 1 uninfected neighbors.

Now take t > 1, and assume that at time t − 1, $A_{t-1}(i)$ was the same across all feasible orderings, for each i > 0. We want to show that the same is true at time t. Every time an active node v infects a neighbor node w, v's own uninfected degree decreases by one. If w is an honest node, it joins the set of active nodes with uninfected degree $u_w(t) = d - 1 + \theta$. If w belongs to the eavesdropper, then it does not join the active nodes. Suppose τ indicates that in time t, exactly m nodes will report to the adversary. Since there were $|A_{t-1}|$ active nodes at time t − 1, we know that $|A_{t-1}| - m$ new active nodes will be infected, each with degree d − 1 + θ, so $A_t(d - 1 + \theta) = |A_{t-1}| - m$. Moreover, for all $0 < i < (d - 1 + \theta)$, we have $A_t(i) = A_{t-1}(i + 1)$, since each previously active node decrements its uninfected degree by one. None of this depends on the ordering, which proves that $A_t(i)$ takes the same value for any feasible ordering.

We write out the likelihood L(o) of a feasible ordering o:

$$L(o) = \prod_{j=1}^{t-1} \prod_{v \in A_j} \frac{1}{u_v(j)},$$

since each active node infects exactly one uninfected neighbor uniformly at random in each time step. By the previous argument, this likelihood can equivalently be grouped by nodes with the same uninfected degree u:

$$L(o) = \prod_{j=1}^{t} \prod_{u \in \{1, \ldots, d-1+\theta\}} \left( \frac{1}{u} \right)^{A_j(u)}.$$

Nothing in this expression depends on the ordering o (since $A_j(u)$ is independent of o), so the likelihood must be equal for all feasible orderings.
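The counting argument above can be sketched as a short recurrence. In this illustration the only input besides d and θ is the number of nodes that report to the adversary in each time step, which mirrors the claim that the counts do not depend on the ordering. The function names, the dictionary representation, and the choice of which time steps enter the log-likelihood are assumptions made for this sketch.

```python
import math


def uninfected_degree_counts(d, theta, reports):
    """Return [A_1, A_2, ...], where A_t maps an uninfected degree i > 0 to the
    number of active nodes with that degree at the end of time step t.

    reports[t-1] is the number of nodes that report to the adversary in step t.
    """
    state = {d + theta: 1}  # before step 1, only the source is active
    history = []
    for m in reports:
        active = sum(state.values())
        if not 0 <= m <= active:
            raise ValueError("invalid number of reports for this time step")
        nxt = {}
        # Every active node infects one neighbor, so its uninfected degree drops by 1;
        # nodes that reach degree 0 are no longer active.
        for degree, count in state.items():
            if degree > 1:
                nxt[degree - 1] = nxt.get(degree - 1, 0) + count
        # Infections that did not reach the adversary create new honest active nodes.
        new_nodes = active - m
        if new_nodes:
            key = d - 1 + theta
            nxt[key] = nxt.get(key, 0) + new_nodes
        history.append(nxt)
        state = nxt
    return history


def log_likelihood(states):
    """log of prod_j prod_u (1/u)^{A_j(u)} over the given list of states."""
    return -sum(count * math.log(u) for s in states for u, count in s.items())


if __name__ == "__main__":
    states = uninfected_degree_counts(d=3, theta=1, reports=[0, 1])
    print(states, log_likelihood(states))
```

With d = 3 and θ = 1, a first step without a report yields two active nodes of uninfected degree 3, which matches the second base case of the induction.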

B.3 Proof of Theorem 3.3

We begin by showing that (6) is an upper bound on any estimator's probability of detection, and then show that ball centrality achieves the lower bound in (7).

(1): Notice that with probability θ/(θ + d), the true source reports the message to the adversary in the first time slot. If this happens, then the conditional probability of detection is 1; since the adversary is assumed to know the starting time of the trickle process, it can deduce that the true source is the only possible source. However, if the source does not report to the adversary in the first time slot, then the source must instead pass the message to one of its honest neighbors. In that case, at t = 1, exactly two nodes are infected, both of which are honest. Thereafter, the spread from each of these nodes is completely symmetric: both the true source and the first neighbor are connected to identical graph structures (i.e., two infinite trees rooted at each of the infected nodes), and the message spreading dynamics from each of these two nodes are identically distributed. As such, no estimator can distinguish between the true source and the first neighbor, meaning the probability of detection is upper bounded by 1/2 in this case. Therefore, a simple upper bound for the ML probability of detection is

$$\mathbb{P}(M_{ML}(\boldsymbol{\tau}, G) = v^*) \le \frac{\theta}{\theta+d}(1) + \frac{d}{\theta+d}\left(\frac{1}{2}\right) = 1 - \frac{d}{2(\theta+d)}.$$

(2): Next, we demonstrate that ball centrality has a probability of detection that is lower bounded by (7). Once again, if the source immediately passes the message to the adversary, then the adversary identifies a ball of size 1, so the source gets caught with probability 1. This happens with probability $\frac{\theta}{\theta+d}$. Thus we need to compute the probability of detection conditioned on the source passing the message to an honest neighbor at t = 1. Suppose source $v^*$ passes the message to its neighbor w at t = 1. Without loss of generality, let us think of w and $v^*$ as the left and right respective endpoints of a path. In each subsequent time step, each of the endpoints of this path will forward the message either to another honest node or to the adversary. If the left (resp. right) endpoint q forwards the message to another honest node z, then z becomes the left (resp. right) endpoint in the following time step; if q forwards the message to the adversary, then the path terminates at q. This path continues to grow until both ends have terminated.

We first show that the probability of the path terminating is lower bounded by $1 - \frac{2}{(1+\theta/d)^{t-1}}$, then we show that conditioned on path termination, the probability of detection is lower bounded by 1/2. To analyze the probability of termination at time t, note that each end of the path terminates after a geometrically distributed number of hops. This holds because each endpoint independently terminates with probability $\frac{\theta}{\theta+d-1}$ in each time step. Therefore, the probability of both endpoints terminating by time t can be expressed as

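$$\begin{aligned}
\mathbb{P}(B \le t)^2 &= \left( 1 - \left( 1 - \frac{\theta}{\theta+d-1} \right)^{t-1} \right)^2 \\
&\ge \left( 1 - \left( \frac{d}{\theta+d} \right)^{t-1} \right)^2 \\
&\ge 1 - 2 \left( \frac{d}{\theta+d} \right)^{t-1}
\end{aligned} \tag{30}$$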

where B is a geometric random variable of rate $\frac{\theta}{\theta+d-1}$.

Now, we show that conditioned on termination, the probability of detection is lower-bounded by $\frac{1}{2}$. We call x and y left and right terminating endpoints, respectively (Fig. 10).

Figure 10: Arrangement of nodes from the proof of Thm. 3.3 (a path through x, w, v∗ and y).

Now, we show that the balls centered at nodes x and y have an intersection of at most two nodes. The ball centered at node x has a radius of $\tau_x - 1$, and the ball centered at y a radius of $\tau_y - 1$. By construction, we know that the hop distance between $v^*$ and x is $h(v^*, x) = \tau_x - 1 \le \tau_x - 1$ and $h(v^*, y) = \tau_y - 2 \le \tau_y - 1$, so $v^*$ must lie in the intersection of the two balls. Similarly, $h(w, x) = \tau_x - 2 \le \tau_x - 1$ and $h(w, y) = \tau_y - 1 \le \tau_y - 1$, so w must lie in the intersection of the two balls. Let us assume by contradiction that there exists a third node z in the intersection of these two balls. This implies that $h(z, x) \le \tau_x - 1$ and $h(z, y) \le \tau_y - 1$. Either z lies on the shortest path between x and y, which we denote P(x, y), or it does not. If z ∈ P(x, y), then it either lies to the left of w or to the right of $v^*$. In either case, z is excluded from the ball centered at y or x, respectively. Thus it cannot lie on P(x, y). If z does not lie on P(x, y), then there exists an alternative path P′(x, y) ≠ P(x, y) between x and y of distance at most $\tau_x + \tau_y - 2$ that contains node z; this path is allowed to traverse the same node or edge multiple times. By construction, the hop distance between x and y is $h(x, y) = \tau_x + \tau_y - 3$, so P′(x, y) must have at least this many hops. Moreover, since G is a tree, every path between x and y must traverse each node in P(x, y). Since P(x, y) already contains $\tau_x + \tau_y - 3$ edges, P′(x, y) should be no longer than $\tau_x + \tau_y - 2$, and it should also touch an additional node z, P′(x, y) must have exactly one more hop than P(x, y). This eliminates all paths that move from node r ∈ P(x, y) to node z ∉ P(x, y), then back to r again. But this implies that there exist two distinct paths between x and y in which no edge or vertex is traversed twice, which is a contradiction since G is a tree. Hence there can be at most two nodes in the intersection of balls, so the probability of detection is at least 1/2 in this case.

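Combining this with the lower bound in (30), we have

$$
\begin{aligned}
P\big(M_{\mathrm{BC}}(\boldsymbol{\tau}, G) = v^*\big)
&\ge \frac{\theta}{d+\theta}\,(1) + \frac{d}{d+\theta}\cdot\frac{1}{2}\left(1 - 2\left(\frac{d}{d+\theta}\right)^{t-1}\right) \\
&= 1 - \frac{d}{2(\theta+d)} - \left(\frac{d}{d+\theta}\right)^{t}.
\end{aligned}
$$

Since the ML estimator performs at least as well as the ball centrality estimator, the claim follows.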
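As a rough numerical illustration of the final bound above, the following Python sketch evaluates $1 - \frac{d}{2(\theta+d)} - \left(\frac{d}{d+\theta}\right)^t$. The function name and the parameter values are illustrative choices made here, not values taken from the paper.

```python
def ball_centrality_lower_bound(d, theta, t):
    """Evaluate 1 - d / (2 (theta + d)) - (d / (d + theta))**t.

    d: tree degree, theta: number of adversary connections per node,
    t: observation time. The name is hypothetical; only the formula
    comes from the text above.
    """
    q = d / (d + theta)
    return 1.0 - d / (2.0 * (theta + d)) - q ** t


if __name__ == "__main__":
    # Illustrative parameters only: the last term vanishes as t grows,
    # leaving the t-independent limit 1 - d / (2 (theta + d)).
    for t in (5, 20, 100, 500):
        print(t, ball_centrality_lower_bound(d=8, theta=1, t=t))
```

For fixed $d$ and $\theta$ the bound increases with $t$ and approaches $1 - \frac{d}{2(\theta+d)}$, which is at least $1/2$.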

B.4 Proof of Theorem 4.1

To analyze the probability of detection under the first-timestamp estimator, we consider the probability of the true source reporting before any other node, or $P(\tau_0 < \tau_m)$. We use $R_i$ to denote the random time delay between the infection times of $i$'s parent and $i$; by "parent", we mean with respect to the infected subtree $G_t$, which is rooted at node $v^* = 0$. Moreover, we let the source have a different degree $d_0$ than the rest of the tree for simplicity of calculation. We write the probability of detection explicitly in Figure 11. The expression starts by conditioning on the reporting time of the true source, $\tau_0$, then conditions on the times at which other nodes receive the message. Recall that node $i$ receives the message at time $X_i$, and reports the message to the adversary at time $\tau_i$.

$$
\begin{aligned}
P(\tau_0 < \tau_m)
&= \int_{t_0=0}^{\infty} P(\tau_0 = t_0) \Bigg[ P(R_1 > t_0) + \int_{r_1=0}^{t_0} P(R_1 = r_1)\, P(\tau_1 > t_0 - r_1) \\
&\qquad \times \bigg[ P(R_2 > t_0 - r_1) + \int_{r_2=0}^{t_0 - r_1} P(R_2 = r_2)\, P(\tau_2 > t_0 - r_1 - r_2) \Big[ P(R_3 > t_0 - r_1 - r_2) + \ldots \Big]^{d-1} dr_2 \bigg]^{d-1} dr_1 \Bigg]^{d_0} dt_0 \\
&= \int_{0}^{\infty} \lambda_2 e^{-\lambda_2 t_0} \Bigg[ e^{-\lambda_1 t_0} + \int_{0}^{t_0} \lambda_1 e^{-\lambda_1 r_1} e^{-\lambda_2 (t_0 - r_1)} \bigg[ e^{-\lambda_1 (t_0 - r_1)} + \int_{0}^{t_0 - r_1} \lambda_1 e^{-\lambda_1 r_2} e^{-\lambda_2 (t_0 - r_1 - r_2)} \\
&\qquad \times \Big[ e^{-\lambda_1 (t_0 - r_1 - r_2)} + \ldots \Big]^{d-1} dr_2 \bigg]^{d-1} dr_1 \Bigg]^{d_0} dt_0 \qquad (31) \\
&= \int_{0}^{\infty} \lambda_2 e^{-\lambda_2 t_0} \Bigg[ e^{-t_0} + \int_{0}^{t_0} e^{-r_1} e^{-\lambda_2 (t_0 - r_1)} \bigg[ e^{-(t_0 - r_1)} + \int_{0}^{t_0 - r_1} e^{-r_2} e^{-\lambda_2 (t_0 - r_1 - r_2)} \\
&\qquad \times \Big[ e^{-(t_0 - r_1 - r_2)} + \ldots \Big]^{d-1} dr_2 \bigg]^{d-1} dr_1 \Bigg]^{d_0} dt_0 \qquad (32) \\
&= \int_{0}^{\infty} \lambda_2 e^{-t_0 (\lambda_2 + d_0)} \Bigg[ 1 + e^{-t_0 (\lambda_2 + d - 2)} \int_{0}^{t_0} e^{r_1 (\lambda_2 + d - 2)} \bigg[ 1 + e^{-(t_0 - r_1)(\lambda_2 + d - 2)} \int_{0}^{t_0 - r_1} e^{r_2 (\lambda_2 + d - 2)} \\
&\qquad \times \Big[ 1 + \ldots \Big]^{d-1} dr_2 \bigg]^{d-1} dr_1 \Bigg]^{d_0} dt_0 \qquad (33)
\end{aligned}
$$

Figure 11: Probability of detection of the first-timestamp estimator, where $d$ is the node degree. $d_0$ denotes the degree of the source, which we eventually set to $d_0 = d - 2$ for symmetry. We condition on the source's reporting time $\tau_0 = t_0$. $R_i$ denotes the delay of infection times between $i$'s parent and $i$. Step (32) substitutes $\lambda_1 = 1$.
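The nested conditioning described above can also be estimated directly by simulation. The Python sketch below is a minimal Monte Carlo illustration under stated assumptions, not the authors' code: spreading delays are taken to be Exp(1) (that is, $\lambda_1 = 1$), reporting delays Exp($\theta$) with $\theta = \lambda_2$, the source has $d - 2$ children and every other node has $d - 1$ children. The function names are hypothetical.

```python
import random


def source_reports_first(d, theta, rng):
    """Simulate one spread on an infinite tree; return True if the source
    reports to the adversary before every other node does."""
    t0 = rng.expovariate(theta)          # source reporting time tau_0
    stack = []
    for _ in range(d - 2):               # assumed source degree d0 = d - 2
        x = rng.expovariate(1.0)         # infection time of a child of the source
        if x < t0:
            stack.append(x)
    while stack:
        x = stack.pop()                  # a node infected at time x < t0
        if x + rng.expovariate(theta) < t0:
            return False                 # this node reported before the source
        for _ in range(d - 1):           # its d - 1 uninfected neighbors
            child = x + rng.expovariate(1.0)
            if child < t0:               # later infections cannot beat the source
                stack.append(child)
    return True


def estimate_detection(d, theta, trials=20000, seed=1):
    """Monte Carlo estimate of P(tau_0 < tau_m)."""
    rng = random.Random(seed)
    hits = sum(source_reports_first(d, theta, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(estimate_detection(d=4, theta=1.0))
```

Only nodes infected before $t_0$ are explored, because a node infected later cannot report before the source. The estimate can be compared against the closed-form expression obtained at the end of this proof.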

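Expression (33) can be written recursively. Define

$$
g(a) \triangleq e^{-a(\lambda_2 + d - 2)} \int_{0}^{a} e^{u(\lambda_2 + d - 2)} \big[1 + g(a - u)\big]^{d-1}\, du. \qquad (34)
$$

Letting $s = a - u$, we get

$$
g(a) = \int_{0}^{a} e^{-s(\lambda_2 + d - 2)} \big[1 + g(s)\big]^{d-1}\, ds.
$$

Substituting $g(a)$ into Equation (33) gives

$$
P(\tau_0 < \tau_m) = \int_{0}^{\infty} \lambda_2 e^{-t_0(\lambda_2 + d_0)} \big[1 + g(t_0)\big]^{d_0}\, dt_0, \qquad (35)
$$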

where $d_0$ is the degree of the source. To make this expression symmetric with respect to the recursive function $g(\cdot)$, we assume that the source has degree $d - 2$, and all other nodes have degree $d$, so $d_0 = d - 2$. We subsequently compute the probability of detection by solving for $g(t_0)$, which can be written as a differential equation:

$$
g'(a) = e^{-a(\lambda_2 + d - 2)} \big[1 + g(a)\big]^{d-1},
$$

with initial condition $g(0) = 0$. This separable, nonlinear differential equation can be solved exactly, giving

$$
g(a) = \left( \frac{(d-2)\, e^{-a(\lambda_2 + d - 2)} + \lambda_2}{\lambda_2 + d - 2} \right)^{-\frac{1}{d-2}} - 1.
$$

Substituting into (35), we obtain the exact probability of detection:

$$
P(\tau_0 < \tau_m) = \frac{\lambda_2 \big( \log(\lambda_2 + d - 2) - \log(\lambda_2) \big)}{d - 2}.
$$

Letting $\lambda_2 = \theta$, we get

$$
P(\tau_0 < \tau_m) = \frac{\theta}{d - 2} \log\left( \frac{d + \theta - 2}{\theta} \right), \qquad (36)
$$

which is the final expression.
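The two closed forms in this derivation can be sanity-checked numerically. The Python sketch below is an illustration under stated assumptions, with hypothetical function names: it integrates the differential equation for $g$ with a classical fourth-order Runge-Kutta scheme, compares the result with the closed-form $g$, and then evaluates the integral (35) with $d_0 = d - 2$ by Simpson's rule for comparison with (36). Step counts and the truncation point of the integral are arbitrary choices made here.

```python
import math


def g_closed(a, d, lam2):
    """Closed-form solution of g'(a) = exp(-a c) (1 + g)^(d-1), g(0) = 0."""
    c = lam2 + d - 2
    return (((d - 2) * math.exp(-a * c) + lam2) / c) ** (-1.0 / (d - 2)) - 1.0


def g_rk4(a_end, d, lam2, steps=2000):
    """Integrate the same ODE numerically with classical RK4."""
    c = lam2 + d - 2

    def f(a, g):
        return math.exp(-a * c) * (1.0 + g) ** (d - 1)

    h = a_end / steps
    a, g = 0.0, 0.0
    for _ in range(steps):
        k1 = f(a, g)
        k2 = f(a + h / 2, g + h * k1 / 2)
        k3 = f(a + h / 2, g + h * k2 / 2)
        k4 = f(a + h, g + h * k3)
        g += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        a += h
    return g


def detection_closed(d, theta):
    """Expression (36)."""
    return theta / (d - 2) * math.log((d + theta - 2) / theta)


def detection_quadrature(d, theta, steps=20000):
    """Simpson's rule applied to (35) with d0 = d - 2 and lambda_2 = theta."""
    c = theta + d - 2
    upper = 50.0 / c  # integrand decays like exp(-c t); the tail is negligible

    def integrand(t):
        return theta * math.exp(-t * c) * (1.0 + g_closed(t, d, theta)) ** (d - 2)

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0


if __name__ == "__main__":
    print(g_rk4(1.0, 4, 1.0), g_closed(1.0, 4, 1.0))
    print(detection_quadrature(4, 1.0), detection_closed(4, 1.0))
```

The agreement follows from the identity $\lambda_2 e^{-t(\lambda_2+d-2)}[1+g(t)]^{d-2} = \lambda_2\, g'(t)/(1+g(t))$, so that (35) integrates to $\lambda_2 \log(1 + g(\infty))$.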

B.5 Proof of Theorem 4.2

To prove this claim, we analyze the (suboptimal) reporting centrality estimator, which achieves condition (11). Our goal is to show that under an eavesdropper adversary, reporting centrality has a strictly positive probability of detection as $d \to \infty$.

Let $R_t = \{v \in V_t \mid R_v(t) = 1\}$ denote the set of reporting centers at time $t$. We compute the probability of detection by conditioning on the event $v^* \in R_t$. For brevity of notation, we use $v$ to denote the reporting centrality source estimate in this proof. For any fixed time $t$, we have

$$
P\big(M_{\mathrm{RC}}(\boldsymbol{\tau}, G) = v^*\big) = \underbrace{P(v^* \in R_t)}_{(a)} \times \underbrace{P\big(M_{\mathrm{RC}}(\boldsymbol{\tau}, G) = v^* \mid v^* \in R_t\big)}_{(b)}. \qquad (37)
$$

Note that the probability of detection is defined for a fixed $t$. Our goal is to lower-bound this quantity as $t \to \infty$. The proof consists of three steps:

(a) Show that $\liminf_{t\to\infty} P(v^* \in R_t) \ge C_d > 0$.

(b) Show that $P\big(M_{\mathrm{RC}}(\boldsymbol{\tau}, G) = v^* \mid v^* \in R_t\big) = 1$.

(c) Combine parts (a) and (b) to give the claim.


For readability, we abbreviate our notation in this proof. The number of nodes in the subtree $T^{v^*}_{w}$ will be denoted $N_w(t)$ instead of $N_{T^{v^*}_{w}}(t)$, and the number of reporting nodes in this subtree will be denoted $Y_w(t)$ instead of $Y_{T^{v^*}_{w}}(t)$. Thus $N_1(t)$ denotes the number of infected nodes in the first subtree of $v^*$, and $Y_1(t)$ denotes the number of reporting nodes in the first subtree of $v^*$.

Part (a): Show that $\liminf_{t\to\infty} P(v^* \in R_t) \ge C_d$.

We demonstrate this by conditioning on the event that $v^*$ is the unique rumor center in $G_t$. This happens if and only if $\forall w \in \mathcal{N}(v^*)$, $N_w(t) < N(t)/2$ [37], where $\mathcal{N}(v^*)$ denotes the neighbors of $v^*$. We let $C_t = \{v \in V_t \mid v = \text{rumor center of } G_t\}$, which gives

$$
P(v^* \in R_t) \ge \underbrace{P(v^* \in C_t, |C_t| = 1)}_{(a1)}\; \underbrace{P(v^* \in R_t \mid v^* \in C_t, |C_t| = 1)}_{(a2)}.
$$

Part (a1) is studied in Theorem 3.1 of [37], which shows that

$$
\liminf_{t\to\infty} P(v^* \in C_t, |C_t| = 1) = 1 - d\left(1 - I_{1/2}\left(\frac{1}{d-2},\, 1 + \frac{1}{d-2}\right)\right), \qquad (38)
$$

where $I_{1/2}(a, b)$ is the regularized incomplete Beta function, or the probability that a Beta random variable with parameters $a$ and $b$ takes a value in $[0, 1/2)$.
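To get a feel for the size of the constant in (38), the Python sketch below evaluates it numerically using only the standard library. It is a minimal illustration with a hypothetical function name: the upper tail $1 - I_{1/2}(a, b)$ is computed by Simpson's rule on $[1/2, 1]$, which avoids the integrable singularity of the Beta density at 0 when $a < 1$, and the Beta function is obtained from `math.lgamma`. The number of panels is an arbitrary choice.

```python
import math


def rumor_center_limit(d, panels=20000):
    """Evaluate 1 - d * (1 - I_{1/2}(1/(d-2), 1 + 1/(d-2))) for d >= 3."""
    a = 1.0 / (d - 2)
    b = 1.0 + a
    # Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)
    beta = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

    def density(x):
        # Unnormalized Beta(a, b) density
        return x ** (a - 1.0) * (1.0 - x) ** (b - 1.0)

    # Simpson's rule on [1/2, 1]; `panels` must be even.
    h = 0.5 / panels
    total = density(0.5) + density(1.0)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * density(0.5 + k * h)
    upper_tail = total * h / 3.0 / beta   # equals 1 - I_{1/2}(a, b)
    return 1.0 - d * upper_tail


if __name__ == "__main__":
    for d in (3, 4, 8, 100, 1000):
        print(d, rumor_center_limit(d))
```

For $d = 3$ the parameters are $a = 1$, $b = 2$, so $I_{1/2}(1, 2) = 3/4$ and the expression equals $1/4$ exactly; for large $d$ the value approaches $1 - \ln 2 \approx 0.307$.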

For part (a2), we show that $\lim_{t\to\infty} P(v^* \in R_t \mid v^* \in C_t, |C_t| = 1) = 1$. Our approach is to first show that the fraction of reporting nodes in each tree converges almost surely to a constant as $t \to \infty$. We subsequently show that if $v^*$ is a unique rumor center, it is almost surely a reporting center as $t \to \infty$.

Lemma B.1. For all $i \in [d]$, the following condition holds as $t \to \infty$:

$$
\frac{Y_i(t)}{N_i(t)} \xrightarrow{\text{a.s.}} \alpha_{d,\theta} = \frac{\theta}{d + \theta - 2}. \qquad (39)
$$

(Proof in Section B.5.1)

This lemma states that the ratio of reporting to infected nodes in each subtree converges almost surely to a constant. The proof proceeds by describing the evolution of each subtree and the adversary's observations as a generalized Polya urn process with negative coefficients. The ratio of balls in this urn process can be shown to converge almost surely, which implies the claim. Now we use Lemma B.1 to show that $v^*$ is a reporting center.

Since $v^*$ is a unique rumor center, $\forall i \in [d]$ (where $[d] = \{1, 2, \ldots, d\}$), $N_i(t) < N(t)/2$ [36]. Because of this conditioning and the fact that the infected subtree sizes (normalized by the total number of infected nodes) converge almost surely [37], for any outcome $\omega$ of the underlying diffusion process, it holds that $N_i(t, \omega)/N(t, \omega) \xrightarrow{t\to\infty} (1/2 - \delta_i(\omega))$, for some $\delta_i(\omega) > 0$. Here $N_i(t, \omega)$ denotes the number of infected nodes in subtree $i$ at time $t$ under outcome $\omega$. We wish to show that for any given, feasible set of offsets $\delta_i(\omega)$, $i \in [d]$, $v^*$ eventually becomes a reporting center w.p. 1.

Since $Y_i(t)/N_i(t) \xrightarrow{\text{a.s.}} \alpha_{d,\theta}$, for a given outcome $\omega$ of the underlying process, we can write $Y_i(t, \omega) = N_i(t, \omega)(\alpha_{d,\theta} + \varepsilon_i(t, \omega))$, where $\varepsilon_i(t, \omega) \in \mathbb{R}$. Note that $\varepsilon_i(t, \omega) \xrightarrow{t\to\infty} 0$. We write the condition for being a reporting center as

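$$
\begin{aligned}
& Y_i(t, \omega) \overset{?}{<} \frac{Y(t, \omega)}{2} \\
\implies\; & N_i(t, \omega)\big(\alpha_{d,\theta} + \varepsilon_i(t, \omega)\big) \overset{?}{<} \frac{\alpha_{d,\theta} N(t, \omega)}{2} + \frac{\sum_{j \in [d]} \varepsilon_j(t, \omega) N_j(t, \omega)}{2} \\
\implies\; & N(t, \omega)\Big(\tfrac{1}{2} - \delta_i(\omega)\Big)\big(\alpha_{d,\theta} + \varepsilon_i(t, \omega)\big) \overset{?}{<} \frac{1}{2}\Big[\alpha_{d,\theta} N(t, \omega) + \sum_{j \in [d]} \varepsilon_j(t, \omega) N(t, \omega)\Big(\tfrac{1}{2} - \delta_j(\omega)\Big)\Big] \\
\implies\; & \varepsilon_i(t, \omega) - \frac{\sum_{j} \varepsilon_j(t, \omega)\big(\tfrac{1}{2} - \delta_j(\omega)\big)}{1 - 2\delta_i(\omega)} \overset{?}{<} \frac{2\,\alpha_{d,\theta}\, \delta_i(\omega)}{1 - 2\delta_i(\omega)}.
\end{aligned}
\qquad (40)
$$

The last line follows by dividing the previous one by $N(t, \omega)\big(\tfrac{1}{2} - \delta_i(\omega)\big)$ and subtracting $\alpha_{d,\theta}$ from both sides. Note that $\alpha_{d,\theta}$ and $\delta_i(\omega)$ are both strictly positive, and $\delta_i(\omega) < 1/2$ because the limiting fraction of infected nodes in subtree $i$ is positive, so the right-hand side of (40) is strictly positive. Moreover,

$$
\lim_{t\to\infty} \left[ \varepsilon_i(t, \omega) - \frac{\sum_{j} \varepsilon_j(t, \omega)\big(\tfrac{1}{2} - \delta_j(\omega)\big)}{1 - 2\delta_i(\omega)} \right] = 0.
$$

Therefore, for every outcome $\omega$, there exists a time $T_\omega$ such that for all $t > T_\omega$, condition (40) is satisfied, so $\lim_{t\to\infty} P(v^* \in R_t \mid v^* \in C_t, |C_t| = 1) = 1$.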

Putting together parts (a1) and (a2), we get

$$
\liminf_{t\to\infty} P(v^* \in R_t) \ge 1 - d\left(1 - I_{1/2}\left(\frac{1}{d-2},\, 1 + \frac{1}{d-2}\right)\right).
$$

Part (b): Show that $P\big(M_{\mathrm{RC}}(\boldsymbol{\tau}, G) = v^* \mid v^* \in R_t\big) = 1$.

We show this by demonstrating that if $v^*$ is a reporting center, no other reporting centers exist.

We show that there can be at most one reporting center by contradiction. Suppose there are two distinct nodes, $v^*$ and $u$, both of which are reporting centers. Consider the $d$ subtrees adjacent to $v^*$. Suppose the neighbors of $v^*$ are labelled $w_1, \ldots, w_d$. Each of these neighbors is the root of an infected subtree $T^{v^*}_{w_i}$. Since $v^*$ is a reporting center, we have two properties:

$$
Y_{w_i}(t) < \frac{Y(t)}{2} \quad \forall i \in [d] \qquad (41)
$$

$$
Y_{v^*}(t) + \sum_{i \in [d]} Y_{w_i}(t) = Y(t). \qquad (42)
$$

Now consider the other reporting center $u \ne v^*$. We label the neighbors of $u$ as $z_1, \ldots, z_d$, and write $T^{u}_{z_i}$ for the infected subtree rooted at $z_i$, by analogy with $T^{v^*}_{w_i}$. Since $u$ is a reporting center, this implies that

$$
Y_{T^{u}_{z_i}}(t) < \frac{Y(t)}{2} \quad \forall i \in [d] \qquad (43)
$$

$$
Y_u(t) + \sum_{i \in [d]} Y_{T^{u}_{z_i}}(t) = Y(t). \qquad (44)
$$

Suppose without loss of generality that $z_1, w_1 \in P(u, v^*)$. The subtree $T^{u}_{z_1}$ contains $v^*$ together with the subtrees rooted at $w_2, \ldots, w_d$, so in order to satisfy condition (43), it must hold that $Y_{v^*}(t) + \sum_{i=2}^{d} Y_{w_i}(t) < \frac{Y(t)}{2}$. Substituting this condition into (42) implies that $Y_{w_1}(t) > \frac{Y(t)}{2}$. This contradicts condition (41), which holds because $v^*$ is assumed to be an element of $R_t$. Therefore, there can be no more than one reporting center, so $P\big(M_{\mathrm{RC}}(\boldsymbol{\tau}, G) = v^* \mid v^* \in R_t\big) = 1$.
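The uniqueness argument can be illustrated on a small example. The Python sketch below assumes the definition implied by condition (41): a node is a reporting center if every subtree hanging off one of its neighbors contains strictly fewer than half of all reporting nodes. The function name, the edge-list input format, and the example tree are hypothetical choices made for illustration.

```python
from collections import defaultdict


def reporting_centers(edges, reporting):
    """Return all reporting centers of a tree.

    edges: list of (a, b) pairs describing an undirected tree.
    reporting: set of nodes that have reported to the adversary.
    """
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    total = len(reporting)  # plays the role of Y(t)

    def count(root, parent):
        # Number of reporting nodes in the subtree rooted at `root`
        # that points away from `parent`.
        stack = [(root, parent)]
        found = 0
        while stack:
            node, par = stack.pop()
            found += node in reporting
            for nb in adj[node]:
                if nb != par:
                    stack.append((nb, node))
        return found

    return [v for v in adj
            if all(count(w, v) < total / 2 for w in adj[v])]


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3), (3, 4)]
    print(reporting_centers(path, {0, 1, 2, 3, 4}))
    print(reporting_centers(path, {0, 1, 3, 4}))
```

On the five-node path with every node reporting, only the middle node satisfies the condition. If the middle node has not reported, each of its two subtrees holds exactly half of the reports and no reporting center exists, which is consistent with the claim that there is at most one.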


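Part (c): Combine parts (a) and (b) to give the final claim.

We wish to lower bound $\liminf_{t\to\infty} P\big(M_{\mathrm{RC}}(\boldsymbol{\tau}, G) = v^*\big)$. Recall from (37) that

$$
P\big(M_{\mathrm{RC}}(\boldsymbol{\tau}, G) = v^*\big) \ge P(v^* \in R_t)\, P\big(M_{\mathrm{RC}}(\boldsymbol{\tau}, G) = v^* \mid v^* \in R_t\big).
$$

We have bounds for each term:

$$
\begin{aligned}
\liminf_{t\to\infty} P(v^* \in R_t) &\ge 1 - d\left(1 - I_{1/2}\left(\frac{1}{d-2},\, 1 + \frac{1}{d-2}\right)\right), \\
P\big(M_{\mathrm{RC}}(\boldsymbol{\tau}, G) = v^* \mid v^* \in R_t\big) &= 1,
\end{aligned}
$$

which gives

$$
\liminf_{t\to\infty} P(v = v^*) \ge 1 - d\left(1 - I_{1/2}\left(\frac{1}{d-2},\, 1 + \frac{1}{d-2}\right)\right),
$$

thereby proving the claim.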

B.5.1 Proof of Lemma B.1

The evolution of the (partially unobserved) infected subgraph on regular trees can be described by a Polya urn process with $d$ colors of balls. Each ball represents one active edge between an infected and an uninfected node, and all active edges in the same source-adjacent subtree have the same color. At $t = 0$, we have one ball of each color, since there are $d$ active edges extending from the true source.

Because of the memorylessness of spreading, each of the active edges in $G_t$ is equally likely to spread the infection next. The urn evolves as follows: pick a ball uniformly at random; the subtree corresponding to the drawn color spreads the message over one of its active edges, infecting a neighbor $w$. Once $w$ is infected, one active edge is removed from the subtree (i.e., the edge that just spread the message), and $d - 1$ new active edges are added (from $w$ to its uninfected neighbors). In our urn, this corresponds to replacing the drawn ball and adding $d - 2$ balls of the same color. The replacement matrix for this urn can therefore be written as $A = (d - 2) I_d$, where $I_d$ denotes the $d \times d$ identity matrix.

This Polya urn is well-studied, and in the limit, the fraction of balls of each color is known to converge to a Dirichlet distribution [22]. In order to model the eavesdropper's observations, we generalize the urn model by additionally giving each ball a pattern: striped or solid. Each solid ball corresponds to an active edge in the underlying diffusion process, whereas each striped ball represents an active edge from an infected node to the adversary. When such an edge fires, the adversary observes the timestamp of the sending node.

We adapt the previous urn dynamics in two key ways. First, when a solid ball is drawn, we still add $d - 2$ solid balls of the same color, but now we also add $\theta$ striped balls of the same color. These represent the $\theta$ independent connections between the node and the adversary. Second, when a striped ball is drawn, we remove $\theta$ striped balls of the same color from the urn (i.e., the adversary only uses the first timestamp it receives). Thus, the replacement matrix for a single subtree, in which each column lists the balls added when a ball of that column's type is drawn, looks like

$$
A = \begin{array}{c|cc}
 & \text{solid} & \text{stripe} \\ \hline
\text{solid} & d - 2 & 0 \\
\text{stripe} & \theta & -\theta
\end{array}. \qquad (45)
$$

Let $s_n$ and $r_n$ denote the number of solid and striped balls, respectively, at the $n$th draw of the urn. The following condition holds as $n \to \infty$:

$$
\frac{r_n}{s_n} \xrightarrow{\text{a.s.}} \frac{\theta}{d + \theta - 2}. \qquad (46)
$$

To show this, we use the following result from [17], simplified for clarity:

Theorem 3.21 from [17]. Consider a Polya urn with replacement matrix $A$. Assume the following conditions:

1. For $i, j \in [d]$, $A_{ii} \ge -1$, and $A_{ij} \ge 0$ for $i \ne j$.

2. $A_{ij} < \infty$ for all $i, j \in [d]$.

3. The largest real eigenvalue $\lambda_A$ of $A$ is positive, $\lambda_A > 0$.

4. The largest real eigenvalue $\lambda_A$ of $A$ is simple.

5. The urn starts with at least one ball of a dominating type. A dominating type is a type of ball that, when drawn, produces balls of every other type.

6. $\lambda_A$ belongs to the dominating type.

7. The urn does not go extinct.

Then $n^{-1} [s_n \; r_n]^{\top} \xrightarrow{\text{a.s.}} \lambda_A v$, where $\top$ denotes the transpose of a vector, $\lambda_A$ is the largest eigenvalue of replacement matrix $A$, and $v$ is the corresponding right eigenvector.

Conditions 1 and 2 are satisfied by examination of $A$. The eigenvalues of $A$ are $(d - 2)$ and $-\theta$, so conditions 3 and 4 are satisfied. Conditions 5 and 6 are met because $\lambda_A$ belongs to the class of solid balls (i.e., a dominating type), and the urn starts with a solid ball by construction. Condition 7 is met because solid balls are never removed, and we start with one solid ball. Thus, Theorem 3.21 from [17] applies, which implies that $\frac{r_n}{s_n} \xrightarrow{\text{a.s.}} \frac{v_2}{v_1} = \frac{\theta}{d + \theta - 2}$, since the eigenvector for $\lambda_A = (d - 2)$ is $v = [d + \theta - 2,\; \theta]$.

There is a one-to-one mapping between the evolution of such an urn and the spreading of the message. Without loss of generality, we consider the evolution of the first subtree, $N_1(t)$. Let $\beta_n$ denote the time at which the $n$th ball is drawn. We define $s(t) = \max_{\{n : \beta_n \le t,\, \beta_{n+1} > t\}} s_n$ and $r(t) = \max_{\{n : \beta_n \le t,\, \beta_{n+1} > t\}} r_n$. We can now map the number of reporting and infected nodes to the evolution of the Polya urn. Each infected node contributes $d - 2$ solid balls, and each infected node that has not yet reported contributes $\theta$ striped balls:

$$
\begin{aligned}
s(t) &= 1 + (d - 2) N_1(t), \\
r(t) &= \theta \big( N_1(t) - Y_1(t) \big).
\end{aligned}
$$

Solving for $Y_1(t)$ and $N_1(t)$ and taking the limit gives

$$
\lim_{t\to\infty} \frac{Y_1(t)}{N_1(t)} = 1 - \frac{d - 2}{\theta} \lim_{t\to\infty} \frac{r(t)}{s(t) - 1} = \frac{\theta}{d + \theta - 2}
$$

with probability 1, since both $s(t)$ and $r(t)$ tend to infinity as $t \to \infty$.
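The limiting ratio in (46) can be checked empirically with a short simulation of the two-pattern urn for a single subtree. The Python sketch below is an illustration only, under the dynamics stated in this proof: a solid draw adds $d - 2$ solid and $\theta$ striped balls, and a striped draw removes $\theta$ striped balls. The function name, seed, and number of draws are arbitrary choices made here.

```python
import random


def simulate_urn(d, theta, draws, seed=0):
    """Simulate the solid/striped urn and return r_n / s_n after `draws` draws."""
    rng = random.Random(seed)
    solid, striped = 1, 0          # the subtree starts with one active edge
    for _ in range(draws):
        # Draw a ball uniformly at random from the urn.
        if rng.random() * (solid + striped) < solid:
            solid += d - 2         # replace the drawn ball, add d - 2 solid
            striped += theta       # new node opens theta adversary connections
        else:
            striped -= theta       # node reports; drop its theta striped balls
    return striped / solid


if __name__ == "__main__":
    d, theta = 4, 2
    print(simulate_urn(d, theta, draws=200000), theta / (d + theta - 2))
```

Striped balls are always added and removed in groups of $\theta$, so the striped count never becomes negative. For $d = 4$ and $\theta = 2$ the simulated ratio should be close to the predicted limit $\theta/(d + \theta - 2) = 0.5$.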

B.6 Proof of Corollary 5.1

The outline of this proof is similar to that of Theorem 4.2. One difference is that in the spy-based adversary, the true source $v^*$ never reports to the adversary, so $Y_{v^*}(t) = 0$. This does not change the proof in any significant way. We again condition on the event that the true source $v^*$ is a reporting center, which gives

$$
P\big(M_{\mathrm{RC}}(\boldsymbol{\tau}, G) = v^*\big) = \underbrace{P(v^* \in R_t)}_{(a)} \times \underbrace{P\big(M_{\mathrm{RC}}(\boldsymbol{\tau}, G) = v^* \mid v^* \in R_t\big)}_{(b)}. \qquad (47)
$$

Recall that $R_t$ is the set of reporting centers at time $t$. For part (b), if $v^*$ is a reporting center, then it is a unique reporting center (from Proof B.5), so the probability of detection is 1. Thus, the key is to characterize (a), $P(v^* \in R_t)$, as $t \to \infty$.

As before, we lower bound this quantity by conditioning on the event that $v^*$ is the unique rumor center in $G_t$. We let $C_t = \{v \in V_t \mid v = \text{rumor center of } G_t\}$, which gives

$$
P(v^* \in R_t) \ge \underbrace{P(v^* \in C_t, |C_t| = 1)}_{(a1)}\; \underbrace{P(v^* \in R_t \mid v^* \in C_t, |C_t| = 1)}_{(a2)}.
$$

We know from Proof B.5 that part (a1) gives

$$
\liminf_{t\to\infty} P(v^* \in C_t, |C_t| = 1) = 1 - d\left(1 - I_{1/2}\left(\frac{1}{d-2},\, 1 + \frac{1}{d-2}\right)\right), \qquad (48)
$$

where $I_{1/2}(a, b)$ is the regularized incomplete Beta function, or the probability that a Beta random variable with parameters $a$ and $b$ takes a value in $[0, 1/2)$.

For part (a2), we show that $\lim_{t\to\infty} P(v^* \in R_t \mid v^* \in C_t, |C_t| = 1) = 1$. This portion is the only real difference in proof between the present corollary and Theorem 4.2. Again, we show that the fraction of reporting nodes (or spies) in each tree converges almost surely to a constant as $t \to \infty$. However, unlike the eavesdropper adversary, the spy-based adversary does not require Polya urns to make this case.

Claim B.2. For all $i \in [d]$, the following condition holds as $t \to \infty$:

$$
\frac{Y_i(t)}{N_i(t)} \xrightarrow{\text{a.s.}} p. \qquad (49)
$$

This claim follows easily from the strong law of large numbers, since the number of spy nodes (or reporting nodes) in the $i$th subtree is simply a Binomial$(N_i(t), p)$ random variable and $N_i(t) \to \infty$ as $t \to \infty$. Recall that $N_i(t)$ denotes the number of infected nodes in the $i$th subtree adjacent to $v^*$. Therefore, by the same argument as Proof B.5, if $v^*$ is a unique rumor center, it is almost surely also a reporting center as $t \to \infty$.

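This, in turn, implies the overall result:

$$
\liminf_{t\to\infty} P\big(M_{\mathrm{RC}}(\boldsymbol{\tau}, G) = v^*\big) \ge C_d > 0,
$$

where

$$
C_d = 1 - d\left(1 - I_{1/2}\left(\frac{1}{d-2},\, 1 + \frac{1}{d-2}\right)\right).
$$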
