Contents
1 An Analysis of Anonymity in the Bitcoin System
  Fergal Reid, Martin Harrigan
  1.1 Introduction
    1.1.1 A Note Regarding Motivation and Disclosure
  1.2 Related Work
    1.2.1 Electronic Currencies
    1.2.2 Anonymity
  1.3 The Bitcoin System
  1.4 The Transaction and User Networks
    1.4.1 The Transaction Network
    1.4.2 The User Network
  1.5 Anonymity Analysis
    1.5.1 Integrating Off-Network Information
    1.5.2 TCP/IP Layer Information
    1.5.3 Egocentric Analysis and Visualization
    1.5.4 Context Discovery
    1.5.5 Flow and Temporal Analyses
    1.5.6 Other Forms of Analysis
    1.5.7 Mitigation Strategies
  1.6 Conclusions
  1.7 Acknowledgements
  References
arXiv:1107.4524v2 [physics.soc-ph] 7 May 2012
Chapter 1
An Analysis of Anonymity in the Bitcoin System
Fergal Reid and Martin Harrigan
Abstract
Anonymity in Bitcoin, a peer-to-peer electronic currency system, is a complicated issue. Within the system, users are identified by public-keys only. An attacker wishing to de-anonymize its users will attempt to construct the one-to-many mapping between users and public-keys and associate information external to the system with the users. Bitcoin tries to prevent this attack by storing the mapping of a user to his or her public-keys on that user’s node only and by allowing each user to generate as many public-keys as required. In this chapter we consider the topological structure of two networks derived from Bitcoin’s public transaction history. We show that the two networks have a non-trivial topological structure, provide complementary views of the Bitcoin system and have implications for anonymity. We combine these structures with external information and techniques such as context discovery and flow analysis to investigate an alleged theft of Bitcoins, which, at the time of the theft, had a market value of approximately half a million U.S. dollars.
Key words: Network Analysis, Anonymity, Bitcoin
Clique Research Cluster, Complex & Adaptive Systems Laboratory, University College Dublin, Ireland
e-mail: [email protected], [email protected]
1.1 Introduction
Bitcoin is a peer-to-peer electronic currency system first described in a paper by Satoshi Nakamoto (a pseudonym) in 2008 [20]. It relies on digital signatures to prove ownership and a public history of transactions to prevent double-spending. The history of transactions is shared using a peer-to-peer network and is agreed upon using a proof-of-work system [13, 5].
The first Bitcoins were transacted in January 2009 and by June 2011 there were 6.5 million Bitcoins in circulation among an estimated 10,000 users [28]. In recent months, the currency has seen rapid growth in both media attention and market price relative to existing currencies. At its peak, a single Bitcoin traded for more than US$30 on popular Bitcoin exchanges. At the same time, U.S. Senators and lobby groups in Germany, such as Der Bundesverband Digitale Wirtschaft (BVDW) or the Federal Association of Digital Economy, have raised concerns regarding the untraceability of Bitcoins and their potential to harm society through tax evasion, money laundering and illegal transactions. The implications of the decentralized nature of Bitcoin for authorities’ ability to regulate and monitor the flow of currency are as yet unclear.
Many users adopt Bitcoin for political and philosophical reasons, as much as pragmatic ones. There is an understanding amongst Bitcoin’s more technical users that anonymity is not a prominent design goal of the system; however, opinions vary widely as to how anonymous the system is in practice. Jeff Garzik, a member of Bitcoin’s development team, is quoted as saying it would be unwise “to attempt major illicit transactions with Bitcoin, given existing statistical analysis techniques deployed in the field by law enforcement”1; however, prior to this work, no analysis of anonymity in Bitcoin was publicly available to substantiate or refute these claims. Furthermore, many other users of the system do not share this belief. For example, WikiLeaks, an international organization for anonymous whistleblowers, recently advised its Twitter followers that it now accepts anonymous donations via Bitcoin (see Fig. 1.1) and states that2:
“Bitcoin is a secure and anonymous digital currency. Bitcoins cannot be easily
tracked back to you, and are a [sic] safer and faster alternative to other donation
methods.”
They proceed to describe a more secure method of donating Bitcoins that involves the generation of a one-time public-key but the implications for those who donate using the tweeted public-key are unclear. Is it possible to associate a donation with other Bitcoin transactions performed by the same user or perhaps identify them using external information? The extent to which this anonymity holds in the face of determined analysis remains to be tested.
1 http://www.theatlantic.com/technology/archive/2011/06/libertarian-dream-a-site-where-you-buy-drugs-with-digital-dollars/239776 – Retrieved 2011-11-12
2 http://wikileaks.org/support.html – Retrieved: 2011-07-22
Fig. 1.1: Screen capture of a tweet from WikiLeaks announcing their acceptance of ‘anonymous Bitcoin donations’.
This chapter is organized as follows. In Sect. 1.2 we consider some existing work relating to electronic currencies and anonymity. The economic aspects of the system, interesting in their own right, are beyond the scope of this work. In Sect. 1.3 we present an overview of the Bitcoin system; we focus on three features that are particularly relevant to our analysis. In Sect. 1.4 we construct two network structures, the transaction network and the user network, using the publicly available transaction history. We study the static and dynamic properties of these networks. In Sect. 1.5 we consider the implications of these network structures for anonymity. We also combine information external to the Bitcoin system with techniques such as flow and temporal analysis to illustrate how various types of information leakage can contribute to the de-anonymization of the system’s users. Finally, we conclude in Sect. 1.6.
1.1.1 A Note Regarding Motivation and Disclosure
Our motivation for this analysis is not to de-anonymize individual users of the Bitcoin system. Rather, it is to demonstrate, using a passive analysis of a publicly available dataset, the inherent limits of anonymity when using Bitcoin. This will ensure that users do not have expectations that are not being fulfilled by the system.
In security-related research, there is considerable tension over how best to disclose vulnerabilities [9]. Many researchers favor full disclosure where all information regarding a vulnerability is promptly released. This enables informed users to promptly take defensive measures. Other researchers favor limited disclosure; while this provides attackers with a window in which to exploit uninformed users, a mitigation strategy can be prepared and implemented before public announcement, thus limiting damage, e.g. through a software update. Our analysis illustrates some potential risks and pitfalls with regard to anonymity in the Bitcoin system. However, there is no central authority which can fundamentally change the system’s behavior. Furthermore, it is not possible to mitigate analysis of the existing transaction history.
There are also two noteworthy features of the dataset when compared with, say, contentious social network datasets, e.g. the Facebook profiles of Harvard University students [19]. Firstly, the delineation between what is considered public and private is clear: the entire history of Bitcoin transactions is publicly available. Secondly, the Bitcoin system does not have a usage policy. After joining Bitcoin’s peer-to-peer network, a client can freely request the entire history of Bitcoin transactions; there is no crawling or scraping required.
Thus, we believe the best strategy to minimise the threat to user anonymity is to be descriptive about the risks of the Bitcoin system. We do not identify individual users – apart from those in the case study – but we note that it is not difficult for other groups to replicate our work. Indeed, given the passive nature of the analysis, other parties may already be conducting similar analyses.
1.2 Related Work
The related work for this chapter can be categorized into two fields: electronic currencies and anonymity.
1.2.1 Electronic Currencies
Electronic currencies can be technically classified according to their mechanisms for establishing ownership, protecting against double-spending, ensuring anonymity and/or privacy, and generating and issuing new currency. Bitcoin is particularly noteworthy for the last of these mechanisms. The proof-of-work system [13, 5] that establishes consensus regarding the history of transactions also doubles as a minting mechanism. The scheme was first outlined in the B-Money Proposal [12]. We briefly consider some alternative mechanisms. Ripple [14] is an electronic currency where every user can issue currency. However, the currency is only accepted by peers who trust the issuer. Transactions between arbitrary pairs of users require chains of trusted intermediaries between the users. Saito [25] formalized and implemented a similar system, i-WAT, in which the chain of intermediaries can be established without their immediate presence using digital signatures. KARMA [29] is an electronic currency where the central authority is distributed over a set of users that are involved in all transactions. PPay [30] is a micropayment scheme for peer-to-peer systems where the issuer of the currency is responsible for keeping track of it. However, both KARMA and PPay may incur a large overhead when the rate of transactions is high. Mondex is a smart-card electronic currency [27]. It preserves a central bank’s role in the generation and issuance of electronic currency. Mondex was an electronic replacement for cash in the physical world whereas Bitcoin is an electronic analog of cash in the online world.
The authors are not aware of any studies of the network structure of electronic currencies. However, there are such studies of physical currencies. The community currency Tomamae-cho was introduced into the Hokkaido Prefecture in Japan for a three-month period during 2004–05 in a bid to revitalize the local economy. The Tomamae-cho system involved gift-certificates that were re-usable and legally redeemable into yen. There was an entry space on the reverse of each certificate for recipients to record transaction dates, their names and addresses, and the purposes of use, up to a maximum of five recipients. Kichiji and Nishibe [17] used the collected certificates to derive a network structure that represented the flow of currency during the period. They showed that the cumulative degree distribution of the network obeyed a power-law distribution, the network had small-world properties (the average clustering coefficient was high whereas the average path length was low), the directionality and the value of transactions were significant features, and the double-triangle system [23] was effective. There also exist studies of the physical movement of currency: ‘Where’s George?’ [1] is a crowd-sourced method for tracking U.S. dollar bills where users record the serial numbers of bills in their possession, along with their current location. If a bill is recorded sufficiently often, its geographical movement can be tracked over time. Brockmann et al. [8] used this dataset as a proxy for studying multi-scale human mobility and as a tool for computing geographic borders inherent to human mobility.
Grinberg [2] considers some of the legal issues that may be relevant to Bitcoin in the United States. For example, does Bitcoin violate the Stamp Payments Act of 1862? The currency can be used as a token for “a less sum than $1, intended to circulate as money or to be received or used in lieu of lawful money of the United States”. However, the authors of the act could not have conceived of digital currencies at the time of its writing and therefore Bitcoin may not fall under its scope. Grinberg believes that Bitcoin is unlikely to be a security or more specifically an “investment contract” and therefore does not fall under the Securities Act of 1933. He also believes that the Bank Secrecy Act of 1970 and the Money Laundering Control Act of 1986 pose the greatest risk for Bitcoin developers, exchanges, wallet providers, mining pool operators and businesses that accept Bitcoins. These acts require certain kinds of financial businesses, even if they are located abroad, to register with a bureau of the United States Department of the Treasury known as the Financial Crimes Enforcement Network (or FinCEN). The legality of Bitcoin is outside the scope of our work but is interesting nonetheless.
1.2.2 Anonymity
Previous work has shown the difficulty in maintaining anonymity in the context of networked data and online services which expose partial user information. Narayanan and Shmatikov [22] and Backstrom et al. [6] consider privacy attacks which identify users using the structure of networks and show the difficulty in guaranteeing anonymity in the presence of network data. Crandall et al. [11] infer social ties between users where none are explicitly stated by looking at patterns of ‘co-incidences’ or common off-network co-occurrences. Gross and Acquisti [15] discuss privacy of early users in the Facebook social network, and how information from multiple sources could be combined to identify pseudonymous network users. Narayanan and Shmatikov [21] de-anonymized the Netflix Prize dataset using information from IMDB3 which had similar user content, showing that statistical matching between different but related datasets can be used to attack anonymity. Puzis et al. [24] simulated the monitoring of a communications network using strategically-located monitoring nodes and showed that, using real-world network topologies, a relatively small number of nodes can collaborate to pose a significant threat to anonymity. Korolova et al. [18] study strategies for efficiently compromising network nodes, to maximise link information observed. Altshuler et al. [3] discuss the increasing dangers of attacks targeting similar types of information, and provide measures of the difficulty of such attacks on particular networks. All of this work points to the difficulty in maintaining anonymity where network data on user behaviour is available and illustrates how seemingly minor information leakages can be aggregated to pose significant risks. The security researcher Dan Kaminsky independently performed an investigation of some aspects of anonymity in the Bitcoin system, which he presented at a security conference [16] shortly after an initial draft of this work was made public. His work investigates the ‘linking problem’ we analyze in Sect. 1.4.2. In addition to the analysis we conducted, his work investigates the Bitcoin system from an angle we did not consider in our investigation – the TCP/IP operation of the underlying peer-to-peer network. Kaminsky’s TCP/IP layer findings strengthen the core claims of this work that Bitcoin does not anonymise user activity. We provide a summary of his findings in Sect. 1.5.2.
3 http://www.imdb.com
1.3 The Bitcoin System
The following is a simplified description of the Bitcoin system; see Nakamoto [20] for a more thorough treatment. Bitcoin is an electronic currency with no central authority or issuer. There is no central bank or fractional reserve system controlling the supply of Bitcoins. Instead, they are generated at a predictable rate such that the eventual total number will be 21 million. There is no requirement for a trusted third-party when making transactions. Suppose Alice wishes to ‘send’ a number of Bitcoins to Bob. Alice uses a Bitcoin client to join the Bitcoin peer-to-peer network and makes a public transaction or declaration stating that one or more identities that she controls (which can be verified using public-key cryptography), and which previously had a number of Bitcoins assigned to them, wish to re-assign those Bitcoins to one or more other identities, at least one of which is controlled by Bob. The participants of the peer-to-peer network form a collective consensus regarding the validity of this transaction by appending it to the public history of previously agreed-upon transactions (the block-chain). This process involves the repeated computation of a cryptographic hash function so that the digest of the transaction, along with other pending transactions, and an arbitrary nonce, has a specific form. This process is designed to require considerable computational effort, from which the security of the Bitcoin mechanism is derived. To encourage users to pay this computational cost, the process is incentivized using newly generated Bitcoins and/or transaction fees, and so this whole process is known as mining.
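The search for a nonce described above can be illustrated with a minimal sketch. It assumes that the "specific form" means the digest, read as an integer, falls below a target, and it hashes with SHA-256 applied twice. The payload layout, the function names mine and verify, and the difficulty_bits parameter are assumptions made for illustration; this is not Bitcoin's actual block format.

```python
import hashlib


def _digest(block_data: bytes, nonce: int) -> bytes:
    # Double SHA-256 over the pending data followed by the nonce.
    # The 8-byte little-endian nonce encoding is an assumption of this sketch.
    payload = block_data + nonce.to_bytes(8, "little")
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()


def mine(block_data: bytes, difficulty_bits: int, max_nonce: int = 2**32):
    """Try nonces in order until the digest is below the target."""
    target = 1 << (256 - difficulty_bits)
    for nonce in range(max_nonce):
        digest = _digest(block_data, nonce)
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
    return None  # no suitable nonce in the searched range


def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a claimed nonce costs a single hash evaluation."""
    target = 1 << (256 - difficulty_bits)
    return int.from_bytes(_digest(block_data, nonce), "big") < target
```

Each additional difficulty bit doubles the expected number of attempts in mine, while verify stays cheap. This asymmetry is what allows the other participants to check the work easily.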
In this chapter, there are three features of the Bitcoin system that are of particular interest. Firstly, the entire history of Bitcoin transactions is publicly available. This is necessary in order to validate transactions and prevent double-spending in the absence of a central authority. The only way to confirm the absence of a previous transaction is to be aware of all previous transactions. The second feature of interest is that a transaction can have multiple inputs and multiple outputs. An input to a transaction is either the output of a previous transaction or a sum of newly generated Bitcoins and transaction fees. A transaction frequently has either a single input from a previous larger transaction or multiple inputs from previous smaller transactions. Also, a transaction frequently has two outputs: one sending payment and one returning change. Thirdly, the payer and payee(s) of a transaction are identified through public-keys from public-private key-pairs. However, a user can have multiple public-keys. In fact, it is considered good practice for a payee to generate a new public-private key-pair for every transaction. Furthermore, a user can take the following steps to better protect their identity: they can avoid revealing any identifying information in connection with their public-keys; they can repeatedly send varying fractions of their Bitcoins to themselves using multiple (newly generated) public-keys; and/or they can use a trusted third-party mixer or laundry. However, these practices are not universally applied.
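The second feature, several inputs combined into a payment output and a change output, can be sketched as follows. The Coin and Transaction types, the largest-first input selection and the integer amounts (in units of 10^-8 BTC) are hypothetical choices made for illustration. Transaction fees are ignored, and this is not how any particular Bitcoin client selects inputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Coin:
    public_key: str  # key that the value is assigned to
    value: int       # amount in units of 10^-8 BTC


@dataclass(frozen=True)
class Transaction:
    inputs: List[Coin]
    outputs: List[Coin]


def make_payment(unspent: List[Coin], payee_key: str, amount: int,
                 change_key: str) -> Transaction:
    """Gather previous outputs until they cover the amount, then pay and return change."""
    selected, total = [], 0
    for coin in sorted(unspent, key=lambda c: c.value, reverse=True):
        if total >= amount:
            break
        selected.append(coin)
        total += coin.value
    if total < amount:
        raise ValueError("insufficient funds")
    outputs = [Coin(payee_key, amount)]
    if total > amount:
        # The surplus goes back to a key controlled by the payer.
        outputs.append(Coin(change_key, total - amount))
    return Transaction(selected, outputs)
```

When no single previous output covers the amount, the payment has to draw on several public-keys at once. It is this co-use of keys in one transaction that the later analysis relies on.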
The three features above, namely the public availability of Bitcoin transactions, the input-output relationship between transactions and the re-use and co-use of public-keys, provide a basis for two distinct network structures: the transaction network and the user network. The transaction network represents the flow of Bitcoins between transactions over time. Each vertex represents a transaction and each directed edge between a source and a target represents an output of the transaction corresponding to the source that is an input to the transaction corresponding to the target. Each directed edge also includes a value in Bitcoins and a timestamp. The user network represents the flow of Bitcoins between users over time. Each vertex represents a user and each directed edge between a source and a target represents an input-output pair of a single transaction where the input’s public-key belongs to the user corresponding to the source and the output’s public-key belongs to the user corresponding to the target. Each directed edge also includes a value in Bitcoins and a timestamp.
We gathered the entire history of Bitcoin transactions from the first transaction on the 3rd January 2009 up to and including the last transaction that occurred on the 12th July 2011. We gathered the dataset using the Bitcoin client4 and a modified version of Gavin Andresen’s bitcointools project.5 The dataset comprises 1 019 486 transactions between 1 253 054 unique public-keys. We describe the construction of the corresponding transaction and user networks and their analyses in the following sections. We will show that the two networks are complex, have a non-trivial topological structure, provide complementary views of the Bitcoin system and have implications for the anonymity of users.
1.4 The Transaction and User Networks
1.4.1 The Transaction Network
The transaction network T represents the flow of Bitcoins between transactions over time. Each vertex represents a transaction and each directed edge between a source and a target represents an output of the transaction corresponding to the source that is an input to the transaction corresponding to the target. Each directed edge also includes a value in Bitcoins and a timestamp. It is a straightforward task to construct T from our dataset.
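As a rough illustration of this construction, the sketch below builds the edge list of a small transaction network. The record layout, the output indices, the placeholder time for t4 and the isolated transaction t5 are assumptions made for this example. The values are toy data loosely modelled on the sub-network in Fig. 1.2, and taking each edge's timestamp from the source transaction is likewise an assumption.

```python
# Hypothetical record layout: each transaction has a time, a list of inputs
# given as (source transaction id, output index), and its outputs in BTC.
# Inputs that lie outside this toy sample are simply left out.
transactions = {
    "t1": {"time": "2011-05-01 14:13:26", "inputs": [], "outputs": {0: 1.2}},
    "t2": {"time": "2011-05-05 13:12:19", "inputs": [], "outputs": {0: 0.12}},
    "t3": {"time": "2011-05-05 14:10:54",
           "inputs": [("t1", 0), ("t2", 0)], "outputs": {0: 1.32}},
    "t4": {"time": "unknown", "inputs": [("t3", 0)], "outputs": {}},
    "t5": {"time": "2011-05-06 09:00:00", "inputs": [], "outputs": {0: 50.0}},
}


def build_transaction_network(txs):
    """Return (vertices, edges); an edge is (source, target, value, timestamp)."""
    edges = []
    for txid, tx in txs.items():
        for source_id, index in tx["inputs"]:
            source = txs[source_id]
            # The edge carries the value of the redeemed output and, by
            # assumption, the time of the transaction that created it.
            edges.append((source_id, txid, source["outputs"][index], source["time"]))
    # Transactions connected to no other transaction are omitted from T.
    vertices = {v for edge in edges for v in edge[:2]}
    return vertices, edges


vertices, edges = build_transaction_network(transactions)
```

Because every edge points from an existing output to a later transaction that redeems it, the resulting graph has no directed cycles.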
Fig. 1.2: An example sub-network from the transaction network. Each rectangular vertex represents a transaction and each directed edge represents a flow of Bitcoins from an output of one transaction to an input of another. [Figure content: transactions t1, t2, t3 and t4; edge labels 1.2 BTC (01/05/2011 14:13:26), 0.12 BTC (13:12:19 05/05/2011) and 1.32 BTC (14:10:54 05/05/2011); t4 has 12 other inputs not shown here.]
4 http://www.bitcoin.org
5 http://github.com/gavinandresen/bitcointools
Figure 1.2 shows an example sub-network of T. t1 is a transaction with one input and two outputs.6 It was added to the block-chain on the 1st May 2011. One of its outputs assigned 1.2 BTC (Bitcoins) to a user identified by the public-key pk1.7 The public-keys are not shown in Fig. 1.2. Similarly, t2 is a transaction with two inputs and two outputs.8 It was accepted on the 5th May 2011. One of its outputs sent 0.12 BTC to a user identified by a different public-key, pk2.9 t3 is a transaction with two inputs and one output.10 It was accepted on the 5th May 2011. Both of its inputs are connected to the two aforementioned outputs of t1 and t2. The only output of t3 was redeemed by t4.11
T has 974 520 vertices and 1 558 854 directed edges. The number of vertices is less than the total number of transactions in the dataset because we omit transactions that are not connected to at least one other transaction. These correspond to newly generated Bitcoins and transaction fees that are not yet redeemed. The network has neither multi-edges (multiple edges between the same pair of vertices in the same direction) nor loops. It is a directed acyclic graph (DAG) since the output of a transaction can never be an input (either directly or indirectly) to the same transaction.
Figure 1.3(a) shows a log-log plot of the cumulative degree distributions: the solid red curve is the cumulative degree distribution (in- and out-degree); the dashed green curve is the cumulative in-degree distribution; and the dotted blue curve is the cumulative out-degree distribution. We fitted power-law distributions, $p(x) \sim x^{-\alpha}$ for $x > x_{\min}$, to the three distributions by estimating the parameters $\alpha$ and $x_{\min}$ using a goodness-of-fit method [10]. Table 1.1 shows the estimates along with the corresponding Kolmogorov–Smirnov goodness-of-fit (GoF) statistics and p-values. We observe that none of the distributions for which the empirically-best scaling region is non-trivial have a power-law as a plausible hypothesis (p > 0.1). This is likely due to the fact that there is no preferential attachment [26, 7]: new vertices are joined to existing vertices whose corresponding transactions are not yet fully redeemed.
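To make the fitting step more concrete, the sketch below estimates the exponent of a power-law tail by maximum likelihood and computes a Kolmogorov–Smirnov distance between the data and the fitted model. It is a simplified illustration and not the procedure used for Table 1.1: it treats the data as continuous, takes the lower cut-off as given instead of estimating it, and computes no p-value. All function names are hypothetical.

```python
import math
import random


def fit_alpha(samples, xmin):
    """Maximum-likelihood exponent for the tail x >= xmin (continuous case)."""
    tail = [x for x in samples if x >= xmin]
    alpha = 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)
    return alpha, tail


def ks_distance(tail, xmin, alpha):
    """Largest gap between the empirical CDF of the tail and the fitted CDF."""
    ordered = sorted(tail)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        gap = max(gap, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return gap


def sample_power_law(n, xmin, alpha, rng):
    """Draw n synthetic values by inverting the power-law CDF."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


rng = random.Random(0)
data = sample_power_law(20000, 4.0, 2.5, rng)
alpha_hat, tail = fit_alpha(data, 4.0)
distance = ks_distance(tail, 4.0, alpha_hat)
```

On synthetic data that really follow a power law, the distance is small. A large distance for the best-fitting exponent is the kind of evidence that leads to rejecting the power-law hypothesis.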
There are 1 949 (maximal weakly) connected components in the network. Fig. 1.3(b) shows a log-log plot of the cumulative component size distribution. There are 948 287 vertices (97.31%) in the giant component. This component also contains a giant biconnected component with 716 354 vertices (75.54% of the vertices in the giant component).
6 The transactions and public-keys used in our examples exist in our dataset. The unique identifier for the transaction t1 is 09441d3c52fa0018365fcd2949925182f6307322138773d52c201f5cc2bb5976. You can query the details of a transaction or public-key by examining Bitcoin’s block-chain using, say, the Bitcoin Block Explorer (http://www.blockexplorer.com).
7 13eBhR3oHFD5wkE4oGtrLdbdi2PvK3ijMC
8 0c4d41d0f5d2aff14d449daa550c7d9b0eaaf35d81ee5e6e77f8948b14d62378
9 19smBSUoRGmbH13vif1Nu17S63Tnmg7h9n
10 0c034fb964257ecbf4eb953e2362e165dea9c1d008032bc9ece5cebbc7cd4697
11 f16ece066f6e4cf92d9a72eb1359d8401602a23990990cb84498cdbb93026402
Fig. 1.3: For the transaction network: (a) A log-log plot of the cumulative degree distributions. (b) A log-log plot of the cumulative component size distribution. (c) A temporal histogram showing the number of edges per month. (d) A temporal histogram showing the density per month. (e) A temporal histogram showing the average path length per month. [Panels (c)–(e), titled ‘Edge Number of Transaction Network’, ‘Density of Transaction Network’ and ‘Average Path Length of Transaction Network’, cover the months 2009-01 to 2011-06.]
Variable     x̃   x̄     s     α     xmin  GoF   p-val.
Degree       3   3.20  6.20  3.24  50    0.02  0.05
In-Degree    1   1.60  5.31  2.50  4     0.01  0.00
Out-Degree   1   1.60  3.17  3.50  51    0.05  0.00
Table 1.1: The degree, in-degree and out-degree distributions of T.
We also performed a rudimentary dynamic analysis of the network. Figures 1.3(c), 1.3(d) and 1.3(e) show the edge number, density and average path length of the transaction network on a monthly basis. These measurements are not cumulative. The network’s growth and sparsification are evident. We also observe some anomalies in the average path length during July and November 2010.
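A non-cumulative monthly measurement of this kind might be computed along the following lines. The sketch assumes that a month's network consists of the edges timestamped in that month together with their endpoints, and that density is the number of edges divided by n(n − 1) for n vertices. Both are assumptions, since the exact definitions are not spelled out above, and the function name monthly_stats is hypothetical.

```python
from collections import defaultdict


def monthly_stats(edges):
    """Map 'YYYY-MM' to (edge count, density) using only that month's edges.

    edges: iterable of (source, target, timestamp) with timestamps that
    start with 'YYYY-MM'.
    """
    by_month = defaultdict(list)
    for source, target, timestamp in edges:
        by_month[timestamp[:7]].append((source, target))
    stats = {}
    for month, month_edges in sorted(by_month.items()):
        vertices = {v for edge in month_edges for v in edge}
        n, m = len(vertices), len(month_edges)
        # Directed density: edges present over ordered vertex pairs.
        density = m / (n * (n - 1)) if n > 1 else 0.0
        stats[month] = (m, density)
    return stats


example = monthly_stats([
    ("a", "b", "2010-07-01 10:00:00"),
    ("b", "c", "2010-07-15 12:30:00"),
    ("c", "d", "2010-08-02 08:45:00"),
])
```

Under these definitions the number of possible vertex pairs grows quadratically with n, so a growing network becomes sparser unless its edge count keeps pace.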
1.4.2 The User Network
The user network U represents the flow of Bitcoins between users over time. Each vertex represents a user and each directed edge between a source and a target represents an input-output pair of a single transaction where the input’s public-key belongs to the user corresponding to the source and the output’s public-key belongs to the user corresponding to the target. Each directed edge also includes a value in Bitcoins and a timestamp.
We need to perform a preprocessing step before we can construct U from our dataset. Suppose U is, at first, incomplete in the sense that each vertex represents a single public-key rather than a user and that each directed edge between a source and a target represents an input-output pair of a single transaction, where the input’s public-key corresponds to the source and the output’s public-key corresponds to the target. In order to perfect this network, we need to contract each subset of vertices whose corresponding public-keys belong to a single user. The difficulty is that public-keys are Bitcoin’s mechanism for ensuring anonymity: ‘the public can see that someone [identified by a public-key] is sending an amount to someone else [identified by another public-key], but without information linking the transaction to anyone.’ [20]. In fact, it is considered good practice for a payee to generate a new public-private key-pair for every transaction to keep transactions from being linked to a common owner. Therefore, it is impossible to completely perfect the network using our dataset alone. However, as noted by Nakamoto [20],
“Some linking is still unavoidable with multi-input transactions, which necessarily reveal that their inputs were owned by the same owner. The risk is that if the owner of a key is revealed, linking could reveal other transactions that belonged to the same owner.”
We will use this property of transactions with multiple inputs to contract subsets of vertices in the incomplete network. We construct an ancillary network in which each vertex represents a public-key. We connect pairs of vertices with undirected edges, where each edge joins a pair of public-keys that are both inputs to the same transaction and are thus controlled by the same user. From our dataset, this ancillary network has 1 253 054 vertices (unique public-keys) and 4 929 950 edges. More importantly, it has 86 641 non-trivial maximal connected components. Each maximal connected component in this graph corresponds to a user, and each component’s constituent vertices correspond to that user’s public-keys.
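The grouping of public-keys into maximal connected components can be sketched with a disjoint-set (union-find) structure, which gives the same components as the ancillary network without storing every pairwise edge. This is one possible implementation and is not claimed to be the one used for the dataset. The function name and the toy keys are hypothetical.

```python
def cluster_public_keys(transactions):
    """Group public-keys that appear together as inputs of a transaction.

    transactions: iterable of (input_keys, output_keys) pairs of lists.
    Returns a list of sets; each set is one maximal connected component.
    """
    parent = {}

    def find(key):
        parent.setdefault(key, key)
        root = key
        while parent[root] != root:
            root = parent[root]
        while parent[key] != root:  # path compression
            parent[key], key = root, parent[key]
        return root

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for input_keys, output_keys in transactions:
        for key in list(input_keys) + list(output_keys):
            find(key)  # register every key, including isolated ones
        for key in input_keys[1:]:
            union(input_keys[0], key)  # co-inputs share an owner

    clusters = {}
    for key in list(parent):
        clusters.setdefault(find(key), set()).add(key)
    return list(clusters.values())


toy = [
    (["pk1", "pk2"], ["pk3"]),
    (["pk2", "pk4"], ["pk5"]),
    (["pk6"], ["pk1"]),
]
components = cluster_public_keys(toy)
```

In the toy data pk1 and pk4 never appear in the same transaction, yet they end up in one component through pk2, which shows why components need not be cliques.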
Fig. 1.4: For the user network: (a) A log-log plot of the cumulative degree distributions. (b) A log-log plot of the cumulative component size distribution. (c) A temporal histogram showing the number of edges per month. (d) A temporal histogram showing the density per month. (e) A temporal histogram showing the average path length per month. [Panels (c)–(e), titled ‘Edge Number of User Network’, ‘Density of User Network’ and ‘Average Path Length of User Network’, cover the months 2009-01 to 2011-06.]
Figure 1.5 shows an example sub-network of the incomplete network overlaid onto the example sub-network of T from Fig. 1.2. The outputs of t1 and t2 that were eventually redeemed by t3 were sent to a user whose public-key was pk1 and a user whose public-key was pk2 respectively. Figure 1.6 shows an example sub-network of the user network overlaid onto the example sub-network of the incomplete network from Fig. 1.5. pk1 and pk2 are contracted into a single vertex u1 since they correspond to a pair of inputs of a single transaction. In other words, they are in the same maximal connected component of the ancillary network (see the vertices representing pk1 and pk2 in the dashed grey box in Fig. 1.6). A single user owns both public-keys. We note that the maximal connected component in this case is not simply a clique; it has a diameter of four indicating that there are at least two public-keys belonging to that same user that are connected indirectly via three transactions. The sixteen inputs to transaction t4 result in the contraction of a further sixteen public-keys into a single vertex u2. The value and timestamp of the flow of Bitcoins from u1 to u2 are derived from the transaction network.
Fig. 1.5: An example sub-network from the incomplete network. Each diamond vertex represents a public-key and each directed edge between diamond vertices represents a flow of Bitcoins from one public-key to another.
Fig. 1.6: An example sub-network from the user network. Each circular vertex represents a user and each directed edge between circular vertices represents a flow of Bitcoins from one user to another. The maximal connected component from the ancillary network that corresponds to the vertex u1 is shown within the dashed grey box. The edge label shown in the figure reads 1.32 BTC, 14:10:54 05/05/2011.
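The contraction step described above can be illustrated with a disjoint-set (union-find) structure. The following is a minimal sketch under stated assumptions, not the tooling used for the analysis: the input format (a mapping from a transaction identifier to the list of public-keys referenced by its inputs) and the names `UnionFind` and `contract_public_keys` are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen elements as singleton sets.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def contract_public_keys(transactions):
    """Group public-keys that appear together as inputs of a transaction.

    `transactions` maps a transaction id to the public-keys of its inputs
    (an assumed, simplified input format). Each returned set corresponds to
    one connected component of the ancillary network, i.e. one user.
    """
    uf = UnionFind()
    for input_keys in transactions.values():
        keys = list(input_keys)
        for key in keys:
            uf.find(key)  # make sure isolated keys are registered too
        for other in keys[1:]:
            uf.union(keys[0], other)
    users = defaultdict(set)
    for key in list(uf.parent):
        users[uf.find(key)].add(key)
    return list(users.values())


# Hypothetical example: pk1 and pk2 are inputs of t3, pk2 and pk4 of t5.
example = {"t3": ["pk1", "pk2"], "t4": ["pk3"], "t5": ["pk2", "pk4"]}
components = contract_public_keys(example)
```

In this example pk1, pk2 and pk4 end up in one component even though pk1 and pk4 never appear in the same transaction, which is the indirect linking noted in the text.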
After the preprocessing step, U has 881 678 vertices (86 641 non-trivial maximal connected components and 795 037 isolated vertices in the ancillary network) and 1 961 636 directed edges. The network is still incomplete. We have not contracted all possible vertices but it will suffice for our present analysis. Unlike T, U has multi-edges, loops and directed cycles.
Figure 1.4(a) shows a log-log plot of the network’s cumulative degree distributions. We fitted power-law distributions to the three distributions and calculated their goodness-of-fit and statistical significance as in the previous section. Table 1.2 shows the results. We observe that a power-law is not a plausible hypothesis for any of the three distributions.
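To make the fitted exponent α concrete, the sketch below computes the standard continuous maximum-likelihood estimate, α = 1 + n / Σ ln(x_i / xmin), over the samples at or above a chosen xmin. This is an illustration only: the fitting procedure actually used is the one described in the previous section and is not reproduced here, degree data are discrete (for which a corrected estimator is normally preferred), and the goodness-of-fit and p-value computations are omitted. The function name `estimate_alpha` is hypothetical.

```python
import math


def estimate_alpha(samples, x_min):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Only samples with x >= x_min contribute to the estimate.
    """
    tail = [x for x in samples if x >= x_min]
    if not tail:
        raise ValueError("no samples at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail samples equal x_min; estimate undefined")
    return 1.0 + len(tail) / log_sum
```

A small estimate alone says nothing about plausibility; that judgement rests on the goodness-of-fit test, which is why Table 1.2 reports both.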
There are 604 (maximal) weakly connected components and 579 355 (maximal) strongly connected components in the network; Fig. 1.4(b) shows a log-log plot of the cumulative component size distribution for both variations. There are 879 859 vertices (99.79%) in the giant weakly connected component. This component also contains a giant weakly biconnected component with 652 892 vertices (74.20% of the vertices in the giant component).
Variable      x̃    x̄      s        α      xmin   GoF    p-val.
Degree        3    4.45   218.10   2.38   66     0.02   0.00
In-Degree     1    2.22   86.40    2.45   57     0.05   0.00
Out-Degree    2    2.22   183.91   2.03   10     0.22   0.00
Table 1.2: The degree, in-degree and out-degree distributions of U.
Our dynamic analysis of the user network mirrors that of the transaction network in the previous subsection. Figures 1.4(c), 1.4(d) and 1.4(e) show the edge number, density and average path length of the user network on a monthly basis. These measurements are not cumulative. The network’s growth and sparsification are evident. We note that even though our dynamic analysis of the user network is on a monthly basis, the preprocessing step is performed using the ancillary network of the entire incomplete network. This enables us to resolve public-keys to a single user irrespective of the month in which the linking transactions occur.
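The monthly, non-cumulative measurements can be sketched as follows. This is an illustration under assumptions: edges are given as hypothetical (source, target, timestamp) triples between already-contracted users, and density is taken to be m / (n(n − 1)) for the n vertices and m directed edges active in a month, which is one common definition and may differ from the one used for Fig. 1.4(d). Average path length is omitted.

```python
from collections import defaultdict
from datetime import datetime


def monthly_statistics(edges):
    """Per-month vertex count, edge count and density (not cumulative).

    `edges` is an iterable of (source_user, target_user, timestamp) triples,
    where timestamp is a datetime. Multi-edges are counted individually.
    """
    by_month = defaultdict(list)
    for source, target, timestamp in edges:
        by_month[timestamp.strftime("%Y-%m")].append((source, target))

    stats = {}
    for month, month_edges in sorted(by_month.items()):
        vertices = {vertex for edge in month_edges for vertex in edge}
        n, m = len(vertices), len(month_edges)
        # Assumed definition: directed density without self-loops.
        density = m / (n * (n - 1)) if n > 1 else 0.0
        stats[month] = {"vertices": n, "edges": m, "density": density}
    return stats
```

Because users are resolved once over the entire history, the same user identifier can be used in every month, as the text notes.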
The contraction of public-keys into users, while incomplete, generates a network that is in many ways a proxy for the social network of Bitcoin users. The edges represent financial transactions between pairs of users. It may be possible to identify, for example, communities, central users and hoarders within this social network.
1.5 Anonymity Analysis
Prior to performing the analyses above, we expected the user network to be largely composed of trees representing Bitcoin flows between one-time public-keys that were not linked with other public-keys. However, our analyses reveal that the user network has considerable cyclic structure. We now consider the implications of this structure, coupled with other aspects of the Bitcoin system, for anonymity.
There are several ways in which the user network can be used to deduce information about Bitcoin users. We can use global network properties, such as degree distribution, to identify outliers. We can use local network properties to examine the context in which a user operates by observing the users with whom he or she interacts, either directly or indirectly. The dynamic nature of the user network also enables us to perform flow and temporal analyses. We can examine the significant Bitcoin flows between groups of users over time. We will now discuss each of these possibilities in more detail and provide a case study to demonstrate their use in practice.
1.5.1 Integrating Off-Network Information
There is no user directory for the Bitcoin system. However, we can attempt to build a partial user directory associating Bitcoin users (and their known public-keys) with off-network information. If we can make sufficient associations and combine them with the network structures above, a potentially serious threat to anonymity emerges.
Many organizations and services such as on-line stores that accept Bitcoins, exchanges, laundry services and mixers have access to identifying information regarding their users, e.g. e-mail addresses, shipping addresses, credit card and bank account details, IP addresses, etc. If any of this information is publicly available, or accessible by, say, law enforcement agencies, then the identities of users involved in related transactions may also be at risk. To illustrate this point, we consider a number of publicly available data sources and integrate their information with the user network.
1.5.1.1 The Bitcoin Faucet
The Bitcoin Faucet¹² is a website where users can donate Bitcoins to be redistributed in small amounts to other users. In order to prevent abuse of this service, a history of recent give-aways is published along with the IP addresses of the recipients. When the Bitcoin Faucet does not batch the redistribution, it is possible to associate the IP addresses with the recipients’ public-keys. This page can be scraped over time to produce a time-stamped mapping of IP addresses to users.
12 http://freebitcoins.appspot.com
We found that the public-keys associated with many of the IP addresses that received Bitcoins were contracted with other public-keys in the ancillary network, thus revealing IP addresses that are somehow related to previous transactions. Fig. 1.7(a) shows a map of geolocated IP addresses belonging to users who received Bitcoins over a period of one week. Fig. 1.7(b) overlays the user network onto a sample of those users. An edge between two geolocated IP addresses indicates that the corresponding users are linked by an undirected path of length at most three in the user network; the path must not contain the vertex representing the Bitcoin Faucet itself.
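The linking criterion used for Fig. 1.7(b) amounts to a depth-limited breadth-first search that ignores edge direction and skips one excluded vertex. The sketch below illustrates this under assumptions: the user network is given as a hypothetical list of (user, user) pairs, and the function name `linked_within` and its parameters are illustrative rather than part of any existing tool.

```python
from collections import defaultdict, deque


def linked_within(edges, source, target, max_length=3, excluded=frozenset()):
    """Return True if source and target are joined by an undirected path of
    at most `max_length` edges that avoids every vertex in `excluded`."""
    if source in excluded or target in excluded:
        return False

    # Build an undirected adjacency structure without the excluded vertices.
    adjacency = defaultdict(set)
    for a, b in edges:
        if a in excluded or b in excluded:
            continue
        adjacency[a].add(b)
        adjacency[b].add(a)

    distances = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if vertex == target:
            return True
        if distances[vertex] == max_length:
            continue  # do not expand beyond the length limit
        for neighbour in adjacency[vertex]:
            if neighbour not in distances:
                distances[neighbour] = distances[vertex] + 1
                queue.append(neighbour)
    return False
```

Excluding the Faucet vertex matters because every recipient is trivially within two hops of every other recipient through it.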
These figures serve as a proof-of-concept from a small publicly available data source. We note that large centralized Bitcoin service providers are capable of producing much more detailed maps.
Fig. 1.7: We can use the Bitcoin Faucet to map users to geolocated IP addresses. (a) A map of geolocated IP addresses associated with users receiving Bitcoins from the Bitcoin Faucet during a one week period. (b) A map of a sample of the geolocated IP addresses in (a) connected by edges where the corresponding users are connected by a path of length at most three in the user network that does not include the vertex representing the Bitcoin Faucet.
1.5.1.2 Voluntary Disclosures
Another source of identifying information is the voluntary disclosure of public-keys by users, for example, when posting to the Bitcoin forums¹³. Bitcoin public-keys are typically represented as strings approximately thirty-three characters in length and starting with the digit one. They are indexed very well by popular search engines. We identified many high-degree vertices with external information using a search engine alone. We scraped the Bitcoin Forums where users frequently attach a public-key to their signatures.
13 http://forum.bitcoin.org
We also gathered public-keys from Twitter streams and user-generated public directories. It is important to note that in many cases we are able to resolve the ‘public’ public-keys with other public-keys belonging to the same user using the ancillary network. We also note that large centralized Bitcoin service providers can do the same with their user information.
1.5.2 TCP/IP Layer Information
Security researcher Dan Kaminsky has performed an analysis of the Bitcoin system, investigating identity leakage at the TCP/IP layer. He found that by opening a connection to all public peers in the network at once, he could map IP addresses to Bitcoin public-keys, working from the assumption that “the first node to inform you of a transaction is the source of it. . . [this is] more or less true, and absolutely over time” [16]. Using this approach it is possible to map public-keys to IP addresses unless users are using an anonymising proxy technology such as Tor.
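The first-announcer heuristic quoted above can be sketched as a two-step aggregation: record which peer announced each transaction first, then, for each sending public-key, take the peer that was first most often. This is a hypothetical illustration of the idea only, not Kaminsky's implementation; the input formats (announcement triples and a transaction-to-sender mapping) and the function name are assumptions.

```python
from collections import Counter, defaultdict


def likely_origin_ips(announcements, tx_sender_key):
    """Guess an origin IP address for each sending public-key.

    `announcements` is an iterable of (tx_id, peer_ip, seen_at) triples;
    `tx_sender_key` maps a tx_id to the public-key that funded it.
    Both formats are assumed for the purpose of this sketch.
    """
    # Step 1: the earliest announcer of each transaction.
    first_seen = {}
    for tx_id, peer_ip, seen_at in announcements:
        if tx_id not in first_seen or seen_at < first_seen[tx_id][0]:
            first_seen[tx_id] = (seen_at, peer_ip)

    # Step 2: aggregate over time, one vote per transaction.
    votes = defaultdict(Counter)
    for tx_id, (_, peer_ip) in first_seen.items():
        key = tx_sender_key.get(tx_id)
        if key is not None:
            votes[key][peer_ip] += 1

    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}
```

The voting step reflects the “absolutely over time” part of the assumption: a single observation may be misleading, while repeated observations of the same first announcer are harder to explain away.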
1.5.3 Egocentric Analysis and Visualization
There are several pieces of information we can directly derive from the user network regarding a particular user. We can compute the balance held by a single public-key. We can also aggregate the balances belonging to public-keys that are controlled by a particular user. For example, Fig. 1.8(a) and Fig. 1.8(b) show the receipts and payments to and from WikiLeaks’ public-key in terms of Bitcoins and the number of transactions respectively. The donations are relatively small and are forwarded to other public-keys periodically. There was also a noticeable spike in donations when the facility was first announced. Figure 1.8(c) shows the receipts and payments to and from the creator of a popular Bitcoin trading website aggregated over a number of public-keys that are linked through the ancillary network.
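The cumulative receipts and payments plotted in Fig. 1.8 can be derived from a list of transfers once a user's public-keys are known. The sketch below is illustrative and makes several assumptions: transfers are hypothetical (timestamp, sender_key, receiver_key, amount) tuples, amounts are integers in the smallest currency unit to avoid rounding, and movements between a user's own public-keys are ignored.

```python
def cumulative_flows(transfers, owned_keys):
    """Return a time-ordered list of (timestamp, total_received, total_sent)
    for the user who controls `owned_keys`."""
    owned = set(owned_keys)
    received = sent = 0
    series = []
    for timestamp, sender, receiver, amount in sorted(transfers, key=lambda t: t[0]):
        if sender in owned and receiver in owned:
            continue  # internal movement between the user's own public-keys
        if receiver in owned:
            received += amount
        elif sender in owned:
            sent += amount
        else:
            continue  # transfer does not involve this user
        series.append((timestamp, received, sent))
    return series


def balance(series):
    """Balance implied by the last point of a cumulative series."""
    if not series:
        return 0
    _, received, sent = series[-1]
    return received - sent
```

Passing a single public-key as `owned_keys` gives the per-key view of Fig. 1.8(a); passing all public-keys in one component of the ancillary network gives the aggregated view of Fig. 1.8(c).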
An important advantage of deriving network structures from the Bitcoin transaction history is our ability to use network visualization and analysis tools to investigate the flow of Bitcoins. For example, Fig. 1.9 shows the network structure surrounding WikiLeaks’ public-key in the incomplete user network. Our tools resolve several of the vertices with identifying information gathered in Sect. 1.5.1. These users can be linked either directly or indirectly to their donations.
Fig. 1.8: We can plot cumulative receipts and payments to and from Bitcoin public-keys and users. (a) The receipts and payments to and from WikiLeaks’ public-key over time (cumulative incoming and outgoing Bitcoins by day). (b) The number of transactions involving WikiLeaks’ public-key over time (incoming and outgoing transactions by day). (c) The receipts and payments to and from the creator of a popular Bitcoin trading website aggregated over a number of public-keys (cumulative incoming and outgoing Bitcoins by day).
1.5.4 Context Discovery
Given a number of public-keys or users of interest, we can use network structure and context to better understand the flow of Bitcoins between them. For example, we can examine all shortest paths between a set of vertices or consider the maximum number of Bitcoins that can flow from a source to a destination given the transactions and their ‘capacities’ in an interesting time-window. As an illustration, Fig. 1.10 shows all shortest paths between the vertices representing the users we identified using off-network information in Sect. 1.5.1 and the vertex that represents the MyBitcoin service¹⁴ in the user network. We can identify more than 60% of the users in this visualization and deduce many direct and indirect relationships between them.
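Enumerating all shortest paths between two vertices, as used for Fig. 1.10, can be sketched with a breadth-first search that records every predecessor lying on some shortest path. This sketch assumes that edge direction is ignored and that the network fits in memory as a hypothetical list of (user, user) pairs; neither assumption is stated in the text, and the maximum-flow variant is not covered.

```python
from collections import defaultdict, deque


def all_shortest_paths(edges, source, target):
    """Return every shortest path from source to target as a list of vertices,
    treating edges as undirected. Returns [] if target is unreachable."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    distance = {source: 0}
    predecessors = defaultdict(list)
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbour in adjacency[vertex]:
            if neighbour not in distance:
                distance[neighbour] = distance[vertex] + 1
                queue.append(neighbour)
            # Record every predecessor that lies on a shortest path.
            if distance[neighbour] == distance[vertex] + 1:
                predecessors[neighbour].append(vertex)

    if target not in distance:
        return []

    def unwind(vertex):
        # Rebuild paths backwards from the target to the source.
        if vertex == source:
            return [[source]]
        return [path + [vertex] for p in predecessors[vertex] for path in unwind(p)]

    return unwind(target)
```

Running this between each identified user and a service vertex, and taking the union of the returned paths, yields a sub-network of the kind visualized in Fig. 1.10.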
14 http://www.mybitcoin.com
Fig. 1.9: An egocentric visualization of the vertex representing WikiLeaks’ public-key in the incomplete user network. The size of a vertex corresponds to its degree in the entire incomplete user network. The color denotes the volume of Bitcoins: warmer colors have larger volumes flowing through them. The three largest red vertices represent a Bitcoin mining pool, a centralized Bitcoin wallet service, and an unknown entity.
Case Study, Part I: We analyse an alleged theft of 25 000 BTC reported in the Bitcoin Forums¹⁵ by a user known as allinvain. The victim reported that a large portion of his Bitcoins was sent to pk_red¹⁶ on 13/06/2011 at 16:52:23 UTC. The theft occurred shortly after somebody broke into the victim’s Slush pool account¹⁷ and changed the payout address to pk_blue.¹⁸ The Bitcoins rightfully belonged to pk_green.¹⁹ At the time of theft, the stolen Bitcoins had a market value of approximately half a million U.S. dollars. We chose this case study to illustrate the potential risks to the anonymity of a user (the thief) who has good reason to remain anonymous.
We consider the incomplete user network before any contractions. We restrict ourselves to the egocentric network surrounding the thief: we include every vertex that is reachable by a path of length at most two ignoring directionality and all edges induced by these vertices. We also remove all loops, multiple edges and edges that are not contained in some biconnected component to avoid clutter. In Fig. 1.11, the red vertex represents the thief who owns the public-key pk_red and the green vertex represents the victim who owns the public-key pk_green. The theft is represented by a green edge joining the victim to the thief.
15 http://forum.bitcoin.org/index.php?topic=16457.0
16 1KPTdMb6p7H3YCwsyFqrEmKGmsHqe1Q3jg
17 http://mining.bitcoin.cz
18 15iUDqk6nLmav3B1xUHPQivDpfMruVsu9f
19 1J18yk7D353z3gRVcdbS7PV5Q8h5w6oWWG
Fig. 1.10: A visualisation of all users identified in Sect. 1.5.1 and all shortest paths between the vertices representing those users and the vertex representing the MyBitcoin service in the user network.
Fig. 1.11: An egocentric visualization of the thief in the incomplete user network. For this visualization, vertices are colored according to the text, edges are colored according to the color of their sources and the size of each vertex is proportional to its edge-betweenness within the egocentric network.
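The egocentric network used for Fig. 1.11 can be sketched as follows: collect every vertex within a fixed number of hops of the ego while ignoring direction, then keep the induced edges with loops and multiple edges removed. This is a simplified illustration under assumptions; the input is a hypothetical list of (vertex, vertex) pairs, and the additional filtering of edges that lie in no biconnected component is deliberately omitted.

```python
from collections import defaultdict


def egocentric_network(edges, ego, radius=2):
    """Return (vertices, induced_edges) for the network around `ego`.

    Direction is ignored; loops are dropped and parallel edges collapse
    because each edge is stored as a frozenset of its two endpoints.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)

    vertices = {ego}
    frontier = {ego}
    for _ in range(radius):
        # Expand one hop and keep only vertices not seen before.
        frontier = {n for v in frontier for n in adjacency[v]} - vertices
        vertices |= frontier

    induced = {
        frozenset((a, b))
        for a, b in edges
        if a != b and a in vertices and b in vertices
    }
    return vertices, induced
```

Induced edges include those between two vertices at distance two from the ego, which is what allows cycles such as the one in Fig. 1.12 to appear.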
Fig. 1.12: An interesting sub-network induced by the thief, the victim and three other vertices. The notation is the same as in Fig. 1.11. Edge labels shown in the figure: 1 BTC (17:34:04 13/06/2011); 25 000 BTC (17:52:23 13/06/2011); 0.31337 BTC (17:45:31 13/06/2011); 0.120607 BTC (16:55:19 12/06/2011); 0.11 BTC (04:04:14 22/05/2011); 0.09 BTC (09:07:59 23/05/2011); 60 transactions involving 441.83 BTC over a 70-day period.
Interestingly, the victim and the thief are joined by paths (ignoring directionality) other than the green edge representing the theft. For example, consider the sub-network shown in Fig. 1.12 induced by the red, green, purple, yellow and orange vertices. This sub-network is a cycle. We contract all vertices whose corresponding public-keys belong to the same user. This allows us to attach values in Bitcoins and timestamps to the directed edges. We can make a number of observations. Firstly, we note that the theft of 25 000 BTC was preceded by a smaller theft of 1 BTC. This was later reported by the victim in the Bitcoin forums. Secondly, using off-network data, we have identified some of the other colored vertices: the purple vertex represents the main Slush pool account and the orange vertex represents the computer hacker group known as LulzSec.²⁰ We note that there has been at least one attempt to associate the thief with LulzSec²¹. This was a fake; it was created after the theft. However, the identification of the orange vertex with LulzSec is genuine and was established before the theft. We observe that the thief sent 0.31337 BTC to LulzSec shortly after the theft but we cannot otherwise associate him with the group. The main Slush pool account sent a total of 441.83 BTC to the victim over a 70-day period. It also sent a total of 0.2 BTC to the yellow vertex over a two-day period. One day before the theft, the yellow vertex also sent 0.120607 BTC to LulzSec.
The yellow vertex represents a user who is the owner of at least five public-keys.²² Like the victim, he is a member of the Slush pool, and like the thief, he is a one-time donor to LulzSec. This donation, the day before the theft, is his last known activity using these public-keys.
1.5.5 Flow and Temporal Analyses
In addition to visualizing egocentric networks with a fixed radius, we can follow significant flows of value through the network over time. If a vertex representing a user receives a large volume of Bitcoins relative to their estimated balance, and, shortly after, transfers a significant proportion of those Bitcoins to another user, we deem this interesting. We built a special purpose tool that, starting with a chosen vertex or set of vertices, traces significant flows of Bitcoins over time. In practice we have found this tool to be quite revealing when analyzing the user network.
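The idea of following significant flows forward in time can be illustrated with a short sketch. It is not the special purpose tool described above, and it simplifies the criterion: instead of estimating balances, it flags an outgoing transfer when it occurs within a time window after a tracked receipt and carries at least a given fraction of the amount received. The thresholds `min_fraction` and `max_delay`, the tuple format and the function name are all hypothetical.

```python
def trace_significant_flows(transfers, seed_vertex, seed_time, seed_amount,
                            min_fraction=0.25, max_delay=24 * 3600):
    """Follow large, prompt onward transfers starting from a seed receipt.

    `transfers` is an iterable of (timestamp, sender, receiver, amount)
    tuples with numeric timestamps in seconds. Returns the flagged
    transfers in time order.
    """
    # For each tracked vertex: when it last received a tracked flow, and how much.
    tracked = {seed_vertex: (seed_time, seed_amount)}
    flagged = []
    for timestamp, sender, receiver, amount in sorted(transfers, key=lambda t: t[0]):
        if sender not in tracked:
            continue
        received_at, received_amount = tracked[sender]
        if timestamp < received_at or timestamp - received_at > max_delay:
            continue  # not "shortly after" the tracked receipt
        if amount < min_fraction * received_amount:
            continue  # not a significant proportion of what was received
        flagged.append((timestamp, sender, receiver, amount))
        tracked[receiver] = (timestamp, amount)
    return flagged
```

A trace of this kind stays far smaller than an exhaustive list of every public-key that ever received some portion of the tracked Bitcoins, at the cost of missing flows that are split into many small transfers.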
Case Study, Part II: To demonstrate this tool we reconsider the Bitcoin theft described earlier. We note that the victim has developed their own tool to generate an exhaustive list of public-keys that have received some portion of the stolen Bitcoins since the theft.²³ However, this list grows very quickly and, at the time of writing, contained more than 34 100 public-keys. Figure 1.13 shows an annotated visualization produced using our tool. We observe several interesting flows in the aftermath of the theft. The initial theft of a small volume of 1 BTC is immediately followed by the theft of 25 000 BTC. This is represented as a dotted black line between the relevant vertices, magnified in the left inset.
20 http://twitter.com/LulzSec/status/76388576832651265
21 http://pastebin.com/88nGp508
22 1MUpbAY7rjWxvLtUwLkARViqSdzypMgVW4
13tst9ukW294Q7f6zRJr3VmLq6zp1C68EK
1DcQvXMD87MaYcFZqHzDZyH3sAv8R5hMZe
1AEW9ToWWwKoLFYSsLkPqDyHeS2feDVsVZ
1EWASKF9DLUCgEFqfgrNaHzp3q4oEgjTsF
23 http://folk.uio.no/vegardno/allinvain-addresses.txt
Fig. 1.13: Visualisation of Bitcoin flow from the alleged theft. The left inset shows the initial shuffling of Bitcoins among accounts close to that of the alleged thief. The right inset shows the flow of Bitcoins during several subsequent days. The flows split and later merge, validating that the flows found by the tool are probably still controlled by a single user.
In the left inset, we can see that the Bitcoins are shuffled between a small number of accounts and then transferred back to the initial account. After this shuffling step, we have identified four significant outflows of Bitcoins that began at 19:49, 20:01, 20:13 and 20:55. Of particular interest are the outflows that began at 20:55 (labeled as ‘1’ in both insets) and 20:13 (labeled as ‘2’ in both insets). These outflows pass through several subsequent accounts over a period of several hours. Flow 1 splits at the vertex labeled A in the right inset at 04:05 on the day after the theft. Some of its Bitcoins rejoin Flow 2 at the vertex labeled B. This new combined flow is labeled as ‘3’ in the right inset. The remaining Bitcoins from Flow 1 pass through several additional vertices in the next two days. This flow is labeled as ‘4’ in the right inset.
A surprising event occurs on 16/06/2011 at approximately 13:37. A small number of Bitcoins are transferred from Flow 3 to a heretofore unseen public-key pk1.²⁴ Approximately seven minutes later, a small number of Bitcoins are transferred from Flow 3 to another heretofore unseen public-key pk2.²⁵ Finally, there are two simultaneous transfers from Flow 4 to two more heretofore unseen public-keys: pk3²⁶ and pk4.²⁷ We have determined that these four public-keys, pk1, pk2, pk3 and pk4, which receive Bitcoins from two separate flows that split from each other two days previously, are all contracted to the same user in our ancillary network. This user is represented as C in Fig. 1.13.
24 1FKFiCYJSFqxT3zkZntHjfU47SvAzauZXN
25 1FhYawPhWDvkZCJVBrDfQoo2qC3EuKtb94
26 1MJZZmmSrQZ9NzeQt3hYP76oFC5dWAf2nD
27 12dJo17jcR78Uk1Ak5wfgyXtciU62MzcEc
There are several other examples of interesting flow. The flow labeled as Y involves the movement of Bitcoins through thirty unique public-keys in a very short period of time. At each step, a small number of Bitcoins (typically 30 BTC, which had a market value of approximately US$500 at the time of the transactions) are siphoned off. The public-keys that receive the small number of Bitcoins are typically represented by small blue vertices due to their low volume and degree. On 20/06/2011 at 12:35, each of these public-keys makes a transfer to a public-key operated by the MyBitcoin service.²⁸ Curiously, this public-key was previously involved in another separate Bitcoin theft²⁹.
Fig. 1.14: The Bitcoins are transferred between public-keys along the highlighted paths very quickly.
We also observe that the Bitcoins in many of the above flows are transferred between public-keys very quickly. Fig. 1.14 shows two flows in particular where the intermediate parties waited for very few confirmations before re-sending the Bitcoins to other public-keys.
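One way to make ‘very quickly’ concrete is to compare, for each intermediate public-key on a path, the delay between receiving and re-sending the Bitcoins with Bitcoin's target block spacing of ten minutes. The sketch below assumes a hypothetical list of (public-key, received, sent) records; the key names and times are illustrative and do not come from our dataset. Since block timestamps are coarse, the result is only a rough proxy for the number of confirmations a party waited for.

```python
from datetime import datetime, timedelta

BLOCK_INTERVAL = timedelta(minutes=10)  # Bitcoin's target block spacing

def fast_hops(path, max_confirmations=1):
    """Return the public-keys on `path` that re-sent their Bitcoins within
    roughly `max_confirmations` block intervals of receiving them."""
    limit = BLOCK_INTERVAL * max_confirmations
    return [key for key, received, sent in path if sent - received <= limit]

# Illustrative path: (public-key, time received, time re-sent).
toy_path = [
    ("pk_a", datetime(2011, 6, 14, 12, 0), datetime(2011, 6, 14, 12, 5)),
    ("pk_b", datetime(2011, 6, 14, 12, 5), datetime(2011, 6, 14, 14, 0)),
    ("pk_c", datetime(2011, 6, 14, 14, 0), datetime(2011, 6, 14, 14, 10)),
]
quick = fast_hops(toy_path)
```

Here the first and third hops are flagged because they forward the Bitcoins within one block interval, whereas the second hop holds them for almost two hours.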
Much of this analysis is circumstantial. We cannot say for certain whether or not these flows imply a shared agency in both incidents. However, it does illustrate the power of our tool when tracing the flow of Bitcoins and generating hypotheses. It also suggests that a centralized service may have further details on the user(s) in control of the implicated public-keys.
28 1MAazCWMydsQB5ynYXqSGQDjNQMN3HFmEu
29 http://forum.bitcoin.org/index.php?topic=20427.0
1.5.6 Other Forms of Analysis
There are many other forms of analysis that can be applied in order to de-anonymize the workings of the Bitcoin system:
• Many transactions have two outputs: one is the payment from a payer to a payee and the other is the return of change to the payer. If we assume that a transaction was created using a particular client implementation and we have access to the client’s source code, then we may be able to deduce, in some cases, which output was the payment and which was the change. We can then map the public-key that the change was assigned to back to the user who created the transaction.
• Order books for Bitcoin exchanges are typically available to support trading tools. As orders are often placed in Bitcoin values converted from other currencies, they have a precise decimal value with eight significant digits. It may be possible to find transactions with corresponding amounts and thus map public-keys and transactions to the exchanges.
• Over an extended time period, several public-keys, if used at similar times, may belong to the same user. It may be possible to construct and cluster a co-occurrence network to help deduce mappings between public-keys and users.
• Finally, there are far more sophisticated forms of attack where the attacker actively participates in the network, for example, using marked Bitcoins or by operating a laundry service.
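As an illustration of the co-occurrence idea in the third item above, the following sketch builds a weighted co-occurrence network from usage events and clusters it. This is one possible construction under stated assumptions rather than a method we have evaluated: the input is a hypothetical list of (public-key, timestamp in seconds) events, time is bucketed into fixed windows, and clusters are simply the connected components of the edges whose weight reaches a threshold. The function names, window size and threshold are all illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence_edges(events, window_seconds=3600):
    """Count, for each pair of public-keys, the number of time windows
    in which both keys were used."""
    keys_by_window = defaultdict(set)
    for key, timestamp in events:
        keys_by_window[timestamp // window_seconds].add(key)
    weights = Counter()
    for keys in keys_by_window.values():
        for pair in combinations(sorted(keys), 2):
            weights[pair] += 1
    return weights

def cluster(weights, min_weight=2):
    """Group keys joined by edges of weight >= `min_weight` (connected
    components, computed with a small union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), weight in weights.items():
        if weight >= min_weight:
            parent[find(a)] = find(b)
    groups = defaultdict(set)
    for key in parent:
        groups[find(key)].add(key)
    return sorted(sorted(group) for group in groups.values())

# Illustrative events: k1 and k2 are used together in two separate hours,
# k3 and k4 coincide only once.
toy_events = [("k1", 0), ("k2", 100), ("k1", 7200), ("k2", 7300),
              ("k3", 90000), ("k4", 90010), ("k3", 200000)]
clusters = cluster(cooccurrence_edges(toy_events))
```

With the default threshold of two shared windows, only k1 and k2 are grouped. On a busy network many unrelated public-keys share every window, so the pairwise counting is quadratic per window and a practical analysis would need a stricter notion of co-occurrence.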
1.5.7 Mitigation Strategies
In addition to educating users about the limits of anonymity in the Bitcoin system, some risks to privacy could potentially be mitigated by making changes to the system. A patch to the official Bitcoin client has been developed³⁰ that helps users avoid linking public-keys by making them aware of potential links within the Bitcoin client user-interface. It is also possible for the client to automatically proxy Bitcoins through dummy public-keys. This would come at the cost of increased transaction fees but would increase deniability and obfuscate the chain of transaction histories. Finally, if a future version of the protocol supported protocol-level mixing of Bitcoins, this would increase the difficulty for a passive third-party to track individual user histories.
30 http://coderrr.wordpress.com/2011/06/30/patching-the-bitcoin-client-to-make-it-more-anonymous – Retrieved 2011-11-04
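The kind of warning such a client could show can be sketched as follows. This is not the code of the patch referenced above. It is a hypothetical illustration which assumes, as in our construction of the user network, that public-keys used as inputs to the same transaction become publicly linkable; the class and method names are our own.

```python
class KeyLinks:
    """Union-find over public-keys; keys in the same set are assumed to be
    publicly linkable to one another."""

    def __init__(self):
        self._parent = {}

    def find(self, key):
        self._parent.setdefault(key, key)
        while self._parent[key] != key:
            self._parent[key] = self._parent[self._parent[key]]  # path halving
            key = self._parent[key]
        return key

    def record_transaction(self, input_keys):
        """Mark all keys used as inputs to one transaction as linked."""
        keys = list(input_keys)
        for other in keys[1:]:
            self._parent[self.find(other)] = self.find(keys[0])

    def would_link(self, input_keys):
        """Return True if spending from `input_keys` together would link
        keys that are not already linked."""
        return len({self.find(key) for key in input_keys}) > 1

# Illustrative wallet history: keys "a" and "b" were already spent together.
links = KeyLinks()
links.record_transaction(["a", "b"])
warn_ab = links.would_link(["a", "b"])  # already linked, nothing new revealed
warn_ac = links.would_link(["a", "c"])  # would newly link "c" to "a" and "b"
```

A client could call `would_link` before constructing a transaction and ask the user to confirm, or to choose different inputs, whenever it returns True.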
1.6 Conclusions
For the past half-century futurists have heralded the advent of a cash-less society [4]. Many of their predictions have been realized, e.g. Anderson et al.’s [4] ‘on-line real-time’ payment system and bank-maintained ‘central information files’. However, cash is still a competitive and relatively anonymous means of payment. Bitcoin is an electronic analog of cash in the online world. It is decentralized: there is no central authority responsible for the issuance of Bitcoins and there is no need to involve a trusted third-party when making online transfers. However, this flexibility comes at a price: the entire history of Bitcoin transactions is publicly available. In this chapter we investigated the structure of two networks derived from this dataset and their implications for user anonymity.
Using an appropriate network representation, it is possible to associate many public-keys with each other, and with external identifying information. With appropriate tools, the activity of known users can be observed in detail. This can be performed using a passive analysis only. Active analyses, where an interested party can potentially deploy ‘marked’ Bitcoins and collaborate with other users, can discover even more information. We also believe that large centralized services such as the exchanges and wallet services are capable of identifying and tracking considerable portions of user activity.
Technical members of the Bitcoin community have cautioned that strong anonymity is not a prominent design goal of the Bitcoin system. However, casual users need to be aware of this, especially when sending Bitcoins to users and organizations they would prefer not to be publicly associated with.
1.7 Acknowledgements
This research was supported by Science Foundation Ireland (SFI) Grant No. 08/SRC/I1407: Clique: Graph and Network Analysis Cluster. Both authors contributed equally to this work. It was performed independently of any industrial partnership or collaboration of the Clique Cluster.
References
1. Where’s George? http://www.wheresgeorge.com.
2. Y. Altshuler, N. Aharony, Y. Elovici, A. Pentland, and M. Cebrian. Bitcoin: An Innovative Alternative Digital Currency. Hastings Science & Technology Law Journal, 4:159–208, 2011.
3. Y. Altshuler, N. Aharony, Y. Elovici, A. Pentland, and M. Cebrian. Stealing Reality: When Criminals Become Data Scientists (or Vice Versa). Intelligent Systems, 26(6):22–30, 2011.
4. A. Anderson, D. Cannell, T. Gibbons, G. Grote, J. Henn, J. Kennedy, M. Muir, N. Potter, and R. Whitby. An Electronic Cash and Credit System. American Management Association, 1966.
5. A. Back. Hashcash – A Denial of Service Counter-Measure, 2002.
6. L. Backstrom, C. Dwork, and J. Kleinberg. Wherefore Art Thou r3579x?: Anonymized Social Networks, Hidden Patterns, and Structural Steganography. In Proceedings of the 16th International Conference on World Wide Web, pages 181–190. ACM, 2007.
7. A. Barabasi and R. Albert. Emergence of Scaling in Random Networks. Science, 286(5439):509–512, 1999.
8. D. Brockmann, L. Hufnagel, and T. Geisel. The Scaling Laws of Human Travel. Nature, 439(26):462–465, 2006.
9. H. Cavusoglu, H. Cavusoglu, and S. Raghunathan. Emerging Issues in Responsible Vulnerability Disclosure. In Proceedings of the 4th Annual Workshop on Economics of Information Security (WEIS’05), 2005.
10. A. Clauset, C. Shalizi, and M. Newman. Power-Law Distributions in Empirical Data. SIAM Review, 51(4):661–703, 2009.
11. D. Crandall, L. Backstrom, D. Cosley, S. Suri, D. Huttenlocher, and J. Kleinberg. Inferring Social Ties from Geographic Coincidences. Proceedings of the National Academy of Sciences, 107(52):22436, 2010.
12. W. Dai. B-Money Proposal, 1998.
13. C. Dwork and M. Naor. Pricing via Processing or Combatting Junk Mail. In Proceedings of the 12th Annual International Cryptology Conference on Advances in Cryptology (CRYPTO’92), pages 139–147. Springer, 1992.
14. R. Fugger. Money as IOUs in Social Trust Networks & A Proposal for a Decentralized Currency Network Protocol, 2004.
15. R. Gross and A. Acquisti. Information Revelation and Privacy in Online Social Networks. In Proceedings of the 2005 ACM Workshop on Privacy in the Electronic Society, pages 71–80. ACM, 2005.
16. D. Kaminsky. Black Ops of TCP/IP Presentation. Black Hat, Chaos Communication Camp, 2011.
17. N. Kichiji and M. Nishibe. Network Analyses of the Circulation Flow of Community Currency. Evolutionary and Institutional Economics Review, 4(2):267–300, 2008.
18. A. Korolova, R. Motwani, S. Nabar, and Y. Xu. Link Privacy in Social Networks. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, pages 289–298. ACM, 2008.
19. K. Lewis, J. Kaufman, M. Gonzalez, A. Wimmer, and N. Christakis. Tastes, Ties, and Time: A New Social Network Dataset using Facebook.com. Social Networks, 30:330–342, 2008.
20. S. Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System, 2008.
21. A. Narayanan and V. Shmatikov. Robust De-anonymization of Large Sparse Datasets. In Proceedings of the 29th Symposium on Security and Privacy, pages 111–125. IEEE, 2008.
22. A. Narayanan and V. Shmatikov. De-anonymizing Social Networks. In Proceedings of the 30th Symposium on Security and Privacy, pages 173–187. IEEE, 2009.
23. M. Nishibe. Chiiki Tuka No Susume (in Japanese). Hokkaido Shokoukai Rengou, 2004.
24. R. Puzis, D. Yagil, Y. Elovici, and D. Braha. Collaborative Attack on Internet Users’ Anonymity. Internet Research, 19(1):60–77, 2009.
25. K. Saito. i-WAT: The Internet WAT System – An Architecture for Maintaining Trust and Facilitating Peer-to-Peer Barter Relationships. PhD thesis, Keio University, 2006.
26. H. Simon. On a Class of Skew Distribution Functions. Biometrika, 42:425–440, 1955.
27. F. Stalder. Failures and Successes: Notes on the Development of Electronic Cash. The Information Society (TIS), 18(3):209–219, 2002.
28. The Economist. Digital Currencies – Bits and Bob. June 2011.
29. V. Vishnumurthy, S. Chandrakumar, and E. Sirer. KARMA: A Secure Economic Framework for Peer-to-Peer Resource Sharing. In Proceedings of the 1st Workshop on Economics of Peer-to-Peer Systems.
30. B. Yang and H. Garcia-Molina. PPay: Micropayments for Peer-to-Peer Systems. In V. Atluri and P. Liu, editors, Proceedings of the 10th ACM Conference on Computer and Communication Security (CCS’03), pages 300–310. ACM Press, 2003.