
Internet Background Radiation Revisited

Eric Wustrow
Networking Research and Development
Merit Network Inc.
Ann Arbor, MI 48104 USA

Manish Karir
Networking Research and Development
Merit Network Inc.
Ann Arbor, MI 48104 USA

Michael Bailey
Department of EECS
University of Michigan
Ann Arbor, MI 48109 USA

Farnam Jahanian
Department of EECS
University of Michigan
Ann Arbor, MI 48109 USA

Geoff Huston
Asia Pacific Network Information Centre
Brisbane QLD 4064 Australia

ABSTRACT

The monitoring of packets destined for routeable, yet unused, Internet addresses has proved to be a useful technique for measuring a variety of specific Internet phenomena (e.g., worms, DDoS). In 2004, Pang et al. stepped beyond these targeted uses and provided one of the first generic characterizations of this non-productive traffic, demonstrating both its significant size and diversity. However, the six years that followed this study have seen tremendous changes in both the types of malicious activity on the Internet and the quantity and quality of unused address space. In this paper, we revisit the state of Internet "background radiation" through the lens of two unique datasets: a five-year collection from a single unused /8 network block, and week-long collections from three recently allocated /8 network blocks. Through the longitudinal study of the long-lived block, comparisons between blocks, and extensive case studies of traffic in these blocks, we characterize the current state of background radiation, specifically highlighting those features that remain invariant from previous measurements and those which exhibit significant differences. Of particular interest in this work is the exploration of address space pollution, in which significant non-uniform behavior is observed. However, unlike previous observations of differences between unused blocks, we show that increasingly these differences are the result of environmental factors (e.g., misconfiguration, location), rather than algorithmic factors. Where feasible, we offer suggestions for cleanup of these polluted blocks and identify those blocks whose allocations should be withheld.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IMC'10, November 1-3, 2010, Melbourne, Australia.
Copyright 2010 ACM 978-1-4503-0057-5/10/11 ...$10.00.

Categories and Subject Descriptors
C.2 [Information Systems Applications]: Network Operations

General Terms
Measurement

Keywords
Darknet, Internet background traffic, Network pollution

1. INTRODUCTION

The monitoring of allocated, globally routeable, but unused Internet address blocks has been widely used by the security, operations, and research communities to study a wide range of interesting Internet phenomena. As there are no active hosts in these unused blocks, packets destined to these IP addresses must be the result of worm propagation [1, 2, 3], DDoS attacks [4], misconfiguration, or other unsolicited activity. Systems that monitor unused address spaces have a variety of names, including darknets [5], network telescopes [6], blackhole monitors [7], network sinks [8], and network motion sensors [9].

While this monitoring technique had seen heavy use in the measurement of specific phenomena, it was not until 2004, when Pang et al. [10] published their seminal paper "Characteristics of Internet Background Radiation", that a detailed characterization of this incessant non-productive traffic became available. Through passive measurement and active elicitation of connection payloads over several large unused blocks, the authors characterized the behavior of sources and the activities prevalent in Internet background radiation. Most notable in their analysis was the ubiquity of Internet background radiation, its scale, its rich variegation in targeted services, and the extreme dynamism in many aspects of the observed traffic.

The six years since this landmark paper have seen significant changes in both the size, shape, and traffic carried by the Internet and the methods and motivations of the malicious traffic that makes up Internet background radiation. While scanning, both as a reconnaissance activity and as a propagation method, is alive and well, the emergence and growth of botnets [11, 12] have changed the threat landscape significantly for most operators. Botnets treat compromised hosts as a resource worth protecting, which highlights a tension in botnet design between noisy malicious behavior, which invites detection, and the desire to retain a useful resource by avoiding detection. As with any design tradeoff, there are malicious botnets that continue to be noisy in how they use and acquire hosts (e.g., Conficker); nevertheless, the last six years have seen a marked change in how malicious code behaves [13].

Additionally, today's Internet continues to witness tremendous year over year growth, fueled in large part by demand for video [14]. New content delivery mechanisms have changed how traffic flows, and user demands continue to change the applications of interest. These changes impact the behaviors observed in background radiation, as new services become more desirable to discover and new network services offer new ways to be misconfigured.

Our study is primarily motivated by the dramatic shifts in attack behaviors and in the Internet as a whole since the original 2004 Internet background radiation study [10]. Additionally, as IPv4 address exhaustion nears [15] and dirty network blocks can no longer be returned for newer allocations, there is an increasing need to both identify and quantify address pollution, in order to determine the quality of a network address block and the utility of any cleanup effort. The purpose of this paper is to revisit Internet background radiation in order to determine any evolution in the nature of this traffic and to explore any new features that might have emerged. To provide as broad and detailed a characterization as possible, we draw on two unique sources of data for our analysis. First, we examine five week-long datasets taken from the same routed /8 unused address block, representing the first week in February over the last five years. Second, we examine three week-long datasets built by announcing and capturing traffic to three separate /8 networks recently allocated to APNIC and ARIN from IANA. These three datasets are compared with each other, as well as with three matching week-long collections from the /8 used in the longitudinal study.

To summarize, the value of our work is threefold:

• Revisiting Internet Background Radiation. In this paper we present the first thorough study of Internet background radiation since 2004. We study and characterize this traffic in an attempt to answer two specific questions:

– Temporal Analysis of Internet Background Radiation. The first question is how this traffic has evolved over a 5 year time period.

– Spatial Analysis of Internet Background Radiation. The second question is how this traffic might vary based on the specific darknet address block under observation.

• A Study of Internet Address Pollution. Our spatial analysis of background radiation shows significant differences between large blocks of unused address space. We argue these differences are distinct from previously reported diversity measurements, as they are the result of significant volumes of non-uniform environmental factors, a class of behaviors we collectively label as address space pollution.

• Availability of these Traces. We will make all 11 datasets, nearly 10 TB of compressed PCAP data, available through the Protected Repository for the Defense of Infrastructure Against Cyber Threats (PREDICT) [16] dataset archive, in an effort to further expand our knowledge of these interesting phenomena and encourage additional exploration.

The rest of this paper is organized as follows: in Section 2 we describe some directly related work; Section 3 provides an overview of our data collection methodology and describes our datasets; in Section 4 we revisit Internet background radiation, providing both temporal and spatial studies of this traffic; Section 5 outlines our study of address pollution; and Section 6 summarizes our results and offers some conclusions and future work.

2. RELATED WORK

Directly related work in this area can generally be categorized into two areas. The first is concerned with the design, operation, and scalability of systems for monitoring Internet background radiation, while the second focuses on the analysis of the data collected via these systems.

There have been several attempts at building Internet background radiation monitoring systems; here we describe the three most popular systems. In [6] the authors discuss perhaps the most popular and visible monitor, at CAIDA. They describe how the size of the monitored address space can influence its ability to detect events. They also present several alternative models for building distributed network monitors. Data from this monitor has been made available to the broad network research community, which has served to increase its visibility. The iSink monitor at the University of Wisconsin was first published in [8], where the authors describe their experience in building this system as well as using it for both active and passive monitoring for detection of possible network abuse activity. One of the chief characteristics of the system was its ability to filter the traffic as well as incorporate application level responders. The Internet Motion Sensor (IMS) system at the University of Michigan has been described in [9]. The IMS system was perhaps the most distributed and extensive of the three systems described here. A main finding of this work was the value of a distributed monitoring system, as different blocks in different networks reported significantly different behaviors. The spatial analysis in our study re-confirms these differences, both in space and time, but highlights a growing trend toward pollution as the cause of these differences.

A wide variety of work based on improving these techniques and systems has followed. A non-exhaustive list of examples includes: practical techniques for deploying these sensors [5], how to build scalable filters in distributed darknets [17], where to place distributed sensors [7], how to configure services in these sensors [18], the security and anonymity of these sensor blocks [19], and the effectiveness of distributed sensors in various domains such as worm detection [20]. One relevant body of work is that of Cooke et al. [21], in which we examined observed non-uniformity across monitors and showed that this non-uniformity was the result of algorithmic factors in worm propagation and environmental factors such as misconfiguration. While our study shows that both of these factors continue to play a role, the increase in importance of environmental factors is a striking addition of our study.

Data Set      Start Date   End Date    Size (gz'd)
A-1: 1/8      2/23/2010    3/1/2010    4134 GB
A-2: 35/8     2/23/2010    3/1/2010    739 GB
B-1: 50/8     3/12/2010    3/19/2010   1067 GB
B-2: 35/8     3/12/2010    3/19/2010   770 GB
C-1: 107/8    3/25/2010    3/31/2010   1230 GB
C-2: 35/8     3/25/2010    3/31/2010   770 GB

Table 1: Datasets used in Darknet Traffic Spatial Analysis

The systems described above have led to multiple studies regarding the nature and characteristics of traffic observed in these darknets. Darknet traffic has been used for specific analysis of malicious activity such as worm propagation [22, 1, 2, 3], DDoS attacks [4], misconfiguration, or other unsolicited activity [23]. The most relevant work to this paper, of course, is [10], in which the authors present an extremely detailed analysis of Internet background radiation as observed in 2004 at four unused IPv4 network blocks. They performed both active and passive characterization of the background traffic and concluded that there is significant diversity in this traffic, both across the address blocks monitored and over time. There are three main distinctions between that work and the collection methodology in this paper, in addition to the freshness of the data being examined. First, due to the availability of a large computation and storage infrastructure, we do not need to filter or sample the traffic being analyzed in any way (both were done in the previous work). In spite of the large increases in volume over time, we find that fairly robust processing scripts are able to process weeks of data in a matter of hours. Second, due to the transient and sensitive nature of the blocks being studied (see Section 3), we do not utilize any active responders to solicit traffic to the block. As a result, we are unable to differentiate traffic based on payload, except in the case of UDP. Finally, we make use of substantially larger amounts of address space, over longer time scales, than the previous study.

3. METHODOLOGY

In this section we describe the datasets used in our experiments as well as our long term collection methodology for the study of Internet address space pollution.

3.1 Data Collection

For our analysis we used two datasets. The first set, of six distinct sub-datasets, was used for studying the spatial properties of darknet traffic, and the second set, of five distinct sub-datasets, was used for studying the temporal properties of darknet traffic.

The six spatial sub-datasets were constructed by obtaining permission from ARIN and APNIC to announce previously unallocated /8 network blocks to the Internet via BGP. As a result, all darknet traffic destined for these networks was routed to our data collection infrastructure at Merit. Each of the 1.0.0.0/8, 50.0.0.0/8, and 107.0.0.0/8 networks was announced over a period of one week. The resulting three datasets were then paired with data from our ongoing data collection on the unused portion of the 35.0.0.0/8 network block for the same time period. The 35.0.0.0/8 network block is unused except for a /13 block of addresses that is routed internally at Merit for its customers (96.8% unused). For each dataset we performed a full packet capture using a customized packet capture utility based on libpcap. Table 1 lists these datasets.

Data Set     Start Date   End Date    Size (gz'd)
D-1: 35/8    2/13/2006    2/19/2006   113 GB
D-2: 35/8    2/5/2007     2/11/2007   95 GB
D-3: 35/8    2/4/2008     2/10/2008   119 GB
D-4: 35/8    2/2/2009     2/8/2009    386 GB
D-5: 35/8    2/8/2010     2/14/2010   630 GB

Table 2: Datasets used in Darknet Traffic Evolution Analysis
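The 96.8% figure quoted above for 35.0.0.0/8 follows directly from the prefix lengths: a /13 covers 1/32 of a /8. The sketch below checks this with Python's standard `ipaddress` module. The function name `unused_fraction` and the placement of the in-use /13 at 35.0.0.0/13 are assumptions made for illustration; the text does not say which /13 is in use.

```python
import ipaddress


def unused_fraction(block: str, used_prefixes: list[str]) -> float:
    """Fraction of `block` not covered by the (non-overlapping) `used_prefixes`."""
    net = ipaddress.ip_network(block)
    used = sum(ipaddress.ip_network(p).num_addresses for p in used_prefixes)
    return 1 - used / net.num_addresses


# Hypothetical location of the in-use /13; only its size matters here.
fraction = unused_fraction("35.0.0.0/8", ["35.0.0.0/13"])
print(f"{fraction:.3%}")  # 96.875%, quoted as 96.8% in the text
```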

Though we actively worked with our upstream provider AT&T to ensure that our BGP route announcements would be propagated into the Internet core, it is possible that local or regional policies and configuration differences could have impacted the global visibility of our darknet route announcements. This in turn would have an impact on what data is actually routed to our data collectors. Using publicly available BGP routing data from routeviews.org [24] and RIPE [25], we were able to confirm that our BGP route announcement for 1.0.0.0/8 was visible to 31 out of 41 active routeviews peers and 9 out of 16 active RIPE peers. The 50.0.0.0/8 route announcement was visible to 31 out of 41 routeviews peers and 9 out of 14 RIPE peers. The 107.0.0.0/8 BGP route announcement was visible to 28 out of 41 routeviews peers and 10 out of 16 RIPE peers. Though we cannot claim to have collected all the network pollution directed at these network blocks, we believe our data is fairly representative of the overall trends and data patterns.
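To put the peer counts above on a common scale, the short sketch below converts them to visibility shares per collector. The counts are copied from the text; the dictionary layout and the helper name `visible_share` are illustrative choices, not part of the original methodology.

```python
# (peers that saw the route, active peers) per collector, copied from the text.
VISIBILITY = {
    "1.0.0.0/8": {"routeviews": (31, 41), "RIPE": (9, 16)},
    "50.0.0.0/8": {"routeviews": (31, 41), "RIPE": (9, 14)},
    "107.0.0.0/8": {"routeviews": (28, 41), "RIPE": (10, 16)},
}


def visible_share(seen: int, total: int) -> float:
    """Fraction of a collector's active peers that carried the announced route."""
    return seen / total


for prefix, collectors in VISIBILITY.items():
    shares = ", ".join(
        f"{name}: {visible_share(*counts):.0%}" for name, counts in collectors.items()
    )
    print(f"{prefix:<12} {shares}")
```

Under these counts, each announcement was seen by roughly two thirds to three quarters of routeviews peers and by somewhat over half of RIPE peers.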

The second set of five datasets was used for the temporal analysis section of this paper and was extracted from our ongoing continuous data collection of packets directed towards the unused portions of the 35/8 network block. We extracted week-long datasets for the first week of February for each year since 2006. Table 2 lists these datasets. We used 3 additional days of data for each of the 5 years to verify that the volume of traffic and the pollution type distribution were relatively stable throughout the year. From this, we find that our week-long datasets are representative of their respective years. A total of 11 datasets were created, representing roughly 10 TB of compressed packet captures.

3.2 Internet Pollution and Data Archiving

This data collection is part of an ongoing research activity in which we are working with IP address registries such as ARIN and APNIC in order to collect and archive samples from newly allocated network blocks for the broad Internet research community. These datasets will then be published via the PREDICT [16] dataset archive. Any research activity which interacts with critical Internet infrastructure must carefully balance the need to inform relevant parties against the need to ensure that such a process does not result in a dirty dataset. For each new allocation we obtain a clear Letter of Authorization (LOA) from the RIR whose network block we wish to monitor. This LOA outlines the research activity and the duration for which we are authorized to announce this network block. This is then presented to our primary upstream provider, AT&T, which then removes any filters that would prevent our BGP announcement from propagating to the Internet. We also take care to publish information regarding our proposed announcement in the RADB [26] in case the network operator community has concerns regarding our BGP announcements. We do not actively announce our experiments on the network operator mailing lists, as doing so might result in tampering with our data collection, though we are prompt to answer any specific queries that might arise as a result of our experiment. Figure 1 summarizes this process.

[Figure 1 diagram; labels: ARIN, APNIC (RIRs), LOA, Merit/University of Michigan, AT&T, BGP, Internet, Darknet Data, Data Collector, Dataset Archive]

Figure 1: Cooperative Internet background radiation data collection

4. REVISITING INTERNET BACKGROUND RADIATION

At a high level, Internet background radiation can be classified into three distinct types based on the different root causes of these activities. Scanning is largely the result of infected hosts on the Internet attempting to find other vulnerable targets, backscatter is most often the result of Denial of Service attacks, and misconfiguration is the result of software or hardware errors. Table 4 shows the contribution of these three main types of background radiation to each of 1/8, 50/8, 107/8, and 35/8. We classify TCP SYN packets as scanning traffic. We define backscatter traffic as TCP SYN+ACK, RST, RST+ACK, and ACK packets, as these packets are likely to be generated by hosts attempting to respond to communication from a forged source in the darknet. Finally, we classify the remaining traffic as misconfiguration. When comparing each newly announced /8 to the baseline 35/8 captured during the same time interval, we observe that backscatter traffic volumes (in billions of packets per week) are nearly identical. Likewise, scanning is of similar magnitude, despite slightly increased volumes in 50/8 and 107/8. Misconfiguration traffic shows the most variance between /8s, due to its directed nature. In the following subsections, we investigate background radiation in the context of its temporal and spatial properties.

Protocol          2006   2007   2008   2009   2010
TCP (% pkts)      76.5   85.7   45.8   87.8   87.2
UDP (% pkts)      19.1    6.8   49.9   11.4   12.3
ICMP (% pkts)      4.2    5.0    3.8    0.6    0.4
other (% pkts)     0.2    2.5    0.5    0.2    0.1
TCP (% bytes)     22.5   75.6   16.3   82.5   82.2
UDP (% bytes)     75.3   13.4   81.6   16.6   17.2
ICMP (% bytes)     2.1    8.3    1.8    0.7    0.4
other (% bytes)    0.1    2.7    0.3    0.2    0.2

Table 3: Traffic distribution by protocol over time in terms of total packets as well as bytes, for 35/8 2006-2010 (D1-D5).

Dataset       Scanning   Backscatter   Misconfiguration
1/8 (A1)      12.5 B     1.7 B         55.9 B
35/8 (A2)     15.5 B     1.6 B         5.2 B
50/8 (B1)     17.7 B     2.4 B         10.2 B
35/8 (B2)     15.2 B     2.5 B         5.6 B
107/8 (C1)    18.9 B     2.2 B         14.8 B
35/8 (C2)     14.8 B     2.2 B         6.0 B

2006 (D1)     1.7 B      1.0 B         0.8 B
2007 (D2)     1.8 B      0.8 B         0.5 B
2008 (D3)     1.1 B      0.4 B         1.8 B
2009 (D4)     9.5 B      1.4 B         1.5 B
2010 (D5)     15.5 B     1.6 B         5.2 B

Table 4: Billions of packets received per week for each pollution type. Upper: 1/8 (A1), 50/8 (B1), 107/8 (C1), and 35/8 (A2, B2, C2); Lower: 35/8 2006-2010 (D1-D5).
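The three-way classification described in this section (SYN as scanning; SYN+ACK, RST, RST+ACK, and ACK as backscatter; everything else as misconfiguration) can be sketched as a small function. This is a minimal illustration, not the authors' processing script: the function name `classify`, the exact-match treatment of flag combinations, and the choice to ignore the ECN bits are all assumptions, since the text does not say how packets carrying additional flags such as PSH or FIN were handled.

```python
# TCP flag bit values from the TCP header definition (RFC 9293).
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

IPPROTO_TCP = 6  # IANA-assigned IP protocol number for TCP

# Flag combinations the text lists as backscatter.
BACKSCATTER_FLAGS = {SYN | ACK, RST, RST | ACK, ACK}


def classify(ip_proto: int, tcp_flags: int = 0) -> str:
    """Label one packet as 'scanning', 'backscatter', or 'misconfiguration'.

    Assumption: flags must match one of the listed combinations exactly,
    after masking off the ECN bits (ECE/CWR).
    """
    if ip_proto == IPPROTO_TCP:
        flags = tcp_flags & 0x3F  # keep FIN..URG only
        if flags == SYN:
            return "scanning"
        if flags in BACKSCATTER_FLAGS:
            return "backscatter"
    # Non-TCP traffic and all other TCP flag combinations.
    return "misconfiguration"


print(classify(6, SYN))        # scanning
print(classify(6, SYN | ACK))  # backscatter
print(classify(17))            # misconfiguration (UDP)
```

Under this rule all UDP and ICMP traffic lands in the misconfiguration bucket, which matches the text's "remaining traffic" wording.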

4.1 Temporal Analysis of Internet Background Radiation

Figure 2 shows the overall traffic rate observed at the 35/8 darknet during the first week of February for each year starting from 2006. There is an almost fourfold increase in the observed traffic volume to the address space over this 5 year observation window. While the observed traffic rate in 2006 is less than 5 Mbps, it does have a significant number of extremely large spikes, which can reach as high as 60 Mbps. These spikes are largely the result of traffic on UDP port 1026, which represents Windows Messenger popup spam campaigns, and are consistent with similar increases in activity seen in the second half of 2005 and the first half of 2006 (e.g., http://www.dshield.org/). 2007 demonstrates only a modest increase over 2006, but this traffic rate increases steadily to almost 20 Mbps by February 2010. This translates into roughly 100% growth over each of the last four years. It is interesting to note that this rate of growth is nearly twice that of productive Internet traffic, which is currently exhibiting 50% year over year growth rates [14].

Table 3 shows the relative composition of the darknet traffic over time in terms of packets and bytes. The percentage of UDP traffic increases dramatically in 2008 in terms of both packets and bytes. It is particularly interesting to note that there appears to be a significant outbreak of SQL Slammer worm scanning in 2008, initially evidenced by the spike in the volume of traffic observed on UDP port 1434. Recall that the SQL Slammer worm spread over the course of 10 minutes in 2003, infecting thousands of hosts. We were able to manually verify that this spike was indeed Slammer and not some other exploit by comparing the payload with the well known SQL Slammer payload. Whether this reflects a re-emergence of the worm or some other scanning effort, its occurrence at scale five years after the initial outbreak is puzzling. It should be noted that, when compared to the results reported in [10], the percentage of TCP traffic in terms of packets appears very similar.

[Figure 2 plot: "Volume of Traffic to 35.0.0.0/8"; x-axis: Time (day), 0 to 7; y-axis: Mbit/s, logarithmic scale from 1 to 256; one series per year, 2006-2010]

Figure 2: Temporal analysis of Internet Background Radiation. Overall measured traffic is shown from 2006-2010 using datasets D-1, D-2, D-3, D-4, D-5.

TCP Port   2006   2007   2008   2009   2010
445        23.1    8.8    7.2   70.8   83.1
139        12.9    4.2    3.5    0.9    0.6
4662          -   17.1    8.3      -      -
80          2.6      -      -    0.6    0.2
135         6.9    3.4   12.9    1.3      -

Table 5: Most popular TCP destination ports over time in terms of percentage of total TCP packets, 2006-2010 (D1-D5).

Table 5 shows the most popular TCP ports in terms of the total percentage of TCP traffic. Compared to the 2004 results reported in [10], by the time of dataset D1 in 2006 we notice only a minor up-tick in port 445 activity and a general decrease in the fraction of traffic to the popular ports reported in that study (i.e., 80, 135, 139). We do, however, witness the same dynamism as reported in that study, with ports such as 4662 shown in Table 5 appearing and disappearing in popularity. While most of these shifts are short lived and seemingly without explanation, several major events stand out. It is particularly interesting to note the dramatic increase in traffic on port 445 in 2009-2010. This is consistent with the emergence of the Conficker botnet in October 2008. Another interesting artifact visible in the data is the emergence of ssh scanning as a significant percentage of background radiation traffic starting in 2007. Scans on TCP port 23 also begin to emerge starting in 2007, which indicates a significant up-tick in attempts to locate backdoors installed by various worms.

TCP Flags   2006   2007   2008   2009   2010
syn         62.7   66.7   74.2   87.5   93.9
syn+ack     26.1   28.9   21.2    8.6    5.2
rst+ack      8.5    3.3    3.0    2.9    0.3
rst          2.3    0.8    1.4    0.4    0.3
ack          0.1      -      -    0.3    0.1

Table 6: Most popular TCP flags over time in terms of percentage of total TCP packets, 2006-2010 (D1-D5).

Table 6 shows the most commonly used flags in TCP packets over time as a percentage of the total number of TCP packets. A very clear trend visible in this data is the steady increase in packets which have only the SYN flag set. From 2006 to 2010 the percentage of TCP packets with only the SYN flag set increases from 63% to almost 94%; at the same time, the percentage of packets with the SYN and ACK flags set decreases dramatically, from 26.1% in 2006 to 5.2% in 2010. It is unclear whether this indicates an increase in scanning activity and a decline in DDoS activity.

The emergence of Conficker also accounts for perhaps the most significant shift in the nature of Internet background radiation. Figure 3 shows the Cumulative Distribution Function (CDF) of all destinations for which traffic was received in the 35/8 darknet. The 2006 CDF is virtually a straight line, indicating no significant hot-spot activity in this traffic. However, starting in 2008 a knee starts to form in the CDF, which indicates the emergence of hot-spot activity. Finally, in 2009 and 2010 we are able to observe a very pronounced kink in the CDF.
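A destination CDF of the kind described above can be computed by counting packets per destination /24 and accumulating the counts. The sketch below assumes the /24s are ordered by address (index 0 is the first /24 of the block), which is consistent with a uniform load giving a straight line but is an inference from the text rather than a stated detail; the function name `dest_slash24_cdf` is illustrative.

```python
import ipaddress
from itertools import accumulate


def dest_slash24_cdf(dest_ips, block="35.0.0.0/8"):
    """Cumulative fraction of packets by destination /24, in address order.

    Returns one value per /24 in `block` (65536 values for a /8).
    Addresses outside `block` are ignored.
    """
    net = ipaddress.ip_network(block)
    base = int(net.network_address) >> 8   # index of the block's first /24
    n_slash24 = net.num_addresses >> 8     # number of /24s in the block
    counts = [0] * n_slash24
    for ip in dest_ips:
        idx = (int(ipaddress.ip_address(ip)) >> 8) - base
        if 0 <= idx < n_slash24:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_slash24
    return [c / total for c in accumulate(counts)]


# Toy example on a /22 (four /24s) so the output is easy to inspect.
cdf = dest_slash24_cdf(
    ["10.0.0.1", "10.0.0.2", "10.0.1.1", "10.0.3.9"], block="10.0.0.0/22"
)
print(cdf)  # [0.5, 0.75, 0.75, 1.0]
```

A straight diagonal indicates evenly spread traffic; a knee or kink means that some range of /24s receives a disproportionate share.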

This is congruent with a bug in Conficker’s pseudo-randompropagation algorithm [27]. This bug causes it to fix bits 8and 24 (most-significant bits of octets 2 and 4, respectively)as 0, resulting in Conficker propagation scans being limitedto only 1/4 of the Internet address space. In all observed /8s

66

Page 6: Internet Background Radiation Revisited - Eric Wustrow

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

1

0 20000 40000 60000

CDF

/24

CDF of Destination /24s for 35.0.0.0/8

20102009200820072006

(a) The distribution of destination /24s targeted in the un-used address block.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

1

0 1e+06 2e+06

CDF

/24

CDF of Source /24s for 35.0.0.0/8

20102009200820072006

(b) The distribution of source /24s targeted in the unusedaddress block.

Figure 3: Changes in source and destination behavior from 2006-2010 using datasets D-1, D-2, D-3, D-4, D-5.

TCPPort A1 - A2 B1 - B2 C1 - C2

21 40.3 — —25 1.7 — —80 8.7 — —

443 1.6 — —445 -75.0 -2.0 -7.5143 32.5 — —

1024 — — -1.15022 1.6 — —6112 2.2 — —

Table 7: The most significant changes in the contri-bution of a TCP destination port when comparedbetween blocks. Only those ports whose contribu-tion to total traffic at a block that were different bymore than %1 are shown.

after 2008, we observe roughly one-third as much traffic for destination IPs with a second or fourth octet of 128 or greater.
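The effect of the Conficker bug described above on which destinations can be scanned is easy to check in code. The sketch below only encodes the stated constraint (the most-significant bits of octets 2 and 4 are always 0); it is not Conficker's actual generator, and the function name `conficker_reachable` is hypothetical.

```python
import ipaddress


def conficker_reachable(ip: str) -> bool:
    """True if the buggy scan could generate this IPv4 address.

    The bug fixes the most-significant bit of octets 2 and 4 to 0,
    so both octets must be below 128.
    """
    octets = ipaddress.IPv4Address(ip).packed
    return octets[1] < 128 and octets[3] < 128


# Fraction of (octet 2, octet 4) combinations that remain reachable.
reachable = sum(
    1
    for second in range(256)
    for fourth in range(256)
    if conficker_reachable(f"35.{second}.0.{fourth}")
)
fraction = reachable / (256 * 256)
```

Each fixed bit halves the candidate space, so `fraction` evaluates to 0.25, matching the 1/4 figure in the text.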

4.2 Spatial Analysis of Internet Background Radiation

Figures 4, 6, 7, 8, and 9 present our analysis of datasets A-[1,2], B-[1,2], and C-[1,2]. In these stacked graphs, the top row shows data collected from the 35/8 darknet (A-2, B-2, and C-2), while the bottom row shows data collected from 1/8 (A-1), 50/8 (B-1), and 107/8 (C-1).

The overall traffic volume in bytes and packets is shown in Figure 4. One of the most dramatic features is the enormous volume of traffic in 1/8, which sees Internet Background Radiation rates as high as 150 Mbps. As we discuss in the following section, most of this traffic is directed toward a small number of destinations in 1/8, due to misconfiguration in a wide range of Internet devices. Both 50/8 and 107/8 show a significant diurnal pattern with similar data rates. The overall darknet traffic volume ranges from 20-40 Mbps, or 40-60 Kpkt/s. One puzzling feature visible in these figures is the clipped nature of the 35/8 graphs. We believe this is caused by a rate limit on a device in the path of our data collector. While we have been able to verify that no such limit is present in our own collection network, we have so far not been able to verify that there is no such setting at our upstream provider. The traffic volume by protocol is shown in Figure 5. The traffic volume in Figures 4 and 5 shows a sharp dip on day 7 of the A-1 dataset. This was caused by a temporary duplicate BGP announcement by APNIC.

The first column of Figures 6 and 7 shows the cumulative distribution function (CDF) of the cumulative contributions of traffic with destination and source in each /24 network. The 1/8 graphs show extremely high hotspot activity in both figures, as evidenced by the extremely sharp knee in these graphs. The second and third columns correspond to datasets B-1 and C-1, respectively. Both display moderate hotspot activity in the destination CDF, but the source CDF graphs are virtually identical across A-2, B-1, B-2, C-1, and C-2. We describe some of this hotspot activity in detail in the next section.

Datasets A-2, B-1, B-2, C-1, and C-2 all display remarkable similarity in their TCP destination port distributions. Table 7 summarizes the differences between these datasets. It shows ports whose contribution differed by more than 1% when compared with A-2, B-2, and C-2. The most interesting features that we discovered during our analysis of the UDP destination ports were some unusual activity on port 514 (the port associated with syslog) in dataset B-1, and activity on port 15206, which represents SIP traffic, in dataset A-1. Figure 8 shows the traffic volume contributed by these features, and we discuss some of them in the following section.
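The comparison summarized in Table 7 can be approximated with a short sketch. It assumes each dataset is reduced to a list of TCP destination ports, one per packet, and that a column such as "A1 - A2" denotes the port's percentage share in the new block minus its share in the concurrent 35/8 dataset; both the sign convention and the function names are assumptions made for illustration.

```python
from collections import Counter


def port_shares(ports):
    """Map each destination port to its percentage of all packets."""
    counts = Counter(ports)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {port: 100.0 * n / total for port, n in counts.items()}


def significant_port_differences(ports_new, ports_ref, threshold=1.0):
    """Ports whose share differs by more than `threshold` percentage points.

    Returns {port: share_in_new_block - share_in_reference_block}.
    """
    new, ref = port_shares(ports_new), port_shares(ports_ref)
    diffs = {}
    for port in set(new) | set(ref):
        delta = new.get(port, 0.0) - ref.get(port, 0.0)
        if abs(delta) > threshold:
            diffs[port] = delta
    return diffs
```

Under this convention, a large negative value for a port means that the port makes up a much smaller share of traffic in the new block than in 35/8.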

The source operating system estimate, obtained by observing the TTL values in the TCP packets, shows that the relative volume of traffic generated by the various sources appears to be the same for 35/8, 50/8, and 107/8 (datasets A-2, B-1, B-2, C-1, C-2). Recall that the default TTL values for Windows, Linux, and Solaris are 128, 64, and 255, respectively. Windows hosts tend to dominate the total traffic volume in all except the 1/8 darknet block, where Linux sources are responsible for a majority of the traffic. Analysis of the UDP TTL values shows a similar distribution for all darknets except, once again, 1/8, where we see Windows, Linux, Solaris, and perhaps some embedded devices as possible contributors to the pollution. We were, for example, able to identify some pollution at this network


[Figure 4 plots: Volume of Traffic. Top row: (A2) 35.0.0.0/8, (B2) 35.0.0.0/8, (C2) 35.0.0.0/8; bottom row: (A1) 1.0.0.0/8, (B1) 50.0.0.0/8, (C1) 107.0.0.0/8. x-axis: Time (days), 0 to 7; y-axis: Mbps, 0 to 140; series: KPkt/s and Mbit/s.]

Figure 4: Spatial analysis of Internet Background Radiation. Overall measured traffic (bytes and packets) is shown for datasets A-1, A-2, B-1, B-2, C-1, C-2.

[Figure 5 plots: Traffic Volume Breakdown by Protocol. Top row: (A2) 35.0.0.0/8, (B2) 35.0.0.0/8, (C2) 35.0.0.0/8; bottom row: (A1) 1.0.0.0/8, (B1) 50.0.0.0/8, (C1) 107.0.0.0/8. x-axis: Time (days), 0 to 7; y-axis: Thousand Packets/s, 0 to 160; series: TCP, UDP, ICMP.]

Figure 5: Spatial analysis of Internet Background Radiation. Overall measured traffic by protocol is shown for datasets A-1, A-2, B-1, B-2, C-1, C-2.


[Figure 6 plots: Destination IP Address CDF. Top row: (A2) 35.0.0.0/8, (B2) 35.0.0.0/8, (C2) 35.0.0.0/8; bottom row: (A1) 1.0.0.0/8, (B1) 50.0.0.0/8, (C1) 107.0.0.0/8. x-axis: Destination /24s, 0 to 60000; y-axis: CDF, 0 to 1 (the (A1) panel starts at 0.4); series: packets and bytes.]

Figure 6: Spatial analysis of Internet Background Radiation. The CDF representing the cumulative contribution of individual /24 destinations using datasets A-[1,2], B-[1,2], C-[1,2].

[Figure 7 plots: Source IP Address CDF (/24 networks). Top row: (A2) 35.0.0.0/8, (B2) 35.0.0.0/8, (C2) 35.0.0.0/8; bottom row: (A1) 1.0.0.0/8, (B1) 50.0.0.0/8, (C1) 107.0.0.0/8. x-axis: Source /24s, up to 2e+06; y-axis: CDF, 0 to 1; series: packets and bytes.]

Figure 7: Spatial analysis of Internet Background Radiation. The cumulative distribution function (CDF) representing the cumulative contribution of individual /24 source network blocks, for both total packets and bytes, is shown using datasets A-[1,2], B-[1,2], C-[1,2]. Sorted with the highest contributors on the left.


[Figure 8 plots: Top 20 UDP Destination Ports. Top row: (A2) 35.0.0.0/8, (B2) 35.0.0.0/8, (C2) 35.0.0.0/8; bottom row: (A1) 1.0.0.0/8, (B1) 50.0.0.0/8, (C1) 107.0.0.0/8. x-axis: UDP destination port; y-axis: Million Packets, 1 to 10000. The per-panel port labels are not recoverable from the extracted text.]

Figure 8: Spatial analysis of Internet Background Radiation. The top 20 UDP destination ports are shown using datasets A-[1,2], B-[1,2], C-[1,2].

[Figure 9 plots: Time-to-live values for UDP packets. Top row: (A2) 35.0.0.0/8, (B2) 35.0.0.0/8, (C2) 35.0.0.0/8; bottom row: (A1) 1.0.0.0/8, (B1) 50.0.0.0/8, (C1) 107.0.0.0/8. x-axis: TTL value, 0 to 250; y-axis: Million Packets, 0.1 to 100.]

Figure 9: Spatial analysis of Internet Background Radiation. The distribution of TTL values for UDP traffic is shown using datasets A-[1,2], B-[1,2], C-[1,2].
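The TTL-based source operating system estimate discussed in Section 4.2 can be illustrated with a small heuristic. This is a sketch under a common assumption, not necessarily the exact method used here: the sender's initial TTL is taken to be the smallest of the default values quoted in the text (64, 128, 255) that is at least the observed TTL, which presumes the packet crossed fewer than 64 hops. The names `DEFAULT_TTLS` and `guess_os` are hypothetical.

```python
# Default initial TTL values quoted in the text.
DEFAULT_TTLS = {64: "Linux", 128: "Windows", 255: "Solaris"}


def guess_os(observed_ttl: int) -> str:
    """Guess the sender's OS family from the TTL seen at the darknet.

    Each router hop decrements the TTL, so the initial value is assumed
    to be the smallest known default that is >= the observed value.
    """
    if not 0 <= observed_ttl <= 255:
        raise ValueError("TTL must fit in one byte")
    for initial in sorted(DEFAULT_TTLS):
        if observed_ttl <= initial:
            return DEFAULT_TTLS[initial]
    return "unknown"  # unreachable for valid one-byte TTLs
```

For example, an observed TTL of 115 maps to Windows and 52 maps to Linux. Embedded devices or hosts with non-default settings would be misclassified by this heuristic, which is one reason such estimates are only indicative.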
