
An Invisible Localization Attack to Internet Threat Monitors

Wei Yu, Member, IEEE Computer Society, Xun Wang, Xinwen Fu, Member, IEEE Computer Society,

Dong Xuan, Member, IEEE Computer Society, and Wei Zhao, Fellow, IEEE

Abstract—Internet threat monitoring (ITM) systems have been deployed to detect widespread attacks on the Internet in recent years. However, the effectiveness of ITM systems critically depends on the confidentiality of the location of their monitors. If adversaries learn the monitor locations of an ITM system, they can bypass the monitors and focus on the uncovered IP address space without being detected. In this paper, we study a new class of attacks, the invisible LOCalization (iLOC) attack. The iLOC attack can accurately and invisibly localize monitors of ITM systems. In the iLOC attack, the attacker launches low-rate port-scan traffic, encoded with a selected pseudonoise code (PN-code), to targeted networks. While the secret PN-code is invisible to others, the attacker can accurately determine the existence of monitors in the targeted networks based on whether the PN-code is embedded in the report data queried from the data center of the ITM system. We formally analyze the impact of various parameters on attack effectiveness. We implement the iLOC attack and conduct the performance evaluation on a real-world ITM system to demonstrate the possibility of such attacks. We also conduct extensive simulations on the iLOC attack using real-world traces. Our data show that the iLOC attack can accurately identify monitors while being invisible to ITM systems. Finally, we present a set of guidelines to counteract the iLOC attack.

Index Terms—Internet threat monitoring systems, invisible localization attack, PN-code, security.


1 INTRODUCTION

In recent years, widespread attacks such as worms [1], [2], [3] and distributed denial-of-service (DDoS) attacks [4], [5] have been dangerous threats to the Internet. Due to the widespread nature of these attacks, large-scale traffic monitoring across the Internet has become necessary in order to effectively detect and defend against them. Developing and deploying Internet threat monitoring (ITM) systems (or motion sensor networks) is a major effort in this direction.

An ITM system consists of a number of monitors and a data center. The monitors are distributed across the Internet and can be deployed at hosts, routers, firewalls, etc. Each monitor is responsible for monitoring and collecting traffic addressed to a range of IP addresses within a subnetwork. The range of IP addresses covered by a monitor is also referred to as the location of the monitor. Periodically, the monitors send traffic logs to the data center. The data center analyzes the traffic logs and publishes reports. To maximize the usage of such reports, most existing ITM systems publish the reports online and make them accessible to the public. The reports provide critical insights into widespread Internet attacks and are used in detecting and defending against such attacks. ITM systems have been successfully used to detect the outbreaks of worms [6] and DDoS attacks [7]. There have been many real-world developments and deployments of ITM systems. Examples include Distributed Overlay for Monitoring InterNet Outbreaks (DOMINO) [8], SANS Internet Storm Center (ISC) [6], Internet Sink [9], Network Telescope [10], CAIDA [11], MyNetWatchMan [12], and Honeynet [13], [14].

However, the usability of ITM systems largely depends on the confidentiality of IP addresses covered by their monitors, i.e., the locations of monitors. If the locations of monitors are identified, the attacker can deliberately avoid these monitors and directly attack the uncovered IP address space. It is a known fact that the number of subnetworks covered by monitors is much smaller than the total number of subnetworks in the Internet [6], [9], [10]. In other words, the IP address space covered by monitors represents a very small portion of the entire IP address space [6]. Hence, bypassing IP address spaces covered by monitors will significantly degrade the accuracy of the traffic data collected by the ITM system in reflecting the real situation of attack traffic. Furthermore, the attacker may also poison ITM systems by manipulating the traffic toward and captured by disclosed monitors. For example, the attacker may launch high-rate port-scan traffic to disclosed monitors and feign a large-scale worm propagation. The attackers may even launch retaliation attacks (e.g., DDoS) against participants (i.e., monitor contributors) of ITM systems, thereby discouraging them from contributing to ITM systems. In summary, the attacker can significantly compromise the ITM system usability if locations of monitors are disclosed. It is important to have a thorough understanding of such attacks in order to effectively protect ITM systems.

In this paper, we conduct a systematic investigation of a class of attacks that aim to localize monitors accurately and invisibly. Accuracy is very important for an attacker in identifying monitor locations. Meanwhile, invisibility is also vital to a successful attack. If the attack attempts are identified by the defender (such as the ITM administrators), countermeasures can be applied by the defender to reduce or eliminate the effects of the attack by filtering suspicious traffic (so that the attacker will not be able to identify monitors through traffic analysis [15]), confusing attackers (to make the attacker obtain wrong monitor location information [13]), and even tracking an attacker to its origin (so that attackers can be held accountable for their malicious acts [16], [17]).

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 20, NO. 11, NOVEMBER 2009 1611

• W. Yu is with the Department of Computer and Information Sciences, Towson University, Towson, MD 21252. E-mail: [email protected].

• X. Wang is with Cisco Systems, Inc., San Jose, CA 95134. E-mail: [email protected].

• X. Fu is with the Department of Computer Science, University of Massachusetts Lowell, Lowell, MA 01854. E-mail: [email protected].

• D. Xuan is with the Department of Computer Science and Engineering, Ohio State University, 2015 Neil Avenue, Columbus, OH 43210. E-mail: [email protected].

• W. Zhao is with the University of Macau, Av. Padre Tomás Pereira, Taipa, Macau, P.R. China. E-mail: [email protected].

Manuscript received 6 Apr. 2008; revised 18 Aug. 2008; accepted 20 Nov. 2008; published online 10 Dec. 2008. Recommended for acceptance by C.-Z. Xu. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPDS-2008-04-0131. Digital Object Identifier no. 10.1109/TPDS.2008.255.

1045-9219/09/$25.00 © 2009 IEEE. Published by the IEEE Computer Society.

Several attack schemes to discover the location of monitors have been investigated [18], [19]. However, our work is the first to address an attack aiming to achieve the objectives of both accuracy and invisibility. It is challenging for the attacker to achieve these two objectives simultaneously. Intuitively, the attacker can use high-rate attack traffic, as in [18] and [19], to achieve high attack accuracy as follows: The attacker launches high-rate port-scan traffic to a target network and then queries the data center for the report on recent port-scan activities. If there is a traffic spike in the report data reflecting the high-rate port-scan traffic sent by the attacker, the attacker can determine that the target network is deployed with one or more monitors that send traffic reports to the data center. However, the drawback of this scheme is that the high-rate traffic is highly visible to the defender.

In this paper, we investigate a new class of attacks called the invisible LOCalization (iLOC) attack. In the iLOC attack, the attacker launches low-rate port-scan traffic (also referred to as attack traffic) to target networks. The scan traffic is encoded with a carefully selected pseudonoise code (PN-code), known only by the attacker. The PN-code embedded in traffic can be accurately recognized by the attacker even with the interference from background traffic aggregated by the data center but not generated by iLOC. Thus, the attacker is able to accurately determine the existence of monitors in the target networks based on whether the same PN-code is embedded in the report data queried from the data center of the ITM system. The PN-code modulated/embedded scan traffic will appear as innocent noise in both the time and frequency domains, rendering it invisible to others who do not know the PN-code. Only those aware of the original PN-code can correctly recover the encoded PN-code and identify the monitor locations. Therefore, using the iLOC technique, the attacker can accurately localize monitors while evading detection.

We conduct both theoretical analysis and experimental evaluation on the iLOC attack. We derive formulas for both the accuracy and invisibility of the attack. We analyze and discuss the impacts of various attack parameters (e.g., PN-code length, attack traffic rate, etc.) on the effectiveness of the attack. Based on the analytical results, we discuss how the attacker can select the attack parameters in order to achieve both attack accuracy and invisibility. We implement the iLOC attack and evaluate its performance on a real-world ITM system, which demonstrates the feasibility of the iLOC attack. We also conduct extensive performance evaluations on the iLOC attack in a simulated environment. Our evaluations are based on replaying a large set of real-world Internet traffic traces collected by a real-world ITM system. The evaluation data demonstrate that the attack can accurately identify the locations of monitors, while evading detection by those who do not know the PN-code used by the attacker. Furthermore, we present a set of guidelines on how to counteract the iLOC attack.

The remainder of the paper is organized as follows: In Section 2, we describe the iLOC attack in detail. In Section 3, a formal analysis of attack accuracy and invisibility and the impacts of various parameters on the performance of the iLOC attack are presented. In Section 4, we introduce our implementation of the iLOC attack and the validation in the real-world experiments. In Section 5, we report our performance evaluation results on the iLOC attack. In Section 6, we discuss some preliminary countermeasures against the iLOC attack. In Section 7, we review the related work. Finally, we conclude the paper in Section 8.

2 iLOC ATTACK

In this section, we will present the iLOC attack in detail. We will first give an overview of the iLOC attack and then introduce the detailed procedures of the attack, followed by discussions. Table 1 summarizes the notations used in this paper.

2.1 Overview

Fig. 1 shows the basic workflow of the iLOC attack and the basic idea of the ITM system. In the ITM system, monitors deployed at various networks record their observed port-scan traffic and continuously update their traffic logs to the data center. The data center first summarizes the volume of port-scan traffic toward (and reported by) all monitors and then publishes the report data to the public in a timely fashion. In this paper, background traffic refers to aggregate traffic collected by the data center but not generated by iLOC attacks.

As shown in Figs. 1a and 1b, respectively, the iLOC attack consists of the following two stages: 1) Attack traffic generation: In this stage, as shown in Fig. 1a, the attacker first selects a code and encodes the attack traffic by embedding the selected code. The attacker then launches the attack traffic toward a target network (e.g., network A in Fig. 1a). We denote such an embedded code pattern in the attack traffic as the attack mark of the iLOC attack and denote the attack traffic encoded by the code as attack mark traffic. 2) Attack traffic decoding: In this stage, as shown in Fig. 1b, the attacker first queries the data center for the traffic report data. Such report data consist of both attack traffic and background traffic. Given the report data, the attacker tries to recognize the attack mark (i.e., the code embedded in the iLOC attack traffic) by decoding the report data. If the attack mark is recognized, the report data must include the attack traffic, which means the target network is deployed with monitors and the monitors are sending traffic reports to the data center of the ITM system.

TABLE 1
Notations

Code-based attack. The iLOC attack adopts a code-based approach to generate the attack traffic. Coding techniques have been widely used in secure communications; Morse code is one example. Without knowledge of Morse code, it is impossible for the receiver to interpret the carried information [20]. In the iLOC attack, the PN-code-based approach we adopt has three advantages. First, the code is embedded in traffic and can be correctly recognized by the attacker even with the interference from background traffic. This favors the attack accuracy. Second, a code of sufficient length provides enough privacy. That is, the code is known only by the attacker, and thereby, the code pattern embedded in attack traffic can only be recognized by the attacker. Last, the code is able to carry information. A longer code is more immune to interference and requires comparatively lower rate attack traffic as the carrier, which is harder to detect. All these characteristics help to achieve the objectives of attack accuracy and invisibility.

Parallel attack capacity. Intuitively, one simple way to achieve a parallel attack is to launch port-scan/attack traffic toward multiple target networks simultaneously by scanning a different port for each target network. For example, if the data center publishes traffic reports of 1,000 (TCP/UDP) ports, then the attacker can probe 1,000 networks simultaneously, attacking each network with a different port. Since attack traffic on different ports is summarized separately at the data center, the attacker can still separate and, thus, decode the traffic toward different targets. The attacker therefore can localize monitors in multiple networks simultaneously and accurately; however, can the attacker further improve the attack efficiency? Assuming that the data center only publishes reports of 1,000 ports, can the attacker fingerprint 10,000 target networks simultaneously, for example, by attacking 10 different networks using one port? High-rate port-scan traffic cannot achieve this, as it is indiscernible whether a spike in the traffic report is caused by traffic logs from one network or from the other nine networks. In order to achieve this goal in the code-based attack, the selected codes and the corresponding encoded attack traffic toward multiple networks for the same port should not interfere with each other (i.e., each of them can be decoded individually and accurately by the attacker, although they are integrated/summarized in the traffic report from the ITM data center). The PN-code used by the iLOC attack can target multiple networks by launching probing traffic on the same port simultaneously. This unique feature can improve the attack efficiency significantly. The details of how to select the PN-code will be discussed in the following sections.
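To illustrate why codes with low cross-correlation allow several networks to be probed on one port, the following minimal sketch sums two on/off-encoded scan streams into a single per-port report and correlates the mean-shifted report against each code, using the encoding and correlation test described in Sections 2.2.2 and 2.3. It assumes two perfectly orthogonal Walsh-style codes of length 8 and no background traffic, so that the numbers come out exact; the function names and the rate V = 10 are illustrative choices, not taken from the paper.

```python
def correlation_degree(x, y):
    """Inner product of two equal-length vectors divided by their length."""
    return sum(a * b for a, b in zip(x, y)) / len(x)


def encode(code, v):
    """On/off encoding: rate v for a +1 bit, rate 0 for a -1 bit."""
    return [v / 2 * c + v / 2 for c in code]


def shifted(data):
    """Subtract the sample mean from every element."""
    mean = sum(data) / len(data)
    return [d - mean for d in data]


V = 10.0
# Two orthogonal, balanced codes (an assumption made for exact arithmetic).
code_a = [+1, -1, +1, -1, +1, -1, +1, -1]
code_b = [+1, +1, -1, -1, +1, +1, -1, -1]

traffic_a = encode(code_a, V)   # scans sent to network A
traffic_b = encode(code_b, V)   # scans sent to network B, same port

# Case 1: both networks host monitors, so the per-port report is the sum.
report_both = [a + b for a, b in zip(traffic_a, traffic_b)]
# Case 2: only network A hosts a monitor.
report_a_only = traffic_a

score_a_both = correlation_degree(code_a, shifted(report_both))    # V / 2
score_b_both = correlation_degree(code_b, shifted(report_both))    # V / 2
score_a_only = correlation_degree(code_a, shifted(report_a_only))  # V / 2
score_b_only = correlation_degree(code_b, shifted(report_a_only))  # 0
```

With real PN-codes the cross term is small rather than exactly zero, so the second code's score in the last case would be close to, but not exactly, 0.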

2.2 Attack Traffic Generation Stage

In this attack stage, the attacker 1) selects the code, which is a PN-code in our case, 2) encodes the attack traffic using the selected PN-code, and 3) sends the encoded attack traffic toward the target network. In the third step, the attacker can coordinate a large number of compromised bots to generate the attack traffic [21]; however, this is not the focus of this paper. In the remaining sections, we will focus on the first and second steps.

2.2.1 Code Selection

To evade detection, the attack traffic should be similar to the background traffic. From a large set of real-world traffic traces obtained from SANS ISC [6], [22], we conclude that the background traffic shows random patterns in both the time and frequency domains. The attack objectives of both accuracy and invisibility and an attacker’s desire for parallel attacks require that 1) the encoded attack traffic should blend in with background traffic, i.e., be random in both the time and frequency domains, 2) the code embedded in the attack traffic should be easily recognizable to the attacker alone, and 3) the code should support parallel attacks on the same port.

To meet the above requirements, we choose the PN-code to encode the attack traffic. The PN-code in the iLOC attack is a sequence of $-1$ or $+1$ with the following features [23], [24], [25]: 1) The PN-code is random and “balanced.” The $-1$ and $+1$ are randomly distributed, and the occurrence frequencies of $-1$ and $+1$ are nearly equal. This feature contributes to good spectral density properties (i.e., equally spreading the energy over all frequency bands). It makes the attack traffic appear as noise and blend in with background traffic in both the time and frequency domains. 2) The PN-code has a high correlation to itself and a low correlation to others (such as random noise), where the correlation is a mathematical utility for finding repeating patterns in a signal [25]. This makes it feasible for the attacker to accurately recognize attack traffic (encoded by the PN-code) from the traffic report data, even under the interference of background traffic. 3) The PN-code has a low cross-correlation value among different PN-code instances. The lower this cross-correlation value, the less interference among multiple attack sessions in a parallel attack. This makes it feasible for the attacker to conduct parallel attacks toward multiple target networks on the same port.


Fig. 1. Workflow of the iLOC attack. (a) Attack stage 1: attack traffic generation. (b) Attack stage 2: attack traffic decoding.

There are mature PN-code generators such as the m-sequence code, Barker code, Gold codes, and Hadamard-Walsh codes [23], [24]. In this paper, we use the m-sequence code, which has the best autocorrelation (it only highly correlates to itself with a sharp autocorrelation peak) [23], [26]. We use the feedback shift register to repeatedly generate the m-sequence PN-code due to its popularity and ease of implementation [26]. In particular, a feedback shift register consists of two parts. One is an ordinary shift register consisting of a number of flip-flops (two-state memory devices). The other is a feedback module to form a multiloop feedback logic.
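The feedback-shift-register construction can be sketched in a few lines. The example below is a minimal illustration, not the authors' generator: it assumes a 4-stage register whose feedback bit is the XOR of stages 4 and 1, a tap choice that yields a maximal-length sequence of period 15. The function name, the `taps` and `state` parameters, and the mapping of bit 1 to $+1$ and bit 0 to $-1$ are assumptions made for illustration.

```python
def m_sequence(taps, state):
    """Generate one period of an m-sequence as a list of -1/+1 values.

    taps  : 1-based register stages XORed together to form the feedback bit
    state : initial register contents as a list of 0/1 bits, not all zero
    """
    n = len(state)
    reg = list(state)
    out = []
    for _ in range(2 ** n - 1):
        # The last stage is the output; map bit 1 -> +1 and bit 0 -> -1.
        out.append(+1 if reg[-1] == 1 else -1)
        feedback = 0
        for t in taps:
            feedback ^= reg[t - 1]
        # Shift right by one stage and feed the new bit into the first stage.
        reg = [feedback] + reg[:-1]
    return out


code = m_sequence(taps=[4, 1], state=[1, 0, 0, 0])


def periodic_autocorrelation(seq, shift):
    """Unnormalized cyclic autocorrelation of a -1/+1 sequence."""
    n = len(seq)
    return sum(seq[i] * seq[(i + shift) % n] for i in range(n))
```

For this code, `sum(code)` is 1 (eight $+1$ values against seven $-1$ values), and `periodic_autocorrelation(code, k)` is 15 at shift 0 and $-1$ at every other shift, which is the balance and sharp-peak behavior described above.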

2.2.2 Attack Traffic Encoding

During the attack traffic encoding process, each bit of the selected PN-code is mapped to a unit time period $T_s$, denoted as mark-bit duration. The entire duration of launched attack traffic (referred to as traffic launch session) is $T_s L$, where $L$ is the length of the PN-code. After the attacker launches port scans to target networks, he/she also queries the data center for the traffic report periodically. For brevity, this query interval is set to $T_s$. The detailed discussion of determining these parameters will be presented in Section 3.

The encoding is conducted based on the following rules: each bit of the PN-code maps to a mark-bit duration $T_s$; when the PN-code bit is $+1$, port-scan traffic with a high rate, denoted as mark traffic rate $V$, is generated in the corresponding mark-bit duration; when the code bit is $-1$, no port-scan traffic is generated in the corresponding mark-bit duration. Thus, the attacker embeds the attack traffic with a special pattern, i.e., the original PN-code. Recall that after this encoding process, the PN-code pattern embedded in traffic is denoted as the attack mark. If we use $C_i = \langle C_{i,1}, C_{i,2}, \ldots, C_{i,L} \rangle \in \{-1,+1\}^L$ to represent the PN-code and use $\eta_i = \langle \eta_{i,1}, \eta_{i,2}, \ldots, \eta_{i,L} \rangle$ to represent the attack traffic rate, then we have $\eta_{i,j} = \frac{V}{2} C_{i,j} + \frac{V}{2}$. That is, $\eta_{i,j} = V$ if $C_{i,j} = +1$, and $\eta_{i,j} = 0$ if $C_{i,j} = -1$ ($j = 1, \ldots, L$). Fig. 2 shows one example of the PN-code and the corresponding attack traffic encoded with the PN-code.
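The encoding rule maps directly to code. The sketch below applies the rule "rate $V$ for a $+1$ bit, rate 0 for a $-1$ bit" to a short code; the function name, the 7-bit code, and the rate value are hypothetical and chosen only to make the mapping visible.

```python
def encode_attack_traffic(code, mark_rate):
    """Return the scan rate for each mark-bit duration T_s.

    Implements rate = (V / 2) * bit + V / 2, so a +1 bit yields V
    and a -1 bit yields 0.
    """
    return [mark_rate / 2 * bit + mark_rate / 2 for bit in code]


code = [+1, -1, -1, +1, +1, -1, +1]   # illustrative code, not from the paper
rates = encode_attack_traffic(code, mark_rate=10.0)
# rates == [10.0, 0.0, 0.0, 10.0, 10.0, 0.0, 10.0]
```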

2.3 Attack Traffic Decoding Stage

In this stage, the attacker takes the following two steps: 1) The attacker queries the data center for the traffic report data, which consist of both the attack traffic and the background traffic. 2) From the report data, the attacker attempts to recognize the embedded attack mark. The existence of the attack mark determines whether the targeted network is deployed with monitors or not. As the query of traffic report data is relatively straightforward, here, we only detail the second step, i.e., attack mark recognition, as follows.

In the report data queried from the data center, the attack traffic encoded with the attack mark is mixed with the background traffic, which is aggregated by the data center but not generated by iLOC. It is critical for the iLOC attack to accurately recognize the attack mark from the traffic report data. To address this, we develop a correlation-based scheme. This scheme is motivated by the fact that the original PN-code (used to encode attack traffic) and its corresponding attack mark (embedded in the traffic report data) are highly correlated: in fact, they share the same pattern.

The attack mark in the traffic report data is the embedded form of the original PN-code. The attack mark is similar to its original PN-code, although the background traffic may introduce interference and distortion into the attack mark. We adopt the following correlation degree to measure their similarity. Mathematically, the correlation degree is defined as the inner product of two vectors. For two vectors $X = \langle X_1, X_2, \ldots, X_L \rangle$ and $Y = \langle Y_1, Y_2, \ldots, Y_L \rangle$ of length $L$, the correlation degree of vectors $X$ and $Y$ is

$$\Gamma(X, Y) = \frac{\sum_{i=1}^{L} X_i Y_i}{L}, \tag{1}$$

where $\Gamma(\cdot)$ represents the operator for the inner product of the two vectors. Based on the above definition, we have $\Gamma(X, X) = \Gamma(Y, Y) = 1$, $\forall X, Y \in \{-1,+1\}^L$.

We use two vectors, $\eta_i = \langle \eta_{i,1}, \eta_{i,2}, \ldots, \eta_{i,L} \rangle$ and $\omega_i = \langle \omega_{i,1}, \omega_{i,2}, \ldots, \omega_{i,L} \rangle$, to represent the attack traffic (embedded with the attack mark) and the background traffic, respectively. We shift the above two vectors by subtracting the mean value from the original data, resulting in two new vectors, $\eta'_i = \langle \eta'_{i,1}, \eta'_{i,2}, \ldots, \eta'_{i,L} \rangle$ and $\omega'_i = \langle \omega'_{i,1}, \omega'_{i,2}, \ldots, \omega'_{i,L} \rangle$. We continue to use a vector $C_i = \langle C_{i,1}, C_{i,2}, \ldots, C_{i,L} \rangle \in \{-1,+1\}^L$ to represent the PN-code. Thus, the correlation degree between the PN-code and the (shifted) attack traffic can be obtained. Similarly, we can also obtain the correlation degree between the PN-code and the (shifted) background traffic as follows.

According to the rules of encoding attack traffic discussed in Section 2.2.2, $\eta_i = \frac{V}{2} C_i + \frac{V}{2}$. Thus, $\eta'_i = \eta_i - E(\eta_i) = \eta_i - \frac{V}{2} = \frac{V}{2} C_i$. Hence, the correlation degree between the original PN-code and the (shifted) attack traffic is $\Gamma(C_i, \eta'_i) = \frac{V}{2} \Gamma(C_i, C_i) = \frac{V}{2}$. Furthermore, we can also derive the correlation degree between the PN-code and the (shifted) background traffic, i.e., $\Gamma(C_i, \omega'_i)$. The mean of such a correlation degree is close to 0, since the PN-code has low correlation with the (shifted) background traffic (i.e., $E[\Gamma(C_i, \omega'_i)] = \frac{1}{L} E\left[\sum_{j=1}^{L} (\omega'_{i,j} C_{i,j})\right] \approx 0$). If the standard deviation of the background traffic rate is $\sigma_x$, the variance of such a correlation degree is

$$\mathrm{Var}\left[\Gamma(C_i, \omega'_i)\right] = E\left[\left(\Gamma(C_i, \omega'_i) - 0\right)^2\right] \tag{2}$$

$$= \frac{1}{L^2} E\left[\sum_{j=1}^{L} C_{i,j}^2\, {\omega'_{i,j}}^2\right] \tag{3}$$

$$\approx \frac{1}{L^2} E\left[\sum_{j=1}^{L} {\omega'_{i,j}}^2\right] = \frac{\sigma_x^2}{L}. \tag{4}$$

Thus, the correlation degree between the PN-code and the (shifted) background traffic is $\Gamma(C_i, \omega'_i) \approx \frac{\sigma_x}{\sqrt{L}}$. Based on the above discussion, the attacker can choose appropriate attack parameters (e.g., PN-code length $L$ and mark traffic rate $V$) to make the correlation degree $\left(\frac{V}{2}\right)$ (between the PN-code and the attack mark traffic) much larger than the correlation degree $\left(\frac{\sigma_x}{\sqrt{L}}\right)$ (between the PN-code and the background traffic). As such, the attacker can accurately distinguish the attack mark traffic from the background traffic.

Fig. 2. PN-code and encoded attack traffic.

In the attack mark recognition, vector $\lambda_i$ is used to represent the queried report data, and vector $\lambda'_i$ is used to represent the shifted report data (by subtracting $E(\lambda_{i,j})$ from $\lambda_i$). According to the above discussion, $\lambda'_i = \eta'_i + \omega'_i$ (i.e., report data include the attack traffic and the background traffic), or $\lambda'_i = \omega'_i$ (i.e., report data include only the background traffic). The attacker uses the correlation degree between $\lambda'_i$ and the PN-code $C_i$, i.e., $\Gamma(C_i, \lambda'_i)$, to distinguish the above two cases and determine the existence of a PN-code in the report data. If $\Gamma(C_i, \lambda'_i)$ is larger than a threshold $T_a$,¹ which is referred to as the mark decoding threshold, then the attacker determines that the report contains attack traffic encoded with the PN-code $C_i$ and concludes that the target network is deployed with monitors; otherwise, the attacker concludes that it is not. The accuracy of the correlation-degree-based recognition scheme is analyzed and evaluated in Sections 3, 4, and 5.
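The recognition rule can be sketched as a single predicate: shift the report by its mean, correlate it with the code, and compare against the mark decoding threshold. The example is a minimal sketch with a constant background so that the arithmetic is exact; the function name, the 8-bit code, and the choice of threshold $V/4$ are assumptions for illustration, since the paper defers threshold selection to Section 3.

```python
def mark_detected(report, code, threshold):
    """Return True if the mean-shifted report correlates with the code
    more strongly than the mark decoding threshold."""
    mean = sum(report) / len(report)
    shifted = [r - mean for r in report]
    degree = sum(c * s for c, s in zip(code, shifted)) / len(code)
    return degree > threshold


code = [+1, -1, +1, +1, -1, -1, +1, -1]   # illustrative balanced code
mark_rate = 10.0                           # V
attack = [mark_rate / 2 * c + mark_rate / 2 for c in code]
background = [3.0] * len(code)             # constant background, for clarity

monitored = [a + b for a, b in zip(attack, background)]   # mark reaches the report
unmonitored = background                                  # mark never reported

threshold = mark_rate / 4                  # hypothetical value between 0 and V / 2

found_when_monitored = mark_detected(monitored, code, threshold)      # True
found_when_unmonitored = mark_detected(unmonitored, code, threshold)  # False
```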

Using real-world traces provided by SANS ISC [6], we show the results of correlation degrees in Fig. 3 and the PDF of the correlation degrees in Fig. 4. We consider three types of correlation degrees here. The first type is the correlation degree between the PN-code and the queried report data (probe mark embedded) from the data center. This type of correlation degree has a comparatively large value. We use a PN-code of length 20, and the probe mark traffic rate is equal to $0.7\sigma_x$, where $\sigma_x$ is the standard deviation of the background traffic rate. The second type is the correlation degree between the PN-code and the background traffic. The correlation degree of this type is much smaller than that of the first type. The third type is the correlation degree between a randomly generated PN-code and the queried traffic report data from the data center. This simulates the case in which the defender uses a guessed PN-code and attempts to recognize the probe mark generated by an attacker. We randomly generate 120 PN-codes of length 20 (instead of the original PN-code used to encode the attack traffic). The results show that these randomly generated codes achieve a much smaller correlation degree with the probe mark in comparison with the original PN-code. Thus, we know that the probe mark can be accurately recognized only by the attacker who knows the original PN-code. Notice that for the PN-code of length 20, the defender has a very small probability of $1/2^{20} \approx 10^{-6}$ to correctly guess the PN-code used by an attacker.

2.4 Discussion

In order to accurately and effectively recognize the attack mark (PN-code) from the report data, we need to find the segment of the report data containing the PN-code (i.e., we need to synchronize the port-scan traffic report data with the PN-code). For this purpose, we introduce a sliding-window-based scheme. The basic idea is to let the attacker obtain enough report data with small granularity. Then, a sliding window iteratively moves forward to capture a segment of the report data. For each segment, we apply the correlation-based scheme discussed in Section 2.3 to recognize whether the attack mark exists or not. The details of this synchronization are as follows. As shown in Fig. 5, the attacker iteratively moves the sliding window forward. The attacker first sends a sequence of queries to the data center, and each query requests a portion of report data, which lasts for a given unit of time, known as query duration $T_q$. To guarantee good synchronization and capture each bit in the PN-code, $T_q$ should be smaller than the mark-bit duration $T_s$. Also, the attacker must send enough queries to ensure that the queried report data contain the entire attack mark and attack mark traffic. The attacker iteratively conducts a correlation test on the report data using a sliding window. For example, in the $i$th round, the attacker selects $t_i$ as the starting time for the sliding window. In the $(i+1)$th round, the attacker moves the sliding window one step ($T_q$) further, and the start time of the sliding window becomes $t_i + T_q$, and so on. In the $i$th round, a sequence of data (with a length of $L$) is obtained in the sliding window. The first value in the sequence is the traffic data in time duration $[t_i, t_i + T_s]$, the second value is the traffic data in time duration $[t_i + T_s, t_i + 2T_s]$, and so on. With this series of data, the attacker conducts the attack mark recognition procedure discussed earlier. The attacker repeats the attack mark recognition after each move of the sliding window until the attack mark is recognized or the sliding window has gone through all the report data. According to (1), the computation complexity of one round of correlation test is $O(L)$, where $L$ is the PN-code length. Therefore, the computation complexity for performing the correlation test is $O\left(\frac{T_s}{T_q} L\right)$. Given such low complexity, the correlation test can be carried out in real time.

1. The selection of $T_a$ is impacted by not only the values of $\eta'_i$ and $\omega'_i$ but also the desired attack accuracy, which is analyzed in Section 3.

Fig. 3. Correlation degree.

Fig. 4. PDF of correlation degree.

Fig. 5. Sliding-window-based synchronization.
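The sliding-window synchronization described above can be sketched as follows. This is a minimal illustration under several assumptions: the report is a list of traffic volumes sampled every $T_q$, $T_s$ is an integer multiple of $T_q$ (`samples_per_bit`), and there is no background traffic. It also differs from the described procedure in one respect: rather than stopping at the first window that exceeds the threshold, it keeps the best-scoring window above the threshold, which avoids locking onto a partially overlapping window. All names and values are hypothetical.

```python
def correlation_degree(x, y):
    """Inner product of two equal-length vectors divided by their length."""
    return sum(a * b for a, b in zip(x, y)) / len(x)


def synchronize(samples, code, samples_per_bit, threshold):
    """Slide a window over samples taken every T_q and return the start
    index whose correlation with the code is largest and above the
    threshold, or None if no window qualifies."""
    length = len(code)
    span = length * samples_per_bit
    best_offset, best_score = None, threshold
    for offset in range(len(samples) - span + 1):
        # Aggregate T_s / T_q consecutive samples into one value per code bit.
        window = [
            sum(samples[offset + j * samples_per_bit:
                        offset + (j + 1) * samples_per_bit])
            for j in range(length)
        ]
        mean = sum(window) / length
        score = correlation_degree(code, [w - mean for w in window])
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset


code = [+1, -1, +1, +1, -1, -1, +1]    # illustrative 7-bit code
samples_per_bit = 2                     # assumes T_s = 2 * T_q
# One unit of traffic per T_q slot during +1 bits, none during -1 bits.
mark = [1.0 if bit == +1 else 0.0 for bit in code for _ in range(samples_per_bit)]
report = [0.0] * 5 + mark + [0.0] * 4   # mark starts at sample index 5

found = synchronize(report, code, samples_per_bit, threshold=0.5)   # 5
```

Each round costs $O(L)$ after aggregation, so scanning all offsets stays cheap even for long reports.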

There is a trade-off in selecting the query duration $T_q$. On one hand, if such a duration is smaller, although better synchronization accuracy can be achieved, the attacker needs to generate more queries to the data center, and more iterations of the synchronization process are needed. On the other hand, if the duration is too large, the attacker might not be able to correctly synchronize the probe mark, and the probe mark recognition accuracy will be reduced. The impact of query duration on attack accuracy is shown in Section 3.

3 iLOC ATTACK ANALYSIS AND PARAMETER DETERMINATION

Recall that there are some important parameters in the iLOC attack, including the mark traffic rate $V$, the mark decoding threshold $T_a$, the length of PN-code $L$, and the mark-bit duration $T_s$. In this section, we first present our formal analysis of the impact of different parameters on attack objectives. The analytical results are validated by empirical results presented in Section 5. Then, based on such analytical results, we further discuss how to determine attack parameters.

3.1 iLOC Attack Analysis

3.1.1 Attack Accuracy Analysis

In order to measure attack accuracy in terms of how correctly the attacker is able to recognize the probe mark and identify monitor location, we introduce the following two metrics. The first one is the attack success rate $P_{AD}$, the probability that an attacker correctly determines that a selected target network is deployed with monitors. From the attacker’s perspective, the higher $P_{AD}$ is, the better the attack accuracy. The second metric is the attack false-positive rate $P_{AF}$, the probability that the attacker mistakenly determines a target network as one with monitors. From the attacker’s perspective, the lower $P_{AF}$ is, the better the attack accuracy.

Recall that $\Gamma(\cdot)$ represents the correlation degree operator between two vectors of the same length $L$. Vector $C_i = \langle C_{i,1}, C_{i,2}, \ldots, C_{i,L} \rangle \in \{-1,+1\}^L$ represents the PN-code. Vectors $\eta_i = \langle \eta_{i,1}, \eta_{i,2}, \ldots, \eta_{i,L} \rangle$ and $\omega_i = \langle \omega_{i,1}, \omega_{i,2}, \ldots, \omega_{i,L} \rangle$ represent (probe-mark-embedded) attack traffic and background traffic, respectively. After subtracting the mean value from the original data, the two shifted vectors are $\eta'_i = \langle \eta'_{i,1}, \eta'_{i,2}, \ldots, \eta'_{i,L} \rangle$ and $\omega'_i = \langle \omega'_{i,1}, \omega'_{i,2}, \ldots, \omega'_{i,L} \rangle$. Similarly, we use vector $\lambda_i$ to represent the queried report data and vector $\lambda'_i$ to represent the shifted report data (by subtracting $E(\lambda_{i,j})$ from $\lambda_i$). Assume that random variables $\omega'_{i,1}, \ldots, \omega'_{i,L}$ are independent and identically distributed (i.i.d.) and are drawn from a Gaussian random distribution with standard deviation $\sigma_x$. Note that real Internet port-scan traffic may not follow the Gaussian distribution. In fact, to the best of our knowledge, the traffic distribution of Internet port scans is still an open problem and requires careful investigation. Here, we use Gaussian white noise as an example in our theoretical analysis to provide insights into the effectiveness of iLOC attacks. Our simulation data based on real-world traces validate our theoretical findings well. Recall that $T_a$ is the mark decoding threshold and $V$ is the mark traffic rate. We have the following theorem for the iLOC attack accuracy. The detailed proof of this theorem can be found in the Appendix.

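Theorem 1. In an iLOC attack, the attack success rate $P_{AD}$ is

$$P_{AD} = 1 - \Pr\left[\Gamma(\lambda'_i, C_i) \le T_a \mid (\lambda'_i = \eta'_i + \omega'_i)\right] \tag{5}$$

$$= 1 - \frac{1}{\sqrt{\pi}} \int_{\frac{(V/2 - T_a)\sqrt{L}}{\sqrt{2}\,\sigma_x}}^{\infty} e^{-y^2}\,dy, \tag{6}$$

where $\Gamma(\lambda'_i, C_i) = \lambda'_i \odot C_i$. The attack false-positive rate $P_{AF}$ is

$$P_{AF} = \Pr\left[\Gamma(\lambda'_i, C_i) \ge T_a \mid (\lambda'_i = \omega'_i)\right] \tag{7}$$

$$= \frac{1}{\sqrt{\pi}} \int_{\frac{\sqrt{L}\,T_a}{\sqrt{2}\,\sigma_x}}^{\infty} e^{-y^2}\,dy. \tag{8}$$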

Notice that given the background noise $\omega'$ drawn from the Gaussian distribution, $\Gamma(\lambda', C_i)$ can be approximated by a Gaussian distribution as well. This can be reasoned as follows: Based on (1), we have $\Gamma(\lambda', C_i) = \Gamma(\eta' + \omega', C_i) = \Gamma(\frac{V}{2} C_i + \omega', C_i) = \frac{V}{2} + \Gamma(\omega', C_i)$, and $\Gamma(\omega', C_i)$ can be approximated by a Gaussian distribution.

We have a few observations from Theorem 1. First, the attack success rate $P_{AD}$ increases and the attack false-positive rate $P_{AF}$ decreases with the increasing PN-code length $L$. Thus, better attack accuracy is achieved. Second, with the increasing mark traffic rate $V$, a better attack success rate can be achieved as well.
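The two observations can be checked numerically. The following is a minimal sketch that evaluates (6) and (8) through the complementary error function, using the identity $\frac{1}{\sqrt{\pi}}\int_z^{\infty} e^{-y^2}dy = \frac{1}{2}\mathrm{erfc}(z)$. The function names and the parameter values in the demo are illustrative assumptions, not values taken from the paper.

```python
import math


def attack_success_rate(V, T_a, L, sigma_x):
    """P_AD as given by (6)."""
    lower = (V / 2.0 - T_a) * math.sqrt(L) / (math.sqrt(2.0) * sigma_x)
    # (1/sqrt(pi)) * integral from `lower` to infinity of exp(-y^2) dy = erfc(lower) / 2
    return 1.0 - 0.5 * math.erfc(lower)


def attack_false_positive_rate(T_a, L, sigma_x):
    """P_AF as given by (8)."""
    lower = math.sqrt(L) * T_a / (math.sqrt(2.0) * sigma_x)
    return 0.5 * math.erfc(lower)


if __name__ == "__main__":
    sigma_x = 1.0        # background standard deviation (normalized, assumed)
    V, T_a = 1.0, 0.25   # illustrative mark traffic rate and decoding threshold
    for L in (15, 30, 45):
        print(L,
              attack_success_rate(V, T_a, L, sigma_x),
              attack_false_positive_rate(T_a, L, sigma_x))
```

With $T_a$ fixed between 0 and $V/2$, the printed success rate grows and the false-positive rate shrinks as $L$ increases, which is the trend stated above.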

3.1.2 Attack Invisibility Analysis

Here, attack invisibility refers to how well the iLOC attack evades detection by the defender. In order to analyze invisibility, we need to consider the detection algorithms. While many different algorithms have been proposed to detect anomalies in port-scan traffic, here, we use a representative and generic algorithm that has no specific requirement on detection systems and has been widely adopted by many systems [2], [6], [27], [28]. In this algorithm, if the traffic rate (volume in a given time duration) is larger than a predetermined threshold $T_d$, the defender detection threshold, the defender issues threat alerts and initiates reactions [6]. Such a detection threshold is usually obtained through statistical analysis of the background traffic. Note that the threshold $T_d$ must be chosen so that anomaly detection maintains both a high detection rate (the probability that an ongoing attack is detected) and a low false-positive rate (the probability that an alarm is triggered when no attack is occurring).

To measure attack invisibility in terms of how well the iLOC attack can evade detection by the defender, we use the following two metrics. The first one is the defender detection rate $P_{DD}$, the probability that the defender correctly detects the attack traffic introduced by the iLOC attack. The second one is the defender false-positive rate $P_{DF}$, the probability that the defender mistakenly identifies background traffic as attack traffic.

Similar to our approach in Section 2.2.2, we use random variable $\omega'$ to represent the shifted background traffic and random variable $\lambda'$ to represent the shifted traffic data reported by the ITM system. Note that if no iLOC attack exists, $\lambda' = \omega'$. If we assume that values of $\omega'$ at different time units are i.i.d. and follow a Gaussian distribution with standard deviation $\sigma_x$ (i.e., $\omega'$ follows $N(0, \sigma_x^2)$), then we have the following theorem for attack invisibility.

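Theorem 2. In the iLOC attack, the defender detection rate $P_{DD}$ is

$$P_{DD} = 1 - \Pr\left[\lambda' \le T_d \mid (\lambda' = V + \omega')\right] \tag{9}$$

$$= 1 - \frac{1}{\sqrt{\pi}} \int_{\frac{V - T_d}{\sqrt{2}\,\sigma_x}}^{\infty} e^{-y^2}\,dy. \tag{10}$$

The defender false-positive rate $P_{DF}$ is

$$P_{DF} = \Pr\left[\lambda' \ge T_d \mid (\lambda' = \omega')\right] \tag{11}$$

$$= \frac{1}{\sqrt{\pi}} \int_{\frac{T_d}{\sqrt{2}\,\sigma_x}}^{\infty} e^{-y^2}\,dy. \tag{12}$$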

The proof of Theorem 2 is similar to that of Theorem 1; therefore, we skip it here due to space limitations. Notice that in (9)-(12), our analysis of the detection algorithm assumes that $\lambda'$ is measured and compared to $T_d$ every $T_s$, where $T_s$ is the duration of 1 bit of a PN-code (also called the mark-bit duration). In reality, the defender may not have knowledge of $T_s$, and this assumption supports a worst-case analysis of attack invisibility. Just as researchers assume in cryptanalysis that the encryption algorithms are known to attackers [29], we assume that the strategy of mounting PN-code-modulated low-rate port-scan traffic and its parameters, such as the mark-bit duration $T_s$, are known to the defender. This creates the worst-case security analysis in our study. Even without knowledge of $T_s$, the defender can still develop adaptive strategies to carry out anomaly detection. For example, based on historical traffic logs, the defender may build the traffic statistic profile on different time durations. Then, the defender measures traffic on different time durations and compares the measurements to the traffic statistic profile on the corresponding time duration.

We have the following observations from Theorem 2. First, with the increasing mark traffic rate $V$, the defender detection rate $P_{DD}$ increases. Thus, the attack invisibility will be worsened. Second, the mark traffic rate $V$ does not affect the defender false-positive rate $P_{DF}$, which is only determined by the threshold $T_d$ configured by the defender.
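A minimal sketch of (10) and (12), written with the complementary error function, makes both observations concrete. The function names and sample values are illustrative assumptions.

```python
import math


def defender_detection_rate(V, T_d, sigma_x):
    """P_DD as given by (10)."""
    lower = (V - T_d) / (math.sqrt(2.0) * sigma_x)
    # (1/sqrt(pi)) * integral from `lower` to infinity of exp(-y^2) dy = erfc(lower) / 2
    return 1.0 - 0.5 * math.erfc(lower)


def defender_false_positive_rate(T_d, sigma_x):
    """P_DF as given by (12); note that V does not appear."""
    lower = T_d / (math.sqrt(2.0) * sigma_x)
    return 0.5 * math.erfc(lower)


if __name__ == "__main__":
    sigma_x = 1.0          # normalized background standard deviation (assumed)
    T_d = 3.0 * sigma_x    # illustrative defender threshold
    for V in (0.3, 0.6, 1.2):
        print(V, defender_detection_rate(V, T_d, sigma_x))
    print("P_DF:", defender_false_positive_rate(T_d, sigma_x))
```

The detection rate rises with $V$, while the false-positive rate is a function of $T_d$ and $\sigma_x$ only.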

As we mentioned earlier, the query duration $T_q$ will also affect attack accuracy. Recall that the recognition of the probing mark is based on a sliding window, as discussed in Section 2.3. The maximum synchronization error between the PN-code and the corresponding probe mark will be one query duration $T_q$, as shown in Fig. 5. We know that the correlation degree between the attack traffic and the PN-code is $\Gamma(C_i, \eta'_i) = (C_i \odot C'_i) \cdot \frac{V}{2}$, where $C'_i$ is the result of shifting $C_i$ by time unit $T_q$, as shown in Fig. 5. Notice that $T_q$ controls the maximal synchronization error. Based on the correlation degree defined in (1), we have $C_i \odot C'_i = \frac{1}{L} \sum_{j=1}^{L} C_{ij} C'_{ij}$. Since the overlapped area between $C_{ij}$ and $C'_{ij}$ is $T_s - T_q$, we have $C_{ij} C'_{ij} = \frac{T_s - T_q}{T_s} = 1 - \frac{T_q}{T_s}$. Then, $\Gamma(C_i, C'_i) = \frac{1}{L} \cdot L \left(1 - \frac{T_q}{T_s}\right) = 1 - \frac{T_q}{T_s}$, and $\Gamma(C_i, \eta'_i) = \frac{V}{2}\left(1 - \frac{T_q}{T_s}\right)$. Similar to the proof of Theorem 1, as shown in the Appendix, the attack success rate $P_{AD}$ becomes

$$P_{AD} = 1 - \frac{1}{\sqrt{\pi}} \int_{\frac{\left(\frac{V}{2}\left(1 - \frac{T_q}{T_s}\right) - T_a\right)\sqrt{L}}{\sqrt{2}\,\sigma_x}}^{\infty} e^{-y^2}\,dy. \tag{13}$$
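The effect of the synchronization error can be sketched by evaluating (13) directly. In this sketch the decoding threshold is written as $T_a$, so that the expression reduces to (6) when $T_q = 0$; the function name and the sample values are illustrative assumptions.

```python
import math


def attack_success_rate_with_sync_error(V, T_a, L, sigma_x, T_q, T_s):
    """P_AD as given by (13): the mark amplitude V/2 is scaled by (1 - T_q/T_s)."""
    effective_amplitude = (V / 2.0) * (1.0 - T_q / T_s)
    lower = (effective_amplitude - T_a) * math.sqrt(L) / (math.sqrt(2.0) * sigma_x)
    return 1.0 - 0.5 * math.erfc(lower)


if __name__ == "__main__":
    T_s = 1.0  # mark-bit duration, used as the time unit (assumed)
    for fraction in (0.05, 0.1, 0.2, 0.3):
        print(fraction,
              attack_success_rate_with_sync_error(
                  V=1.0, T_a=0.25, L=30, sigma_x=1.0, T_q=fraction * T_s, T_s=T_s))
```

A larger query duration shrinks the effective mark amplitude and therefore lowers the success rate for a fixed threshold.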

3.2 Determination of iLOC Attack Parameters

3.2.1 Determine V, Ta, and L

An attacker can use the above analytical results to determine attack parameters. First, the attacker can determine the mark traffic rate $V$. The reasons are the following: 1) $V$ can be fixed from the attack invisibility metric alone (the defender detection rate $P_{DD}$), and 2) $V$ impacts the determination of other parameters. Given the expected false-alarm rate, the attacker can also determine the mark decoding threshold $T_a$ and the PN-code length $L$. Notice that the parameter for the background traffic, $\sigma_x$, can be obtained by analyzing historical traffic data published by the data center of ITM systems.

We give the details of determining attack parameters as follows: 1) Mark traffic rate: Using (12), the attacker can first estimate the defender threshold $T_d$, given a reasonable upper bound on the defender false-positive rate $P_{DF}$. Notice that $T_d$ should be selected to be larger than the standard deviation of the background traffic, $\sigma_x$. For example, under the Gaussian assumption, $T_d = 3\sigma_x$ yields a low defender false-positive rate $P_{DF}$ (about 0.13 percent according to (12)). Thus, we can use $3\sigma_x$ as a reasonable estimation of $T_d$. Given the defender detection rate $P_{DD}$, the defender threshold $T_d$, and the background traffic parameter $\sigma_x$, the attacker can determine the mark traffic rate $V$ by solving (10). 2) Mark recognition threshold and length of PN-code: Given the mark traffic rate $V$ (determined previously), the attack false-positive rate $P_{AF}$, and the attack success rate $P_{AD}$, the attacker can further determine the mark decoding threshold $T_a$ and $L$ by solving (6) and (8) in Theorem 1.

Based on the above discussion, we show the determination results of attack parameters in Table 2. We determine the mark decoding threshold $T_a$ and the defender threshold $T_d$ in order to derive a reasonable attack false-positive rate $P_{AF}$ and defender false-positive rate $P_{DF}$ (below 1 percent). For instance, to achieve a 95 percent attack success rate $P_{AD}$ and a 5 percent defender detection rate $P_{DD}$, we can use a PN-code of length $L = 20$ and a probe mark traffic rate of $V = 0.6\sigma_x$. In Section 5, these numerical results are validated by our empirical evaluations.


3.2.2 Determine Ts

To determine the mark-bit duration $T_s$, the attacker needs to estimate the possible delay from the moment when attack traffic is recorded by monitors to the moment when such attack traffic is published by the data center. To make the iLOC attack effective, the mark-bit duration needs to be at least as large as such a delay. Otherwise, the traffic in different bit durations (each lasting $T_s$) may interfere with each other and make it hard to recognize the probe mark. Several possible ways can be used to estimate the delay. For example, the attacker may obtain such information through publicly available resources. Some ITM systems may publish such information on their websites. The attacker may also actively conduct experiments on ITM systems and measure the delay. For example, the attacker may install monitors and connect them to the targeted ITM system. The attacker can simply use such monitors to report logs embedded with special patterns (e.g., a PN-code) and keep querying the data center until the embedded traffic patterns are recognized. After repeating the above process a number of times, the attacker is able to derive the statistic profile of the delay and then determine the mark-bit duration $T_s$. We use this method in our implementation of the iLOC attack described in Section 4.
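One way to turn a measured delay profile into a mark-bit duration is sketched below. The rule used here (take a high empirical quantile of the measured delays and round it up to a convenient granularity) is an assumption made for illustration; the text only requires the mark-bit duration to be at least as large as the delay. The function name and the sample delays are hypothetical.

```python
import math


def choose_mark_bit_duration(delays, coverage=1.0, granularity=600.0):
    """Pick T_s (seconds) covering a `coverage` fraction of measured delays.

    delays      -- measured report-to-publication delays in seconds
    coverage    -- fraction of measurements that T_s must cover, in (0, 1]
    granularity -- T_s is rounded up to a multiple of this value
    """
    if not delays or not 0.0 < coverage <= 1.0:
        raise ValueError("need at least one delay and coverage in (0, 1]")
    ordered = sorted(delays)
    index = min(len(ordered) - 1, math.ceil(coverage * len(ordered)) - 1)
    return math.ceil(ordered[index] / granularity) * granularity


if __name__ == "__main__":
    measured = [1200.0, 1500.0, 1800.0, 2500.0]  # hypothetical measurements
    print(choose_mark_bit_duration(measured))
```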

4 IMPLEMENTATION AND VALIDATION

In this section, we first introduce our implementation of an iLOC attack. Then, we report validation results of the iLOC design and implementation on a real-world ITM system.

4.1 Implementation of iLOC Attack

We implement an iLOC attack prototype based on the design in Section 2. Recall that an attacker has two objectives: attack accuracy and invisibility. This prototype works against any ITM system with a Web-based user interface. There are five independent and important components in our iLOC implementation, as shown in Fig. 6. Our iLOC prototype is implemented in Microsoft MFC and Matlab on the Windows XP OS. The five components are described as follows:

1. Data Center Querist. This component interacts with the data center of ITM systems. Its main tasks are to send queries to the data center and to retrieve responses from the data center. The inputs to this component are the URL or IP address of the data center and the port number to query. This component provides basic services to other components.

2. Background Traffic Analyzer. This component receives the data of background port-scan traffic on given ports via the Data Center Querist. With such data, this component obtains the statistic profile of background traffic, e.g., the standard deviation $\sigma_x$. The profile is used to determine attack parameters for other components.

3. PN-code Generator. This component generates and stores a PN-code. The PN-code length is determined according to the attacker's objectives and the background traffic profile in the way discussed in Section 3.2. Recall that we use the feedback shift register to generate the PN-code, as discussed in Section 2.2. The feedback shift register repeatedly generates a PN-code of length $L$.

4. Probe Traffic Generator. This component generates attack traffic based on the PN-code and the statistic profile of background traffic. With the profile of the background traffic, the attack traffic rate is determined based on the method shown in Section 3.2. Then, the PN-code-encoded traffic is generated in the way discussed in Section 2.2.2. The inputs to this component are the IP addresses of the target network, the port number, and the transport protocol (TCP or UDP).

5. Probe Mark Decoder. This component obtains the port-scan report data through the Data Center Querist and decides whether the probe mark exists in the way discussed in Section 2.3. The PN-code used in the decoding process is the same one used in encoding attack traffic and stored in the PN-code Generator. The decoding threshold is determined by this component based on the attack accuracy requirement and the background traffic profile, as explained in Section 3.1.
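The PN-code Generator, Probe Traffic Generator, and Probe Mark Decoder can be sketched together in a few lines. This is not the MFC/Matlab prototype described above. It is a minimal illustration under stated assumptions: a 4-bit feedback shift register with recurrence $a_{n+4} = a_{n+1} \oplus a_n$ (period 15), an encoding that sends traffic at rate $V$ for a $+1$ bit and no traffic for a $-1$ bit, and a decoder that compares the correlation degree of the mean-shifted report data against $T_a$. All function names are hypothetical.

```python
def lfsr_pn_code(length, seed=(1, 0, 0, 0)):
    """Generate a {-1,+1} PN-code from a 4-bit feedback shift register."""
    state = list(seed)
    if not any(state):
        raise ValueError("seed must not be all zeros")
    code = []
    for _ in range(length):
        code.append(1 if state[3] else -1)   # output the last register bit
        feedback = state[3] ^ state[2]       # taps for x^4 + x + 1
        state = [feedback] + state[:3]       # shift right, insert feedback
    return code


def encode_mark(code, V):
    """Assumed encoding: rate V during a +1 bit, no traffic during a -1 bit."""
    return [V if bit == 1 else 0.0 for bit in code]


def correlation_degree(x, y):
    """Correlation degree of two equal-length vectors: (1/L) * sum(x_j * y_j)."""
    return sum(a * b for a, b in zip(x, y)) / len(x)


def mark_present(report, code, T_a):
    """Decide whether the probe mark is embedded in the queried report data."""
    mean = sum(report) / len(report)
    shifted = [value - mean for value in report]   # subtract the mean first
    return correlation_degree(shifted, code) > T_a


if __name__ == "__main__":
    code = lfsr_pn_code(15)
    background = [10.0] * 15                      # flat background, no noise
    marked = [b + m for b, m in zip(background, encode_mark(code, 3.0))]
    print(mark_present(marked, code, T_a=0.75), mark_present(background, code, T_a=0.75))
```

With noise-free data the correlation degree of the marked report is close to $V/2$, which is why a threshold below $V/2$ separates the two cases.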

4.2 Validation of iLOC Attack

Ideally, the evaluation should be carried out over a real ITM system. Since an extensive experiment on a real ITM system would affect its usability (e.g., by generating skewed reports of the actual Internet traffic), in our evaluation, we considered both experiments with a real-world ITM system and simulations using offline traffic traces. In order to validate our iLOC implementation, we carried out experiments with a real-world threat monitoring system, SANS ISC, shown in Fig. 7. We deployed several monitors that collect port-scan logs of the monitored networks and report data to the data center of SANS ISC periodically (every half hour). We launched the probing traffic addressed to these target networks deployed with monitors and derived the fine-grained report by periodically requesting the port report from the data center. Notice that if the monitors we targeted report logs more frequently and the data center of the ITM system collects and publishes logs more frequently, we can obtain a port-scan report with finer granularity.


TABLE 2. Determination of PN-Code Length and Mark Traffic Rate (Denoted by $(L, V)$)

Fig. 6. iLOC implementation components.

Fig. 7 illustrates our experimental setup. For the purposes of this research, we requested information about the locations of experimental monitors in this figure. We were provided with the identities of two networks, A and B. There are some monitors in network A, and there is no monitor in network B. The monitors in network A monitor a set of IP addresses and log the port scans. We (the attacker) execute the iLOC attack to decide whether monitors exist in networks A and B, respectively.

In our experiment, we use a PN-code of length 15. The mark-bit duration is set at 1 hour. We use two machines. On one machine, the Encoding Process sends attack traffic to networks A and B, respectively. On the other machine, the Decoding Process sends a query to the data center every 20 minutes. With the report data, we find that the Decoding Process can correctly determine that network A is deployed with monitors and network B is not deployed with monitors. Fig. 8 shows the traffic rate in the time domain. Based on the data shown in Fig. 8, we calculate the correlation degree. The correlation degree between the mixed traffic and the PN-code is around 28, while the correlation degree between the background traffic and the PN-code is around 8. Given that the detection threshold is 14, we know that the attack traffic and background traffic can be easily distinguished. Therefore, the attacker can accurately identify that network A is deployed with monitors and network B is not deployed with monitors. Fig. 9 shows the traffic rate in the frequency domain in terms of the Power Spectrum Density (PSD). The PSD describes how the power of time series data is distributed in the frequency domain. Mathematically, it is equal to the Fourier transform of the autocorrelation of the time series data [30]. From these two figures, we observe that it is hard for the defender to detect an iLOC attack, since the overall traffic with an iLOC attack is very similar to the traffic without iLOC attack traffic embedded. That is, such experiments demonstrate that the iLOC attack can effectively and stealthily identify monitors in reality.
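The PSD definition quoted above can be illustrated with a small self-contained sketch that takes the discrete Fourier transform of an autocorrelation estimate. How the PSD in Fig. 9 was actually estimated is not stated; the circular (periodic) autocorrelation used here is an assumption chosen because its transform equals the periodogram $|X_k|^2 / N$ exactly. The function names and the sample series are hypothetical.

```python
import cmath


def circular_autocorrelation(x):
    """r[lag] = (1/N) * sum_t x[t] * x[(t + lag) mod N]."""
    n = len(x)
    return [sum(x[t] * x[(t + lag) % n] for t in range(n)) / n
            for lag in range(n)]


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def psd_from_autocorrelation(series):
    """PSD estimate: Fourier transform of the autocorrelation of the centered series."""
    mean = sum(series) / len(series)
    centered = [value - mean for value in series]
    # The circular autocorrelation of a real series is symmetric, so its DFT is real.
    return [coefficient.real for coefficient in dft(circular_autocorrelation(centered))]


if __name__ == "__main__":
    traffic = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # hypothetical traffic rates
    print(psd_from_autocorrelation(traffic))
```

A defender comparing such spectra for traffic with and without an embedded mark would look for peaks that appear only in one of them.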

5 PERFORMANCE EVALUATION

In this section, we conduct the performance evaluation by merging simulated iLOC attack traffic into replayed real-world traffic traces.

5.1 Evaluation Methodology

5.1.1 Experiment Setup

In our evaluation, we use the real-world port-scan traces from SANS ISC, including the detailed logs from 01/01/2005 to 01/15/2005 [2], [6] (see footnote 2). The traces used in our study contain more than 80 million records, and the overall data volume exceeds 80 Gbytes. We use these real-world traces as the background traffic. We merge records of simulated iLOC attack traffic into these traces and replay the merged data to simulate the iLOC attack traffic. We evaluate different attack scenarios by varying attack parameters such as the mark traffic rate $V$, the length of the PN-code $L$, and the number of parallel attack sessions $N$ (on the same port). We report the results for the cases where attacks are launched to port 4321 (representing an unpopular port with a low traffic rate) and ports 135 and 25 (representing popular ports with a high traffic rate). Experiments on other ports result in similar observations.

5.1.2 Evaluation Metrics

We explore both attack accuracy and invisibility to evaluate attack performance. For attack accuracy, we use two metrics: one is the attack success rate $P_{AD}$, and the other is the attack false-positive rate $P_{AF}$, which are defined in Section 3.1.1. For attack invisibility, we use two metrics: one is the defender detection rate $P_{DD}$, and the other is the defender false-positive rate $P_{DF}$, which are defined in Section 3.1.2.

5.1.3 Evaluation Schemes

We evaluate the iLOC attack in comparison to two other baseline attack schemes. The first one is the attack that launches a significantly high rate of port-scan traffic to target networks, as introduced in [18] and [9]. We denote this attack as a volume-based attack. Notice that the technique used in our simulation for the volume-based attack is similar to the noise cancellation technique in [18]. However, the work in [18] did not provide much detail on how to choose the noise cancellation factor. The second baseline scheme embeds the attack traffic with a unique frequency pattern. In this attack, the attack traffic rate changes periodically. Then, the attacker expects that the report data from the data center show such a unique frequency pattern if the selected target network is deployed with monitors. We denote this attack scheme as a frequency-based attack. All three evaluated attack schemes are listed in Table 3.

Fig. 7. Experiment setup.

Fig. 8. Background traffic versus traffic mixed with iLOC attack.

Fig. 9. PSD for background traffic versus traffic mixed with iLOC attack.

2. We thank the ISC for providing us with valuable traces for this research.

For fairness, we adjust the detection thresholds in all schemes so that the desired attack false-positive rate $P_{AF}$ and defender false-positive rate $P_{DF}$ (below 1 percent) are achieved. For the iLOC attack, we generate different attack traffic based on varying PN-code lengths $L$ (i.e., 15, 30, and 45). The default PN-code length is set to 30. To better quantify the attack traffic rate for the iLOC attack and the other attack schemes, we use the normalized attack traffic rate $P$, which is defined as $P = V/\sigma_x$ for an iLOC attack, where $\sigma_x$ is the standard deviation of the background traffic rate. The default value of the query duration is $T_q = 0.1T_s$.

5.2 Evaluation Results

5.2.1 Attack Accuracy

To compare the attack accuracy of the iLOC attack with that of volume- and frequency-based probe schemes, we plot the attack success rate $P_{AD}$ under different attack traffic rates (e.g., $P \in [0.01, 3]$). Figs. 10, 11, and 12 show the results on different ports. From these figures, we observe that both iLOC and frequency-based attacks consistently achieve a much higher attack success rate $P_{AD}$ than the volume-based scheme. This performance improvement is more significant when the attack traffic rate is lower. The reason can be explained as follows: For the iLOC scheme, the PN-code-based encoding/decoding makes the recognition of probe marks robust to interference from the background traffic. For the frequency-based scheme, the invariant frequency in the attack traffic is also robust to the interference from the background traffic. But the volume-based scheme relies on a high rate of attack traffic.

5.2.2 Attack Invisibility

To compare the attack invisibility of the iLOC attack with that of the other two attack schemes, we show the defender detection rate $P_{DD}$ on different ports (e.g., 4321, 135, and 25) in Tables 4, 5, and 6. Each table shows the defender detection rate $P_{DD}$, given an attack success rate $P_{AD}$ (90 percent, 95 percent, and 98 percent). Recall that the defender sets the detection threshold to keep the defender false-positive rate $P_{DF}$ below 1 percent. In all tables, "(Time)" and "(Freq)" mean that the defender adopts the time-domain and frequency-domain analytical techniques to detect attacks, respectively. It is observed that our iLOC scheme consistently achieves a much lower defender detection rate $P_{DD}$ than the other two schemes. Therefore, the iLOC attack achieves the best attack invisibility performance. As expected, the defender can easily detect the frequency-based attack, as a unique frequency pattern exists in the attack traffic.

5.2.3 Impact of the Length of PN-Code

To investigate the impact of the PN-code length on the performance of the iLOC attack, we show the attack success rate $P_{AD}$ for PN-codes of different lengths (e.g., 15, 30, and 45) in Fig. 13. Data are also collected for various attack traffic rates. In the legend, iLOC($L = x$) means that the PN-code length is $x$. This figure shows that the attack success rate $P_{AD}$ increases with the increasing PN-code length because a long PN-code reduces the interference from the background traffic on recognizing the probe mark and thereby improves attack accuracy.

TABLE 3. Probe Attack Schemes

Fig. 10. Attack success rate (port 4321).

TABLE 4. Defender Detection Rate $P_{DD}$ (Port 4321)

5.2.4 Impact of the Number of Parallel Localization Attacks

To evaluate the impact of the number of parallel localization attacks on attack accuracy, we show the attack success rate $P_{AD}$ for a variety of parallel attack sessions on the same port in Fig. 14. In the legend, iLOC($N = x$) means that there are $x$ parallel attack sessions. This figure shows that in terms of the attack success rate $P_{AD}$, the iLOC attack scheme is not sensitive to the number of parallel attack sessions. The attack success rate $P_{AD}$ only slightly decreases with the increasing number of parallel attack sessions. This is because the traffic for different attack sessions is encoded by PN-codes, which have low cross-correlation (described in Section 2.2) and thereby cause little interference. Fig. 15 shows the impact of the number of parallel attack sessions on attack invisibility. It can be observed that the increasing number of parallel attack sessions results in a slight increase in the defender detection rate $P_{DD}$. Therefore, parallel attack capability can significantly improve the attack efficiency without compromising the effectiveness.

The iLOC attack achieves invisibility by using the PN-code, which makes the iLOC attack take longer than the attacks in [18] and [19]. Nevertheless, the parallel feature of the iLOC attack can significantly improve attack efficiency. In the following, we provide one example to compare the efficiency of our attack with that of the attack in [18] and [19]. This example demonstrates that although our attack is slower, its parallel feature can effectively reduce the performance gap. Assume that a system that consists of 1,200 networks is attacked. Using one port, the volume-based attack needs 1,200 time units to perform the attack task. To fulfill the same attack task, iLOC with four attack sessions in parallel using a code length of 15 can achieve the desired performance of attack accuracy and invisibility, as shown in Fig. 14. In this case, the total time for the iLOC attack is $1{,}200 \times 15/4 = 4{,}500$ units, which is around four times that of the volume-based attack in [18] and [19].
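The arithmetic of this example generalizes to other code lengths and session counts. The helper below is a small sketch of that calculation; the function name is hypothetical, and it assumes, as the example does, that each network costs one time unit per code bit and that parallel sessions divide the total evenly.

```python
def localization_time_units(num_networks, code_length=1, parallel_sessions=1):
    """Total time units to probe all networks, one unit per code bit per network."""
    return num_networks * code_length / parallel_sessions


if __name__ == "__main__":
    volume_based = localization_time_units(1200)                 # one unit per network
    iloc = localization_time_units(1200, code_length=15, parallel_sessions=4)
    print(volume_based, iloc, iloc / volume_based)
```

For the values in the text, the ratio is 3.75, which matches the "around four times" stated above.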

5.2.5 Impact of Query Duration on Attack Accuracy

To investigate the impact of the query duration $T_q$ on the iLOC attack accuracy, we show the attack success rate $P_{AD}$ under different query durations ($T_q \in [0.05T_s, 0.3T_s]$) in Fig. 16. In the legend, iLOC($L = x$) refers to a PN-code of length $x$. From this figure, we observe that with the decreasing query duration $T_q$, the attack success rate $P_{AD}$ increases. The reason is that a smaller query duration improves synchronization granularity, and thus, the attack has a better chance to recognize the probe mark. Hence, the attack accuracy will be improved. However, a smaller query duration $T_q$ will also increase the number of queries sent to the data center and the synchronization time for recognizing the attack mark.

6 GUIDELINES OF COUNTERMEASURE

We have demonstrated the iLOC attack against ITM systems. Let us discuss possible countermeasures against such an attack. It is relatively easy to defend against the volume-based and frequency-based localization attacks, which embed either a spike pattern (using high-rate scan traffic) [18], [19] or an invariable frequency pattern (using the attack embedded with a certain frequency pattern), since these two attack schemes show strong signatures in the attack traffic (either in the time domain or in the frequency domain). However, in order to defend against the iLOC attack, the defender needs an insightful understanding of the attack. We provide several general guidelines for counteracting the iLOC attack from the following aspects.

Fig. 13. Attack success rate versus code length.

Fig. 14. Attack success rate versus number of parallel attack sessions on the same port.

Fig. 15. Defender detection rate versus number of parallel attack sessions on the same port.

Fig. 16. Attack accuracy versus query duration $T_q$.

6.1 Limiting the Information Access Rate

Recall that in the iLOC attack, the attacker must generate a significant number of queries to the data center of ITM systems in order to accurately recognize the encoded attack traffic. We may exploit such knowledge to reduce the effectiveness of an iLOC attack. To do so, the data center may throttle the query request rate. One possible way is to enforce human/system interaction for each query and thereby eliminate the automatic query in the iLOC attack. This can be conducted through authenticated registration, e.g., one authenticated registration is only valid for a certain number of queries. However, these limitations on the information access rate may also reduce the usability of ITM systems.
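The registration-based limit mentioned above can be sketched as a per-registration query counter. This is a hypothetical illustration of the idea only; the class name, the token scheme, and the quota value are assumptions and do not describe any deployed ITM data center.

```python
class QueryQuota:
    """Allow each authenticated registration a fixed number of queries."""

    def __init__(self, max_queries):
        self.max_queries = max_queries
        self._used = {}  # registration token -> number of queries served

    def register(self, token):
        """Record a new authenticated registration with a fresh quota."""
        self._used[token] = 0

    def allow(self, token):
        """Return True and consume one query if the registration has quota left."""
        if token not in self._used:
            return False                      # unknown or unauthenticated caller
        if self._used[token] >= self.max_queries:
            return False                      # quota exhausted, must re-register
        self._used[token] += 1
        return True


if __name__ == "__main__":
    quota = QueryQuota(max_queries=3)
    quota.register("user-a")
    print([quota.allow("user-a") for _ in range(5)])
```

A small quota forces an attacker who polls every query duration to re-register often, at the cost of the same inconvenience for legitimate users.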

6.2 Perturbing the Information

Recall that in the iLOC attack, the attacker needs to recognize the encoded attack traffic. Thus, the quality of reports plays an important role in such a recognition process. To reduce the effectiveness of an iLOC attack, we may perturb the published report data by adding some random noise or randomizing the data publishing delay. This scheme is similar to the data perturbation in the private data sharing realm [32], [33], [34]. By perturbing report data, the attack accuracy of an iLOC attack will be degraded. However, adding random noise and randomizing the delay in publishing report data will also affect the data accuracy and usability of ITM systems. Studying such a trade-off will be one aspect of our future work.
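The effect of added noise can be sketched with the Gaussian model of Theorem 1. The sketch assumes that the data center adds independent zero-mean Gaussian noise with standard deviation `sigma_n` to each published value and that the attacker keeps the decoding threshold unchanged; under these assumptions the effective background deviation becomes $\sqrt{\sigma_x^2 + \sigma_n^2}$. The function names and values are illustrative.

```python
import math


def attack_success_rate(V, T_a, L, sigma):
    """P_AD from (6) for a background standard deviation `sigma`."""
    lower = (V / 2.0 - T_a) * math.sqrt(L) / (math.sqrt(2.0) * sigma)
    return 1.0 - 0.5 * math.erfc(lower)


def perturbed_attack_success_rate(V, T_a, L, sigma_x, sigma_n):
    """P_AD when independent Gaussian noise (std sigma_n) is added to reports."""
    # Variances of independent Gaussian terms add.
    sigma_total = math.sqrt(sigma_x ** 2 + sigma_n ** 2)
    return attack_success_rate(V, T_a, L, sigma_total)


if __name__ == "__main__":
    for sigma_n in (0.0, 0.5, 1.0, 2.0):
        print(sigma_n, perturbed_attack_success_rate(1.0, 0.25, 30, 1.0, sigma_n))
```

The same added noise also distorts the reports seen by legitimate users, which is the trade-off noted above.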

6.3 Investigating Advanced Detection Schemes

Recall that in the iLOC attack, in order to localize as many monitors as possible while evading detection by ITM systems, the attacker has to continuously launch port-scan attack traffic to different target networks. Consequently, the target IP addresses of attack traffic may exhibit a widely dispersed distribution [35]. Thus, analyzing the distribution of IP addresses may provide one possible method of detection. Additionally, in [36], we proposed an information-theoretic framework to analyze iLOC attacks. In particular, we modeled the iLOC attack based on a communication channel and derived closed formulas for the capacity of iLOC attacks. Based on this framework, we studied two different kinds of iLOC attacks, which encode the probing traffic in either the temporal domain (the scheme studied in this paper) or the spatial domain (on multiple monitors). We also investigated the effectiveness of possible detection strategies, including centralized, distributed, and hybrid detection.
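One simple way to quantify how dispersed the destination addresses of a scan source are is the Shannon entropy of the destination prefixes. This statistic is an assumption chosen for illustration; it is not the detection method of [35] or [36]. The function name and the sample addresses (taken from documentation address ranges) are hypothetical.

```python
import math
from collections import Counter


def prefix_entropy(dest_ips, prefix_octets=3):
    """Shannon entropy (bits) of destination prefixes seen from one source.

    dest_ips      -- dotted-quad IPv4 destination addresses
    prefix_octets -- leading octets that define a prefix (3 means a /24)
    """
    prefixes = [".".join(ip.split(".")[:prefix_octets]) for ip in dest_ips]
    counts = Counter(prefixes)
    total = len(prefixes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    concentrated = ["192.0.2.1", "192.0.2.7", "192.0.2.9", "192.0.2.200"]
    dispersed = ["192.0.2.1", "198.51.100.4", "203.0.113.9", "10.1.2.3"]
    print(prefix_entropy(concentrated), prefix_entropy(dispersed))
```

A source whose destinations spread over many prefixes yields a high entropy, which a defender could compare against a profile of ordinary scan sources.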

7 RELATED WORK

Many ITM systems have been developed and deployed since CAIDA initiated the network telescope project to monitor background traffic in 2001 [37]. The ITM system is similar to the knowledge sharing of distributed intrusion detection [38]. Although the IP addresses of monitors themselves can be protected by mechanisms such as encryption and Bloom filters [39], the public data reported by these ITM systems could be used to disclose the IP address space covered by monitors. Existing attack approaches achieve this by launching high-rate port-scan traffic [18], [19]. However, these kinds of attacks do not consider the invisibility of attacks, since the high-rate attack traffic exposes the attack.

The invisibility technique in our work uses the camouflage principle, as illustrated by nature and the military. In nature, an animal can disguise itself as the object on which it stands in order to fool its predators or prey [40]. In the military, soldiers wear camouflage clothing designed to blend with the surrounding terrain [41]. As an invisibility technique, our work leverages the PN-code technology and extends it to a new Internet cybersecurity realm. The PN-code was initially used in military communication systems to provide antijamming and secured communication [23]. In wireless communication, the PN-code has been widely used to improve communication efficiency [24]. In addition, the PN-code has other broad applications such as cryptography [42], secured data storage and retrieval [43], and image processing [44].

Our work is related to robust watermarking. There is some research on how to design robust watermarking for specific applications. For example, Li and Chang in [45] developed a watermarking scheme that allows the owner to publish a large number of media files, provides the owner the ability to detect watermarks, and prevents the owner from cheating by ambiguity attacks. Some research has focused on breaking digital watermarks and developing countermeasures. For example, Arnold in [46] presented a classification of attacks against digital watermarks along with countermeasures. They categorized attacks into different categories such as removing, desynchronization, and noise embedding. Briassouli and Mouline in [47] evaluated the effects of a desynchronizing warp attack (e.g., time-varying delay) on the performance of detecting watermarks. Liu and Subbalakshmi in [48] proved that the worst case additive attack (deliberately adding noise to degrade the watermark detection) against a watermark is a 3� � function (� is the distortion compensation factor). There is other work related to digital steganography [49], [50], which intends to hide the presence of information despite its practical relevance for digital content (e.g., image and video) protection using watermarking and fingerprinting schemes.

Our work is also related to the covert channel. Various covert channels have been studied [51], [52], [53]. For example, JitterBugs is a class of inline interception mechanisms that covertly transmit data by perturbing the timing of input events in order to affect externally observable network traffic [52]. Takahshi and Lee in [54] assessed VoIP covert channel threats that utilize an IP phone conversation to illicitly transfer information across the network. In our study, the sequence of attack traffic to the monitor, the transmission of log information to the data center, and the transmission of query data back to the attacker form a covert channel for the attacker to discover the location of monitors. Our work presents a deep study of PN-code-based localization attacks, addressing both accuracy and secrecy.

In this paper, we study techniques for applying the PN-code in the iLOC attack. The work in [16] also studied how to use the PN-code to effectively track anonymous flows through mix networks. Since it is applied to a different problem domain, the solution in [16] is significantly different from the one in this paper, including the use of the PN-code, the designed algorithms, the decision rule, and the theoretical analysis.

8 CONCLUSION

In this paper, we investigated a new class of attacks, i.e., the iLOC attack. It can accurately and invisibly localize monitors of ITM systems. Its effectiveness is demonstrated by theoretical analysis, simulations, and experiments with an implemented prototype. We believe that this paper lays the foundation for ongoing studies of attacks that intelligently adapt attack traffic to avoid detection by defense systems. Our study is critical for securing ITM systems. Since the attacker has a large space to improve the secrecy of the attack, the detection of such an invisible attack remains a challenging task. A comprehensive study of other methods to protect the location of monitors is a part of our future work.

APPENDIX

PROOF OF THEOREM 1

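i. Derivation of attack success rate $P_{AD}$. The attack success rate $P_{AD}$ is the probability that an attacker correctly determines whether a selected target network is deployed with monitors. Following this definition, we have

$$P_{AD} = 1 - \Pr\left[\Gamma(\lambda'_i, C_i) \le T_a \,\middle|\, (\lambda'_i = \eta'_i + \omega'_i)\right] \tag{14}$$

$$= 1 - \Pr\left[\Gamma(\lambda'_i, C_i) \le T_a - \frac{V}{2} \,\middle|\, (\lambda'_i = \omega'_i)\right]. \tag{15}$$

Then, $P_{AD}$ can be represented by

$$P_{AD} = 1 - \frac{\sqrt{L}}{\sqrt{2\pi}\,\sigma_x} \int_{-\infty}^{T_a - \frac{V}{2}} e^{-\frac{x^2 L}{2\sigma_x^2}}\, dx. \tag{16}$$

Let $y^2 = \frac{x^2 L}{2\sigma_x^2}$ and $y = \frac{x\sqrt{L}}{\sqrt{2}\,\sigma_x}$. Then, we have

$$P_{AD} = 1 - \frac{\sqrt{L}}{\sqrt{2\pi}\,\sigma_x} \int_{-\infty}^{\frac{(T_a - V/2)\sqrt{L}}{\sqrt{2}\,\sigma_x}} \frac{\sqrt{2}\,\sigma_x}{\sqrt{L}}\, e^{-y^2}\, dy \tag{17}$$

$$= 1 - \frac{1}{\sqrt{\pi}} \int_{\frac{(V/2 - T_a)\sqrt{L}}{\sqrt{2}\,\sigma_x}}^{\infty} e^{-y^2}\, dy, \tag{18}$$

where the last step uses the symmetry of $e^{-y^2}$ about $y = 0$.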

ii. Derivation of attack false-positive rate $P_{AF}$. The attack false-positive rate $P_{AF}$ is the probability that an attacker mistakenly identifies a selected target network as being deployed with monitors. If $\Gamma(\lambda'_i, C_i)$ follows a Gaussian distribution $N(0, \sigma_x^2/L)$ (for details, see (2) in Section 2.3), we have $P_{AF} = \Pr[\Gamma(\lambda'_i, C_i) \ge T_a \mid (\lambda'_i = \omega'_i)]$. Thus, $P_{AF}$ can be represented by

$$P_{AF} = \frac{\sqrt{L}}{\sqrt{2\pi}\,\sigma_x} \int_{T_a}^{\infty} e^{-\frac{x^2 L}{2\sigma_x^2}}\, dx. \tag{19}$$

Let $y^2 = \frac{x^2 L}{2\sigma_x^2}$ and $y = \frac{\sqrt{L}\,x}{\sqrt{2}\,\sigma_x}$. Then, we have

$$P_{AF} = \frac{\sqrt{L}}{\sqrt{2\pi}\,\sigma_x} \int_{\frac{\sqrt{L}\,T_a}{\sqrt{2}\,\sigma_x}}^{\infty} e^{-y^2}\, \frac{\sqrt{2}\,\sigma_x}{\sqrt{L}}\, dy \tag{20}$$

$$= \frac{1}{\sqrt{\pi}} \int_{\frac{\sqrt{L}\,T_a}{\sqrt{2}\,\sigma_x}}^{\infty} e^{-y^2}\, dy.$$

$\square$
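As a numerical illustration of the two closed forms derived above, the following minimal sketch evaluates them with the complementary error function, using the standard identity $\frac{1}{\sqrt{\pi}}\int_a^{\infty} e^{-y^2}\,dy = \frac{1}{2}\,\mathrm{erfc}(a)$. The function name `attack_rates` and the parameter values in the usage lines are hypothetical and chosen only for illustration; the arguments correspond to the symbols $T_a$, $V$, $L$, and $\sigma_x$ used in the proof.

```python
import math


def attack_rates(t_a, v, code_len, sigma_x):
    """Evaluate the closed forms for P_AD and P_AF from the appendix.

    t_a      -- decision threshold T_a
    v        -- the quantity V appearing in (15)-(18)
    code_len -- PN-code length L
    sigma_x  -- standard deviation sigma_x in N(0, sigma_x^2 / L)

    Illustrative helper only; it is not part of the original paper.
    """
    # Common scaling factor sqrt(L) / (sqrt(2) * sigma_x) from the
    # substitution y = x * sqrt(L) / (sqrt(2) * sigma_x).
    scale = math.sqrt(code_len) / (math.sqrt(2.0) * sigma_x)

    # (18): P_AD = 1 - (1/sqrt(pi)) * integral from (V/2 - T_a)*scale to inf
    p_ad = 1.0 - 0.5 * math.erfc((v / 2.0 - t_a) * scale)

    # Final form of P_AF: (1/sqrt(pi)) * integral from T_a*scale to inf
    p_af = 0.5 * math.erfc(t_a * scale)
    return p_ad, p_af


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    for code_len in (4, 16, 64):
        p_ad, p_af = attack_rates(t_a=0.5, v=2.0, code_len=code_len, sigma_x=1.0)
        print(f"L={code_len:3d}  P_AD={p_ad:.6f}  P_AF={p_af:.6f}")
```

Under these expressions, a longer code length $L$ enlarges the scaling factor, which raises $P_{AD}$ and lowers $P_{AF}$ whenever $0 < T_a < V/2$.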

ACKNOWLEDGMENTS

The authors thank the anonymous reviewers for their invaluable feedback. This work was supported in part by the US National Science Foundation (NSF) under Grants 0808419, 0324988, 0546668, and 0721766. This work was also supported in part by the US Army Research Office (ARO) under Grant AMSRD-ACC-R50521-CI. The authors would like to acknowledge Ms. Larisa Archer for her dedicated editorial help to improve the paper. A short conference version appears in the Proceedings of the 27th IEEE International Conference on Computer Communications (INFOCOM), Phoenix, AZ, 13-18 April 2008.

REFERENCES

[1] D. Moore, C. Shannon, and J. Brown, “Code-Red: A Case Study on the Spread and Victims of an Internet Worm,” Proc. Second Internet Measurement Workshop (IMW ’02), Nov. 2002.

[2] D. Moore, V. Paxson, and S. Savage, “Inside the Slammer Worm,” IEEE Magazine of Security and Privacy, vol. 1, no. 4, pp. 33-39, 2003.

[3] W32/MyDoom.B Virus, http://www.us-cert.gov/cas/techalerts/TA04-028A.html, 2008.

[4] J. Mirkovic and P. Reiher, “A Taxonomy of DDOS Attack and DDOS Defense Mechanisms,” ACM SIGCOMM Computer Comm. Rev., vol. 34, no. 2, pp. 39-54, 2004.

[5] Internet Security News, http://www.landfield.com/isn/mail-archive/2001/Feb/0037.html, 2008.

[6] Internet Storm Center, SANS, http://isc.sans.org/, 2008.

[7] D. Moore, G.M. Voelker, and S. Savage, “Inferring Internet Denial-of-Service Activity,” Proc. 10th USENIX Security Symp. (SECURITY ’01), Aug. 2001.

[8] V. Yegneswaran, P. Barford, and S. Jha, “Global Intrusion Detection in the DOMINO Overlay System,” Proc. 11th IEEE Network and Distributed System Security Symp. (NDSS ’04), Feb. 2004.

[9] V. Yegneswaran, P. Barford, and D. Plonka, “On the Design and Utility of Internet Sinks for Network Abuse Monitoring,” Proc. Sixth Int’l Symp. Recent Advances in Intrusion Detection (RAID ’03), Sept. 2003.

[10] D. Moore, “Network Telescopes: Observing Small or Distant Security Events,” Invited Presentation at the 11th USENIX Security Symp. (SECURITY ’02), Aug. 2002.

[11] Dynamic Graphs of the Nimda Worm, http://www.caida.org/dynamic/analysis/security/nimda, 2008.

[12] “myNetWatchman,” myNetWatchman Project, http://www.mynetwatchman.com, 2008.

[13] L. Spitzner, Know Your Enemy: Honeynets. Honeynet Project, http://project.honeynet.org/papers/honeynet, 2008.

[14] N. Provos, “Honeyd—A Virtual Honeypot Daemon,” Proc. 10th DFN-CERT Workshop, Feb. 2003.

[15] J. Twycross and M.M. Williamson, “Implementing and Testing a Virus Throttling,” Proc. 12th USENIX Security Symp. (SECURITY ’03), Aug. 2003.

[16] W. Yu, X. Fu, S. Graham, D. Xuan, and W. Zhao, “DSSS-Based Flow Marking Technique for Invisible Traceback,” Proc. IEEE Symp. Security and Privacy (S&P ’07), May 2007.


[17] V. Sekar, Y. Xie, D. Maltz, M. Reiter, and H. Zhang, “Toward a Framework for Internet Forensic Analysis,” Proc. Third Workshop Hot Topics in Networks (HotNets-III ’04), Nov. 2004.

[18] J. Bethencourt, J. Franklin, and M. Vernon, “Mapping Internet Sensors with Probe Response Attacks,” Proc. 14th USENIX Security Symp. (SECURITY ’05), July/Aug. 2005.

[19] Y. Shinoda, K. Ikai, and M. Itoh, “Vulnerabilities of Passive Internet Threat Monitors,” Proc. 14th USENIX Security Symp. (SECURITY ’05), July/Aug. 2005.

[20] L.Y. Chuang, C.H. Yang, C.H. Yang, and S.L. Lin, “An Interactive Training System for Morse Code Users,” Proc. Sixth IASTED Int’l Conf. Internet and Multimedia Systems and Applications, Aug. 2002.

[21] R. Naraine, Botnet Hunters Search for Command and Control Servers, http://www.eweek.com/article2/0,1759,1829347,00.asp, 2008.

[22] Dshield, Distributed Intrusion Detection System, http://www.dshield.org/, 2008.

[23] R.K. Pickholtz, D.L. Schilling, and L.B. Milstein, “Theory of Spread-Spectrum Communication—Tutorial,” IEEE Trans. Comm., vol. 30, no. 5, pp. 855-884, 1982.

[24] E.J. Cruselles, M. Soriano, and J.L. Melus, “Spreading Codes Generator for Wireless CDMA Network,” Int’l J. Wireless Personal Comm., vol. 7, no. 1, 1998.

[25] R. Dixon, Spread Spectrum Systems, second ed. John Wiley & Sons, 1984.

[26] Nova Engineering, Linear Feedback Register Shift, http://www.sss-mag.com/pdf/lfsr.pdf, 2008.

[27] S. Venkataraman, D. Song, P. Gibbons, and A. Blum, “New Streaming Algorithms for Superspreader Detection,” Proc. 12th IEEE Network and Distributed Systems Security Symp. (NDSS ’05), Feb. 2005.

[28] S. Staniford, V. Paxson, and N. Weaver, “How to Own the Internet in Your Spare Time,” Proc. 11th USENIX Security Symp. (SECURITY ’02), Aug. 2002.

[29] Cryptanalysis, http://en.wikipedia.org/wiki/Cryptanalysis, 2008.

[30] R.L. Allen and D.W. Mills, Signal Analysis: Time, Frequency, Scale, and Structure. John Wiley & Sons, 2004.

[31] X. Fu, Y. Zhu, B. Graham, R. Bettati, and W. Zhao, “On Flow Marking Attacks in Wireless Anonymous Communication Networks,” Proc. 24th Int’l Conf. Distributed Computing Systems (ICDCS ’04), Mar. 2004.

[32] N. Zhang, S. Wang, and W. Zhao, “A New Scheme on Privacy Preserving Association Rule Mining,” Proc. Eighth European Conf. Principles and Practice of Knowledge Discovery in Databases (PKDD ’04), Sept. 2004.

[33] R. Agrawal, A. Evfimievski, and R. Srikant, “Information Sharing across Private Databases,” Proc. ACM SIGMOD ’03, July 2003.

[34] N. Zhang and W. Zhao, “Privacy-Preserving Data-Mining Systems,” Computer, vol. 40, no. 4, Apr. 2007.

[35] A. Lakhina, M. Crovella, and C. Diot, “Mining Anomalies Using Traffic Feature Distribution,” Proc. ACM SIGCOMM ’05, Aug. 2005.

[36] W. Yu, N. Zhang, X. Fu, R. Bettati, and W. Zhao, “On Localization Attacks to Internet Threat Monitors: An Information-Theoretic Framework,” Proc. IEEE Int’l Conf. Dependable Systems and Networks (DSN) (Performance and Dependability Symp.—PDS ’08), June 2008.

[37] CAIDA, Telescope Analysis, http://www.caida.org/analysis/security/telescope, 2008.

[38] H. Debar and A. Wespi, “Aggregation and Correlation of Intrusion-Detection Alerts,” Proc. Fourth Int’l Symp. Recent Advances in Intrusion Detection (RAID ’01), Oct. 2001.

[39] P. Gross, J. Parekh, and G. Kaiser, “Secure Selecticast for Collaborative Intrusion Detection Systems,” Proc. Third Int’l Workshop Distributed Event-Based Systems (DEBS ’04), May 2004.

[40] A. Anderson, A. Johnston, and P. McOwan, Motion Illusions and Active Camouflaging, http://www.ucl.ac.uk/ucbplrd/motion/motion_middle.html, 2008.

[41] Chief of Engineers, United States Army: Army Facilities Components System User Guide, http://www.usace.army.mil/inet/usace-docs/armytm/tm5-304/, Oct. 1990.

[42] M. Bellare, S. Goldwasser, and D. Micciancio, “Pseudo-Random Number Generation within Cryptographic Algorithms: The DSS Case,” Proc. 17th Ann. Int’l Cryptology Conf. (CRYPTO ’97), May 1997.

[43] L. Wang and B.B. Hirsbrunner, “PN-Based Security Design for Data Storage,” Proc. IASTED Int’l Conf. Databases and Applications (DBA ’04), Feb. 2004.

[44] X.G. Xia, C.G. Boncelet, and G.R. Arce, “A Multiresolution Watermark for Digital Images,” Proc. Int’l Conf. Image Processing (ICIP ’97), Oct. 1997.

[45] Q.M. Li and E.C. Chang, “Zero-Knowledge Watermark Detection Resistant to Ambiguity Attacks,” Proc. Eighth ACM Workshop Multimedia and Security (MMSEC ’06), Sept. 2006.

[46] M. Arnold, “Attacks on Digital Audio Watermarks and Countermeasures,” Proc. Third IEEE Int’l Symp. Web Delivering of Music (WEDELMUSIC ’03), Sept. 2003.

[47] A. Briassouli and P. Moulin, “Detection-Theoretic Analysis of Warping Attacks in Spread-Spectrum Watermarking,” Proc. IEEE Int’l Conf. Acoustics, Speech, and Signal Processing (ICASSP ’03), Apr. 2003.

[48] N. Liu and K.P. Subbalakshmi, “Worst Case Attack on Quantization Based Data Hiding,” Proc. Eighth IEEE Int’l Symp. Multimedia (ISM ’06), Dec. 2006.

[49] C. Cachin, Digital Steganography, http://www.zurich.ibm.com/~cca/papers/encyc.pdf, 2005.

[50] N. Provos, “Defending against Statistical Steganalysis,” Proc. 10th USENIX Security Symp. (SECURITY ’01), Aug. 2001.

[51] S. Cabuk, C. Brodley, and C. Shields, “IP Covert Timing Channels: Design and Detection,” Proc. 11th ACM Conf. Computer and Comm. Security (CCS ’04), Oct. 2004.

[52] G. Shah, A. Molina, and M. Blaze, “Keyboards and Covert Channels,” Proc. 15th USENIX Security Symp. (SECURITY ’06), July/Aug. 2006.

[53] D. Bailey, D. Boneh, E.-J. Goh, and A. Juels, “Covert Channels in Privacy-Preserving Identification Systems,” Proc. ACM Conf. Computer and Comm. Security (CCS ’07), Nov. 2007.

[54] T. Takahashi and W. Lee, “An Assessment of VoIP Covert Channel Threats,” Proc. IEEE Int’l Conf. Security and Privacy in Comm. Networks (SecureComm ’07), Sept. 2007.

Wei Yu received the BS degree in electrical engineering from the Nanjing University of Technology in 1992, the MS degree in electrical engineering from Tongji University in 1995, and the PhD degree in computer engineering from Texas A&M University in 2008. He has worked for Cisco Systems Inc., Richardson, Texas, since 2001. His research interests include cyberspace security, computer networks, and distributed systems. He is a member of the IEEE Computer Society.

Xun Wang received the BS and MS degrees in computer engineering from the East China Normal University, Shanghai, in 1999 and 2002, respectively, and the PhD degree in computer science and engineering from Ohio State University in 2007. He has been working for Cisco Systems, Inc., Richardson, Texas, since 2007. His research interests include network security, overlay networks, and wireless sensor networks.

Xinwen Fu received the BS degree in electrical engineering from Xi’an Jiaotong University, China, in 1995, the MS degree in electrical engineering from the University of Science and Technology of China in 1998, and the PhD degree in computer engineering from Texas A&M University in 2005. From 2005 to 2008, he was an assistant professor with the College of Business and Information Systems, Dakota State University. In summer 2008, he joined the University of Massachusetts, Lowell, as a faculty member and is currently an assistant professor in the Department of Computer Science. His current research interests are in network security and privacy. He is a member of the IEEE Computer Society.


Dong Xuan received the BS and MS degrees in electronic engineering from Shanghai Jiao Tong University (SJTU), China, in 1990 and 1993, respectively, and the PhD degree in computer engineering from Texas A&M University in 2001. Currently, he is an associate professor in the Department of Computer Science and Engineering, Ohio State University. He is a recipient of the US National Science Foundation (NSF) CAREER award. His research interests include distributed computing, computer networks, and cyberspace security. He is a member of the IEEE Computer Society.

Wei Zhao completed the undergraduate program in physics at Shaanxi Normal University, Xian, China, in 1977 and received the MS and PhD degrees in computer and information sciences from the University of Massachusetts, Amherst, in 1983 and 1986, respectively. Since then, he has served as a faculty member at Amherst College, the University of Adelaide, and Texas A&M University. Between 2005 and 2006, he served as the director for the Division of Computer and Network Systems in the US National Science Foundation. He is currently the rector of the University of Macau, Macau, China. Before joining the University of Macau, he served as the dean of the School of Science, Rensselaer Polytechnic Institute. He has made significant contributions in distributed computing, real-time systems, computer networks, and cyberspace security. He is a fellow of the IEEE.
