
562 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 16, NO. 3, JUNE 2008

Statistical Techniques for Detecting Traffic Anomalies Through Packet Header Data

Seong Soo Kim and A. L. Narasimha Reddy, Senior Member, IEEE

Abstract—This paper proposes a traffic anomaly detector, operated in postmortem and in real-time modes, that passively monitors packet headers of traffic. The frequent attacks on network infrastructure, using various forms of denial of service attacks, have led to an increased need for developing techniques for analyzing network traffic. If efficient analysis tools were available, it could become possible to detect attacks and anomalies and to take appropriate action to contain the attacks before they have had time to propagate across the network. In this paper, we suggest a technique for traffic anomaly detection based on analyzing the correlation of destination IP addresses in outgoing traffic at an egress router. This address correlation data are transformed using the discrete wavelet transform for effective detection of anomalies through statistical analysis. Results from trace-driven evaluation suggest that the proposed approach could provide an effective means of detecting anomalies close to the source. We also present a multidimensional indicator using the correlation of port numbers and the number of flows as a means of detecting anomalies.

Index Terms—Egress filtering, network attack, packet header, real-time network anomaly detection, statistical analysis of network traffic, time series of address correlation, wavelet-based transform.

I. INTRODUCTION

The frequent attacks on network infrastructure, using various forms of denial of service (DoS) attacks and worms, have led to an increased need for developing techniques for analyzing and monitoring network traffic. If efficient analysis tools were available, it could become possible to detect attacks and anomalies and to take action to suppress them before they have had much time to propagate across the network. In this paper, we study the possibilities of traffic-analysis-based mechanisms for attack and anomaly detection.

The motivation for this work came from a need to reduce the likelihood that an attacker may hijack campus machines to stage an attack on a third party. A campus may want to prevent or limit misuse of its machines in staging attacks, and possibly limit its liability from such attacks. In particular, we study the utility of observing packet header data of outgoing traffic, such as destination addresses, port numbers, and the number of flows, in order to detect attacks/anomalies originating from the campus at the campus edge.

Detecting anomalies/attacks close to the source allows us to limit the potential damage close to the attacking machines. Traffic monitoring close to the source may enable the network operator to identify potential anomalies more quickly and to better control the administrative domain's resources. Attack propagation could be slowed through early detection.

Manuscript received June 9, 2004; revised December 22, 2005, September 30, 2006, and May 4, 2007; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor N. Taft. This work was supported in part by the National Science Foundation under Grant ANI-0087372, the Texas Higher Education Board, the Texas Information Technology and Telecommunications Taskforce, and Intel Corporation.

S. S. Kim is with the Digital Media R&D Center, Samsung Electronics Co., Ltd., Gyeonggi-do 443–742, Korea.

A. L. N. Reddy is with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843-3128 USA.

Digital Object Identifier 10.1109/TNET.2007.902685

Our approach passively monitors network traffic at regular intervals and analyzes it to find any abnormalities in the aggregated traffic. By observing the traffic and correlating it to previous states of traffic, it may be possible to see whether the current traffic is behaving in a similar (i.e., correlated) manner. Network traffic could look different because of flash crowds, changing access patterns, infrastructure problems such as router failures, and DoS attacks. In the case of bandwidth attacks, network usage may increase and abnormalities may show up in traffic volume. Flash crowds could be observed through a sudden increase in traffic volume to a single destination. A sudden increase of traffic on a certain port could signify the onset of an anomaly such as worm propagation. Our approach relies on analyzing packet header data in order to provide indications of possible abnormalities in the traffic.

Our approach to detecting anomalies envisions two kinds of detection mechanisms, i.e., postmortem and real-time modes. A postmortem analysis may exploit many hours of traffic data as a single data set, employing more rigorous, resource-demanding techniques for analyzing traffic. Such an analysis may be useful for traffic engineering purposes, analysis of resource usage, understanding peak demand, etc. On the other hand, real-time analysis would concentrate on analyzing a small window of traffic data with a view to providing a quick, and possibly dirty, warning of impending or ongoing traffic anomalies. Real-time analysis may rely on less sophisticated techniques because of resource demands and the imminence of attacks.

Previous work has shown that a postmortem analysis of traffic volume (in bytes) can reveal changes in traffic patterns [1], [3], [23]. FlowScan [40] and AutoFocus [39] are used as off-line traffic analysis tools. However, detecting and identifying anomalies so that mitigation action can be taken as promptly as possible requires rigorous real-time analysis. In this paper, we also study the effectiveness of such analysis on real-time traffic data. Real-time analysis may enable online detection of anomalies while they are in progress. Real-time analysis may employ smaller amounts of data in order to keep the analysis simple and efficient. At the same time, the data cannot be so small that no meaningful statistical conclusions can be drawn. Real-time analysis may also require that any indications of attacks or anomalies be provided with short latencies. This tension between robustness and latency makes real-time analysis more challenging.

The tools that collect and process large amounts of flow data may not scale to high-speed links, as they focus on individual flow behavior. Our approach looks at aggregate packet header data in order to improve scalability and to deal effectively with anomalies such as DDoS attacks, where individual attack flows may not look anomalous.

1063-6692/$25.00 © 2008 IEEE


Our paper makes the following contributions: a) it considers time series analysis of packet header data other than traffic volume; b) it proposes simple and efficient mechanisms for collecting and analyzing aggregated data in real time; and c) it shows that the proposed signals are more effective in detecting attacks than the analysis of traffic volume alone.

The rest of the paper is organized as follows. Section II gives an overview of related work. Section III discusses our approach and methodology. Section IV discusses the use of address correlation as a traffic signal. Section V describes the wavelet-based transform of the acquired correlation signal. Sections VI and VII present the results of the two detection mechanisms, i.e., postmortem and real-time analysis, on real and simulated traffic. In Section VIII, multidimensional indicators that also exploit port numbers and the number of flows are introduced. Section IX discusses a comparison with IDS, and Section X concludes the paper.

II. RELATED WORK

Many approaches have been studied to detect, prevent, and mitigate malicious network traffic. For example, rule-based approaches, such as IDS (intrusion detection systems), apply previously established rules to incoming traffic to detect and identify potential DoS attacks close to the victim's network. To cope with novel attacks, however, IDS tools such as Snort [41] need to be updated with the latest rules. This paper looks at the problem of designing generalized measurement-based real-time detection mechanisms. Measurement-based studies have considered traffic volume [3], [38], [43] and the number of flows [40] as potential signals that can be analyzed in order to detect anomalies in network traffic, while we additionally analyze packet header fields such as addresses and port numbers. The work in [43] relies on input data from multiple sources (i.e., all links in a network), while our work focuses on a single link at a time.

Some approaches proactively seek methods to suppress the overflow of traffic at the source [5]. Controls based on rate limits have been adopted for reducing the monopolistic consumption of available bandwidth, to diminish the effects of attacks, either at the source or at the destination [5], [7], [12]. The apparent symptoms of a bandwidth attack may be sensed through monitoring bit rates [10] and/or packet counts of the traffic flow. Bandwidth accounting mechanisms have been suggested to identify and contain attacks [8], [9], [11], [13], [14], [42]. Packeteer [25] and others offer commercial products that can account for traffic volume along multiple dimensions and allow policy-based rate control of bandwidth. Pushback mechanisms have been proposed to contain detected attacks closer to the source [11], [12], [26]. Traceback has been proposed to trace the source of DDoS attacks even when the source addresses may be spoofed by the attacker [27].

However, sophisticated low-rate attacks [37], which do not give rise to noticeable variance in traffic volume, could go undetected when only traffic volume is considered. Recently, statistical analysis of aggregate traffic data has been studied. In general, the generated signal can be analyzed by employing techniques such as the FFT (Fast Fourier Transform) and wavelet transforms. The FFT of traffic arrivals may reveal inherent flow-level information through frequency analysis. Fourier transforms and wavelets have been applied to network traffic to study its periodicity [23], [30]. Our previous work in [1] and the work in [3] studied traffic volume as a signal for wavelet analysis, and these earlier studies have considerably motivated our current study. Our study builds on this earlier work and extends the statistical analysis of traffic data to other packet header data, such as addresses and port numbers, in real time.

Fig. 1. The various filtering points. Our approach is based on the outbound traffic at the source.

Various forms of signatures have traditionally been utilized for representing the contents or identities of documents [17]. This earlier work motivated the representation of aggregate network traffic data in a compact data structure. A similar data structure was employed in [4], with significant differences in the processing of collected data, detection mechanisms, and the resulting traffic anomaly detectors. The structure of addresses at various points in the network was observed to be multifractal in [6].

III. OUR APPROACH

A. Traffic Analysis at the Source

We focus on analyzing the traffic at an egress router. Monitoring traffic at a source network enables early detection of attacks, helps control the hijacking of AD (administrative domain, e.g., campus) machines, and limits the squandering of resources.

There are two kinds of filtering, based on the traffic control point, as shown in Fig. 1. Ingress filtering protects the flow of traffic entering an internal network under administrative control. Ingress filtering is typically performed through firewall or IDS rules to control inbound traffic originating from the public Internet. On the other hand, egress filtering controls the flow of traffic leaving the administered network. From the viewpoint of an egress filter, internal machines are thus typically the origin of the outbound traffic, and the filtering is performed at the campus edge [19]. Outbound filtering has been advocated for limiting the possibility of address spoofing, i.e., to make sure that source addresses correspond to the designated addresses for the campus. With such filtering in place, we can focus on the destination addresses and port numbers of the outgoing traffic for analysis purposes.
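As a minimal sketch of the egress check described above, the Python snippet below keeps only packets whose source address lies inside the campus prefix and retains their destination addresses for analysis. The prefix 10.0.0.0/8 and the function name `egress_filter` are hypothetical illustrations; in practice this check would run at the egress router rather than in Python.

```python
import ipaddress

# Hypothetical campus prefix; substitute the administrative domain's own block.
CAMPUS_PREFIX = ipaddress.ip_network("10.0.0.0/8")


def egress_filter(packets, prefix=CAMPUS_PREFIX):
    """Split outbound (src, dst) pairs into analyzable destinations and spoofed packets.

    A packet is accepted only if its source address belongs to the campus prefix;
    for accepted packets, only the destination address is kept for analysis.
    """
    destinations, spoofed = [], []
    for src, dst in packets:
        if ipaddress.ip_address(src) in prefix:
            destinations.append(dst)
        else:
            spoofed.append((src, dst))
    return destinations, spoofed


if __name__ == "__main__":
    outbound = [("10.1.2.3", "203.0.113.5"), ("198.51.100.7", "203.0.113.9")]
    print(egress_filter(outbound))
```

Once spoofed sources are dropped, the remaining destination list is exactly the input that the address correlation signals below operate on.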

Our approach is based on the following observation: the outbound traffic from an AD is likely to have a strong correlation with itself over time, since individual accesses are strongly correlated over time. Recent studies have shown that traffic can have strong patterns of behavior over several timescales [3] and that a time series of packet bytes per time slot is not independent but rather strongly correlated [35]. It is possible to infer that some correlation exists in the traffic's weekly or daily consumption patterns. We hypothesize that the destination addresses will have a high degree of correlation for a number of reasons: (i) popular web sites, such as yahoo.com and google.com, have been shown to receive a significant portion of the traffic, (ii) individual users have been shown to access similar web sites over time due to their habits, and (iii) long-term flows, such as ftp downloads and video accesses, tend to correlate addresses over longer timescales. If this is the case, sudden changes in the correlation of outgoing addresses can be used to detect anomalies in traffic behavior.

Fig. 2. The block diagram of our detector.

B. General Mechanism of the Detector

Our detection mechanism can be explained in three major steps, shown in Fig. 2. The first step is a traffic parser, in which the correlation signal is generated from packet header traces or NetFlow records as input. In this step, the network traffic is first filtered to produce a signal that can be analyzed. So far, we have discussed how the correlation of destination addresses may be used as a potential signal. Depending on the nature of the traffic, fields in the packet header, such as destination addresses and port numbers, as well as traffic volume, can be used as a signal. Packet header data, due to its discrete nature, poses interesting problems for analysis, as discussed later. Sampling and aggregation may be used to reduce the amount of data at this stage, as explained in Section IV.

The second step involves data transformation for statistical analysis. In this paper, we employ wavelet transforms to study the address and port number correlation over several timescales. Analyzing discrete domains such as address spaces and port numbers poses interesting problems for wavelet analysis. We propose to employ correlations of the different domains to generate suitable signals for analysis. We employ both selective and comprehensive reconstruction of decomposed correlation signals across different timescales. Our wavelet analysis and detection mechanisms are explained in Section V.

The final stage is detection, in which attacks and anomalies are monitored using thresholds. The analyzed information is compared with historical thresholds of traffic to see whether the traffic's characteristics are outside regular norms. Outliers in the analyzed signal are expected to indicate anomalies. This comparison leads to a detection signal that can be used to alert operators to potential anomalies in the network traffic. We report on our results employing the correlation of destination addresses, port numbers, and the distribution of the number of flows as monitored traffic signals. In this paper, we consider some statistical summary measures of the reconstructed traffic signal and apply thresholds to the sample variances, as explained in Sections VI, VII, and VIII.

C. Traces

To verify the validity of our approach, we run our algorithm on four traces of network traffic.

First, we examine our method on traces from the University of Southern California that contain real network attacks [38]. Second, to inspect the performance of our detector on backbone links, we examine the mechanism on KREONet2 traces, which cover over 230 organizations, were collected from July 21 to July 28, 2003, and contain real worm attacks [31]. In the trace employed, there were three major attacks and a few instantaneous probe attacks, which had been identified in advance through various forensic traffic analyses. Third, to compare our method with Snort, we use a live network at Texas A&M University.

Fig. 3. Comparison among signals of (a) the autocorrelation coefficient, (b) full 32-bit correlation, and (c) the compact data structure. For computational efficiency, we employ signal (c) here without compromising performance.

Fourth, to evaluate the sensitivity of our detector's performance to attacks of various configurations, we employ attack-free traces from NLANR (National Laboratory for Applied Network Research) [2], on which simulated virtual attacks are later superimposed. We employ Auckland-IV traces collected at the University of Auckland, ranging in length from 3 days to several weeks, which show diurnal variations and a few instantaneous peaks¹ in traffic, as shown in Fig. 3, but do not include explicit network attacks. The attack-free trace is used as a basis for the statistical analysis of ambient traffic. We expect the real attacks and the various synthesized attacks together to give a comprehensive understanding of the capabilities of the proposed approach.

IV. SIGNAL GENERATION

A. The Weighted Correlation

Our approach collects packet header data at an AD's edge over a sampling period. Individual fields in the packet header are analyzed to observe anomalies in the traffic. Individual fields take discrete values and show discontinuities in the sample space. For example, the IP address space spans $2^{32}$ possible addresses, and the addresses in a sample are likely to exhibit many discontinuities over that space. In order to overcome such discontinuities over a discrete space, we convert packet header data into a continuous signal by correlating successive samples.

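To investigate the ensembles of a random process statistically, a correlation coefficient, which is a normalized measure of the strength of the linear relationship between random variables, is usually employed [32], [33]. For each address, $A_i$, in the traffic, we count the number of packets, $P_n(i)$, sent in the sampling instant, $s_n$. We can define the IP address correlation coefficient signal at a sampling point $n$ as follows:

$$\rho(n) = \frac{\sum_i \left(P_{n-1}(i) - \bar{P}_{n-1}\right)\left(P_n(i) - \bar{P}_n\right)}{\sqrt{\sum_i \left(P_{n-1}(i) - \bar{P}_{n-1}\right)^2}\,\sqrt{\sum_i \left(P_n(i) - \bar{P}_n\right)^2}} \qquad (1)$$

where $\bar{P}_{n-1}$ and $\bar{P}_n$ are the mean values of packet counts in $s_{n-1}$ and $s_n$.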
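The correlation coefficient of (1) can be sketched in a few lines of Python. Since the text does not say which addresses the sums run over, this sketch assumes the union of addresses seen in either sample, with an absent address counted as zero packets; the function name is ours.

```python
import math


def correlation_coefficient(prev_counts, curr_counts):
    """rho(n) of (1): Pearson correlation of per-address packet counts.

    prev_counts and curr_counts map destination address -> packets sent in
    s_{n-1} and s_n. Assumption: addresses missing from a sample count as 0.
    """
    addrs = sorted(set(prev_counts) | set(curr_counts))
    if not addrs:
        return 0.0
    x = [prev_counts.get(a, 0) for a in addrs]
    y = [curr_counts.get(a, 0) for a in addrs]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    norm_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # undefined for constant count vectors; report no correlation
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    prev = {"a": 1, "b": 2, "c": 3}
    print(correlation_coefficient(prev, prev))                      # identical samples
    print(correlation_coefficient(prev, {"a": 3, "b": 2, "c": 1}))  # reversed popularity
```

Identical samples give a value of (numerically) 1, and a sample whose address popularity is reversed gives -1, which is the range the ambient signal in Fig. 3(a) is measured against.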

¹The peaks at the early points of every day, e.g., sampling points near 360 and 720, turned out to be regular flash crowds included in the original traces.


Fig. 4. The autocorrelation function of the correlation coefficient signal (i.e., Fig. 3(a)) over 2 days. We can infer that the traffic has a strong positive address correlation.

Fig. 3(a) shows the IP address correlation coefficient signal of the ambient NLANR traces over 3 days, computed by (1), which illustrates a strong positive correlation between adjacent samples. Moreover, we analyze the similarity over time using the autocorrelation function of this signal, as shown in Fig. 4. If the level of aggregation at the flow level and the sampling duration are high enough that spontaneous changes in the traffic get buried, we can assume that outbound traffic has a high degree of correlation over time [35].

In this paper, we employ a simplified correlation of time series for computational efficiency without compromising performance. This weighted correlation signal generation phase for destination addresses is explained below and shown in Fig. 3(b). In order to compute the full 32-bit address correlation signal, we consider two neighboring sampling instants. We define the address correlation signal $C(n)$ at sampling point $n$ as

$$C(n) = \frac{\sum_i P_{n-1}(i)\, P_n(i)}{\sum_i P_{n-1}(i) \cdot \sum_i P_n(i)} \qquad (2)$$

If an address spans the two sampling points $n-1$ and $n$, we obtain a positive contribution to $C(n)$. A popular destination address contributes more to $C(n)$ than an infrequently accessed destination, since we consider the number of packets being sent to the identical address.
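A sketch of the weighted correlation, assuming the normalized product-sum form of (2) (each sample divided by its packet total), follows. Unlike (1), no means are subtracted: only addresses present in both samples contribute, each in proportion to its packet counts, so popular destinations dominate. The function name is hypothetical.

```python
def weighted_correlation(prev_counts, curr_counts):
    """C(n) as a normalized product-sum of per-address packet counts.

    prev_counts / curr_counts map full 32-bit destination address -> packets
    in sampling instants s_{n-1} and s_n.
    """
    total_prev = sum(prev_counts.values())
    total_curr = sum(curr_counts.values())
    if total_prev == 0 or total_curr == 0:
        return 0.0  # an empty sample cannot correlate with anything
    common = prev_counts.keys() & curr_counts.keys()  # addresses spanning n-1 and n
    overlap = sum(prev_counts[a] * curr_counts[a] for a in common)
    return overlap / (total_prev * total_curr)


if __name__ == "__main__":
    prev = {"203.0.113.5": 3, "198.51.100.7": 1}
    curr = {"203.0.113.5": 2, "192.0.2.1": 2}
    print(weighted_correlation(prev, curr))  # only 203.0.113.5 contributes
```

Here only 203.0.113.5 appears in both samples, so the signal is 3·2 / (4·4) = 0.375; disjoint samples give 0.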

B. Data Structure for Computing Correlation

In order to minimize storage and processing complexity, we employ a simple but powerful data structure. In general, we decompose an $N$-bit domain into $N/8$ 8-bit fields. We keep track of an element in such an $N$-bit domain by keeping track of the element in the smaller fields. We illustrate this with addresses as an example. This data structure, which we call count, consists of four arrays, count[0] through count[3]. Each array expresses one of the four fields in an IP address. Within each array, we have 256 locations, for a total of $4 \times 256 = 1024$ locations. A location count[k][j] is used to record the packet count for value $j$ in the $k$th field of the IP address. This provides a concise description of the address instead of the $2^{32}$ locations that would be required to store each address occurrence uniquely. We filter this signal by computing a correlation of the address in two successive samples, i.e., by computing

$$C_k(n) = \sum_{j=0}^{255} \frac{\mathrm{count}_{n-1}[k][j]}{\sum_{l=0}^{255} \mathrm{count}_{n-1}[k][l]} \cdot \frac{\mathrm{count}_n[k][j]}{\sum_{l=0}^{255} \mathrm{count}_n[k][l]}, \qquad k = 0, \ldots, 3 \qquad (3)$$

Fig. 5 depicts the data structure, which consists of the 2-D array count[4][256]. The first dimension corresponds to the four byte segments of the IP address. The second dimension indicates the 256 entries of each IP address byte segment and is expressed as the 256 columns in each row.
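The count array can be filled in a few lines. The sketch below assumes flows arrive as (dotted-quad address, packet count) pairs; `build_count_array` is a hypothetical name. The final constants restate the storage arithmetic: 4 × 256 = 1024 counters replace 2^32 per-address counters, a reduction by a factor of 2^22.

```python
def build_count_array(flows):
    """Fill the 4 x 256 count array: count[k][v] sums packets whose k-th byte is v."""
    count = [[0] * 256 for _ in range(4)]  # every entry starts at zero
    for addr, packets in flows:
        for k, byte in enumerate(int(part) for part in addr.split(".")):
            count[k][byte] += packets
    return count


COUNTERS = 4 * 256                  # 1024 locations in the compact structure
FULL_SPACE = 2 ** 32                # one counter per possible IPv4 address
REDUCTION = FULL_SPACE // COUNTERS  # storage saving factor

if __name__ == "__main__":
    c = build_count_array([("211.40.179.102", 5), ("211.40.1.2", 1)])
    print(c[0][211], c[1][40], c[2][179], c[3][102])
    print(REDUCTION == 2 ** 22)
```

Note that each packet is added once to every row, so all four rows always carry the same packet total.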

Fig. 5. The compact data structure for computing weighted correlation. Suppose that only five flows, F1 through F5, exist, with the source (or destination) IP addresses and packet counts shown in the figure.

All entries in the count arrays are initialized to zero. Each packet's header data are recorded at the corresponding position of its IP address segments, as shown in Fig. 5. The use of this approximate representation of addresses allows us to reduce the computational and storage demands by a factor of $2^{32}/1024 = 2^{22}$. In general, such domain decomposition reduces the requirements of an $N$-bit packet header data item from $2^N$ to $(N/8) \cdot 2^8 = 32N$, or $O(N)$. The data structure also makes it possible to identify target IP addresses using its reversibility. By assembling the highest-valued position in each of the four fields, it may be possible to identify the target or source of attacks. In the example of Fig. 5, we can deduce that 211.40.179.102 is the target IP address of F4 when the positions of the highest value in each field are combined.
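The reversal step, assembling the highest-valued position of each field, can be sketched as follows. The helper name and the example flows are hypothetical, and ties are resolved toward the lowest byte value.

```python
def dominant_address(count):
    """Combine the highest-valued position of each of the four byte fields.

    `count` is a 4 x 256 array of packet counts. Ties resolve to the lowest
    byte value, because max() returns the first maximal element it meets.
    """
    return ".".join(str(max(range(256), key=lambda v: row[v])) for row in count)


if __name__ == "__main__":
    count = [[0] * 256 for _ in range(4)]
    flows = [("211.40.179.102", 40), ("10.1.2.3", 3), ("10.1.9.9", 2)]
    for addr, packets in flows:
        for k, part in enumerate(addr.split(".")):
            count[k][int(part)] += packets
    print(dominant_address(count))
```

When one flow dominates, as F4 does in Fig. 5, this recovers its address (211.40.179.102 here). When no single flow dominates, the per-field maxima can come from different flows, so the assembled address is only a candidate: with flows 1.1.1.1 (3 packets), 2.2.2.2 (2), and 2.2.2.3 (2), it yields 2.2.2.1, which does not appear in the traffic at all.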

In order to compute the correlation signal at the end of sampling point $n$, we simply multiply normalized values in the same position between the two data structures of samples $n-1$ and $n$, then sum up the multiplied values in each byte segment separately. Consequently, four correlation signals are calculated, $C_0(n)$ through $C_3(n)$.

In order to generate the address correlation signal at the end of sampling point $n$, we multiply each segment correlation $C_k(n)$ by a scaling factor $\alpha_k$ and generate $C(n)$ as

$$C(n) = A \sum_{k=0}^{3} \alpha_k\, C_k(n) + B \qquad (4)$$

where $\alpha_k$ are the segment scaling factors and $A$ and $B$ are fitting parameters.
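The per-segment correlations and their weighted combination can be sketched as below, following (3) and (4) in the form written in this section. The weights `alpha` and the fitting parameters `a` and `b` are caller-supplied placeholders, not the values used in the paper, and the function names are hypothetical.

```python
def segment_correlations(prev, curr):
    """C_k(n), k = 0..3, from the 4 x 256 count arrays of samples n-1 and n.

    Each row is normalized by its packet total, the rows are multiplied
    position by position, and the products are summed per byte segment.
    """
    result = []
    for row_prev, row_curr in zip(prev, curr):
        total_prev, total_curr = sum(row_prev), sum(row_curr)
        if total_prev == 0 or total_curr == 0:
            result.append(0.0)  # an empty sample contributes no correlation
            continue
        products = sum(p * c for p, c in zip(row_prev, row_curr))
        result.append(products / (total_prev * total_curr))
    return result


def address_signal(c_k, alpha, a=1.0, b=0.0):
    """C(n) = A * sum_k alpha_k * C_k(n) + B, with illustrative alpha, A, B."""
    return a * sum(w * c for w, c in zip(alpha, c_k)) + b


if __name__ == "__main__":
    # Hypothetical samples: all traffic to 1.2.3.4, then half of it moves to 1.2.30.40.
    prev = [[0] * 256 for _ in range(4)]
    curr = [[0] * 256 for _ in range(4)]
    for k, v in enumerate((1, 2, 3, 4)):
        prev[k][v] = 10
        curr[k][v] = 5
    for k, v in enumerate((1, 2, 30, 40)):
        curr[k][v] += 5
    c_k = segment_correlations(prev, curr)
    print(c_k)
    print(address_signal(c_k, alpha=[0.5, 0.5, 0.0, 0.0]))  # /16 weighting
```

In this example the first two bytes stay fully correlated while the last two drop to 0.5, so a /16 weighting (zero weight on the last two segments) sees no change, whereas uniform weights do.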

In order to place the mean of the ambient signal at a reference level, say around 50, we can adjust the fitting parameters $A$ and $B$. Compared to the mean of attack-free traffic, we expect the signal to show a higher correlation if an attack (such as a DoS attack) or heavy traffic (such as a high-bandwidth file transfer) concentrates on a specific address. On the other hand, traffic during DDoS and worm attacks is expected to show lower correlations due to the dispersion of source and destination addresses, respectively.

By properly choosing the scaling factors, we can obtain an appropriate aggregation of the address space. In this paper, we employ fixed scaling factors $\alpha_k$. Alternatively, we could employ different weights to give preference to different portions of the address segments. For example, by making $\alpha_2 = \alpha_3 = 0$, we could consider /16 addresses.

TABLE I: THE NINE KINDS OF SIMULATED ATTACKS

Fig. 6. The cross-covariance functions show that the random variables of the three signals in Fig. 3 are correlated. In particular, the correlation coefficients are the zeroth lags of the covariance functions. These covariance functions normalize the sequence so that the auto-covariances at zero lag are identically 1.0.

Our approach could introduce errors when the 8-bit address segments match even though the 32-bit addresses themselves do not. For example, if traffic consists of a.b.c.d and e.f.g.h in sample $n-1$ and a.b.g.h and e.f.c.d in sample $n$, then even though the actual address correlation is zero, our method of computing address correlation may result in a high correlation between these sampling instants.
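The quantization error described above is easy to reproduce. In the sketch below (helper names and addresses are hypothetical), two flows in sample n-1 exchange their last two bytes in sample n: the exact 32-bit correlation drops to zero, yet every per-byte correlation is identical to what an unchanged sample would give.

```python
def normalized_overlap(prev, curr):
    """Normalized product-sum of two {key: packets} histograms."""
    total = sum(prev.values()) * sum(curr.values())
    if total == 0:
        return 0.0
    common = prev.keys() & curr.keys()
    return sum(prev[k] * curr[k] for k in common) / total


def byte_histogram(counts, k):
    """Aggregate {address: packets} into {k-th byte value: packets}."""
    hist = {}
    for addr, packets in counts.items():
        byte = int(addr.split(".")[k])
        hist[byte] = hist.get(byte, 0) + packets
    return hist


def per_byte_correlations(prev, curr):
    """Correlation computed independently on each of the four address bytes."""
    return [normalized_overlap(byte_histogram(prev, k), byte_histogram(curr, k))
            for k in range(4)]


if __name__ == "__main__":
    prev = {"10.1.20.30": 1, "11.2.40.50": 1}  # a.b.c.d and e.f.g.h
    curr = {"10.1.40.50": 1, "11.2.20.30": 1}  # a.b.g.h and e.f.c.d
    print(normalized_overlap(prev, curr))      # exact 32-bit matches only
    print(per_byte_correlations(prev, curr))   # byte-wise view of the change
    print(per_byte_correlations(prev, prev))   # byte-wise view of no change
```

The byte-wise view cannot distinguish the swapped traffic from unchanged traffic; the measurements that follow indicate that such coincidences are rare enough in practice not to add significant noise.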

To estimate the quantization errors in normal traffic without attacks, we compared the full 32-bit address correlation with the correlation signal generated through (4). Figs. 3(b) and (c) show the weighted correlation signal computed with the full 32-bit address by (2) and with our data structure by (4), respectively, for the Auckland-IV traces. From the figure, we see that the differences are negligible, i.e., our approach does not add significant noise. From a statistical point of view, the two signals have approximately the same mean (about 50) and the same degree of dispersion (standard deviation). Moreover, we examine the similarity of the above signals with the cross-correlation coefficient, which is a normalized measure of the strength of the linear relationship between variables. We consider the correlation coefficient RV (random variable) defined by (1), the full 32-bit RV defined by (2), and the weighted correlation RV defined by (3) and (4); their pairwise cross-correlation coefficients are the zero-lag values of the cross-covariance functions in Fig. 6. Based on these results, we employ the signal of Fig. 3(c) in this paper to reduce processing complexity.

C. Attacks

Besides the actual attacks observed in the USC and KREONet2 traces, we construct virtual attacks on the Auckland-IV traces. This allows us to test the proposed technique under different conditions. We consider nine kinds of attacks, as shown in Table I. These attacks cover a diversity of behaviors and allow us to test the efficacy of the proposed mechanisms deterministically.

The simulated attacks can be described by a 3-tuple (duration, persistency, and IP address). We superimpose these attacks on the ambient traces. The mixture ratios of attack traffic to normal traffic range from 1:2 to 1:10 in packet counts. For example, with a traffic mix ratio of 1:10, we inject a single synthesized packet for every 10 ambient packets during the attack period. The detection performance is only slightly affected by the mixture ratio. Replacing normal traffic with attack traffic would make attacks easier to detect, due to the significant statistical differences during such attacks, and hence is not considered here.
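The superposition can be sketched as follows, assuming packets are (timestamp, destination) pairs sorted by time and the attack period is a half-open timestamp interval. The victim address, the packet format, and the function name are hypothetical; the paper does not describe its injection tool.

```python
def superimpose(ambient, victim, t_start, t_end, ratio=10):
    """Mix a synthetic flood toward `victim` into time-ordered ambient packets.

    ambient is a list of (timestamp, destination) pairs. Within the attack
    period [t_start, t_end), one attack packet is injected after every
    `ratio` ambient packets, giving an attack:normal mix of 1:ratio.
    """
    mixed, in_window = [], 0
    for ts, dst in ambient:
        mixed.append((ts, dst))
        if t_start <= ts < t_end:
            in_window += 1
            if in_window % ratio == 0:
                mixed.append((ts, victim))  # reuse the ambient timestamp
    return mixed


if __name__ == "__main__":
    ambient = [(t, "192.0.2.10") for t in range(30)]
    mixed = superimpose(ambient, "203.0.113.99", t_start=0, t_end=20, ratio=10)
    print(len(mixed), sum(1 for _, d in mixed if d == "203.0.113.99"))
```

Because packets are only added, never removed, the ambient traffic is preserved intact, matching the choice above not to replace normal traffic.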

V. DATA TRANSFORM

A. Data Transform

Wavelet techniques are among the most up-to-date modeling tools for exploiting both nonstationarity and long-range dependence [20]–[22] and for analyzing the properties of data series [28], [29]. In real situations, we encounter signals that are characterized by abrupt changes, and it becomes essential to relate such changes to the occurrence of an event in time. Since wavelet analysis, unlike the Fourier transform, can reveal scaling properties of the temporal and frequency dynamics simultaneously, we compute a wavelet transform of the generated address correlation signal over several sampling points.

1) Wavelet Filter Specification: The goals of the DWT (Discrete Wavelet Transform) in our analysis are i) to differentiate anomalous signals from an ambient signal, and ii) to analyze the properties of the anomalous signals, such as their duration and location. The ability of a wavelet transform to suppress a polynomial ambient signal depends on a mathematical characteristic of the wavelet called its number of vanishing moments, e.g., $N$ for the Daubechies wavelet db$N$. Provided the number of vanishing moments of the wavelet exceeds the degree of the polynomial signal, the decomposition suppresses the polynomial part and highlights the remainder, enabling the extraction of anomalous signals from the original signal. In addition, Daubechies wavelets are known to be compactly supported, with extremal phase and the highest number of vanishing moments for a given support width. The associated scaling filters are minimum-phase filters. In our chosen wavelet mode, we select the two-band filter, which consists of one low-pass filter Lo_D (or Lo_R) and one high-pass filter Hi_D (or Hi_R). We employ a Daubechies-6 two-band filter.

B. Discrete Wavelet Transform

We provide a brief overview of the DWT in order to make our scheme clearer. The DWT consists of decomposition (or analysis) and reconstruction (or synthesis). Fig. 7 illustrates a multilevel 1-D wavelet analysis using specific wavelet decomposition filters (Lo_D and Hi_D are the low-pass (or scaling) and high-pass (or wavelet) decomposition filters) and the reconstruction of the original signal [18].

For decomposition, starting from a signal $s$, the first step of the transform decomposes $s$ into two sets of coefficients, namely the approximation (or scaling) coefficients $cA_1$ and the detail (or wavelet) coefficients $cD_1$. The input $s$ is convolved with the low-pass filter Lo_D to yield the approximation coefficients. The detail coefficients are obtained by convolving $s$ with the high-pass filter Hi_D. This procedure is followed by down-sampling by 2. Suppose that the length of each filter is equal to $2N$. If $n = \mathrm{length}(s)$, each convolved signal is of length $n + 2N - 1$, and the coefficients $cA_1$ and $cD_1$ are of length $\lfloor (n-1)/2 \rfloor + N$.
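To make the length bookkeeping concrete, here is a pure-Python sketch of one analysis step. It assumes the Haar (db1) filters, so N = 1, in place of the paper's Daubechies-6 filters to keep the listing short; it convolves with zero padding and keeps every second sample starting from the second, a down-sampling choice that reproduces the coefficient length floor((n-1)/2) + N stated above. Function names are ours.

```python
import math

# Haar (db1) analysis filters, N = 1; stand-ins for the paper's db6 filters.
LO_D = [1 / math.sqrt(2), 1 / math.sqrt(2)]
HI_D = [-1 / math.sqrt(2), 1 / math.sqrt(2)]


def convolve(x, h):
    """Full linear convolution with zero padding; length len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def dwt(s, lo=LO_D, hi=HI_D):
    """One analysis step: filter with Lo_D / Hi_D, then down-sample by 2."""
    cA = convolve(s, lo)[1::2]  # keep samples 1, 3, 5, ... of the convolution
    cD = convolve(s, hi)[1::2]
    return cA, cD


if __name__ == "__main__":
    s = [1.0, 2.0, 3.0, 4.0]
    cA, cD = dwt(s)
    n, half_len = len(s), len(LO_D) // 2
    print(len(cA) == (n - 1) // 2 + half_len)
    print(cA, cD)
```

For s = [1, 2, 3, 4], the approximations are pairwise sums scaled by 1/sqrt(2) and the details are pairwise differences, which is the split into overall trend and local change that the detector relies on.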


Fig. 7. A multilevel two-band wavelet decomposition and reconstruction procedure.

The second step decomposes the approximation coefficients $cA_1$ into two sets of coefficients using the same method, substituting $s$ by $cA_1$ and producing $cA_2$ and $cD_2$, and so forth. At level $j$, the wavelet analysis of the signal $s$ has the following coefficients: $[cA_j, cD_j, cD_{j-1}, \ldots, cD_1]$. While the approximation coefficients account for the overall trends of the input signal due to low-pass filtering, the details explain the instantaneous energy concentration in each timescale. The detail coefficients, $cD_j$, and approximations, $cA_j$, at level $j$ are obtained as follows:

$$cD_j[k] = \sum_m g[2k - m]\, cA_{j-1}[m], \qquad cA_j[k] = \sum_m h[2k - m]\, cA_{j-1}[m], \qquad cA_0 = s \qquad (5)$$
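Iterating the analysis step yields the multilevel decomposition. The sketch below again assumes Haar filters as stand-ins for db6 and uses hypothetical function names; it returns [cA_j, cD_j, ..., cD_1] and rejects levels above floor(log2 n). Its down-sampling keeps odd-indexed samples, a one-sample shift relative to the even-index textbook convention, which does not change the coefficient counts.

```python
import math

# Haar (db1) filters stand in for the paper's Daubechies-6 filters.
LO_D = [1 / math.sqrt(2), 1 / math.sqrt(2)]
HI_D = [-1 / math.sqrt(2), 1 / math.sqrt(2)]


def _convolve(x, h):
    """Full linear convolution with zero padding."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def wavedec(s, level, lo=LO_D, hi=HI_D):
    """Multilevel DWT: returns [cA_level, cD_level, ..., cD_1], with cA_0 = s."""
    max_level = len(s).bit_length() - 1  # floor(log2(len(s)))
    if not 1 <= level <= max_level:
        raise ValueError(f"level must lie in [1, {max_level}]")
    approx, details = list(s), []
    for _ in range(level):
        details.append(_convolve(approx, hi)[1::2])  # cD_j from cA_{j-1}
        approx = _convolve(approx, lo)[1::2]         # cA_j from cA_{j-1}
    return [approx] + details[::-1]


if __name__ == "__main__":
    coeffs = wavedec([5.0] * 8, 3)
    print([len(c) for c in coeffs])  # [1, 1, 2, 4]: cA3, cD3, cD2, cD1
```

Because the Haar wavelet has one vanishing moment, a constant input produces all-zero details at every level, a small instance of the polynomial suppression discussed in the filter specification above.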

where $g$ and $h$ are the high-pass and low-pass filters. For reconstruction, starting from the two sets of coefficients at level $j$, that is, the approximations $cA_j$ and details $cD_j$, the inverse DWT synthesizes $cA_{j-1}$: it up-samples the coefficients by inserting zeros and convolves the up-sampled result with the reconstruction filters Lo_R and Hi_R. Let $n$ be the length of $cA_j$ and $cD_j$, and $2N$ be the length of the filters Lo_R and Hi_R; then the reconstructed signal has length $2n - 2N + 2$. For a discrete signal of length $n$, the DWT can consist of at most $\log_2 n$ levels.
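The synthesis step can be sketched in the same style: up-sample by inserting zeros, convolve with Lo_R and Hi_R, add the two branches, and keep the central 2n - 2N + 2 samples. With the Haar filters assumed here (N = 1), one analysis step followed by this synthesis step returns an even-length input exactly; the sketch has only been checked for that case, and its function names are hypothetical.

```python
import math

R = 1 / math.sqrt(2)
# Haar (db1) analysis and reconstruction filters, N = 1 (stand-ins for db6).
LO_D, HI_D = [R, R], [-R, R]
LO_R, HI_R = [R, R], [R, -R]


def convolve(x, h):
    """Full linear convolution with zero padding."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def dwt(s):
    """One analysis step (convolve, keep every second sample from index 1)."""
    return convolve(s, LO_D)[1::2], convolve(s, HI_D)[1::2]


def upsample(c):
    """Insert a zero between consecutive coefficients (length 2n - 1)."""
    out = [0.0] * (2 * len(c) - 1)
    out[::2] = c
    return out


def idwt(cA, cD, lo=LO_R, hi=HI_R):
    """One synthesis step; output keeps the central 2n - 2N + 2 samples."""
    n, filt_len = len(cA), len(lo)
    y = [a + d for a, d in zip(convolve(upsample(cA), lo), convolve(upsample(cD), hi))]
    keep = 2 * n - filt_len + 2
    first = (len(y) - keep) // 2
    return y[first:first + keep]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(idwt(*dwt(x)))
```

Using all coefficients restores the input; the selective reconstruction used for detection would instead zero out some of the coefficient sets before calling the synthesis step.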

1) Timescales Selection: The decomposable analysis level is a function of the number of samples, i.e., the DWT window size $W$, and is bounded by $\log_2 W$. We iterate the analysis up to level $j$ in the postmortem analysis. If we use all coefficients $[cA_j, cD_j, \ldots, cD_1]$ for reconstruction, we restore the original weighted correlation signal. Due to the decimation operator, at level $j$ we have about $W/2^j$ coefficients. That is, the filtered signal is down-sampled by 2 at each level of the analysis procedure, so the effective sampling interval doubles at each level. Consequently, the wavelet transform identifies changes in the signal over several timescales. When we use $t_s$ minutes as the sampling interval, the time range at level $j$ spans $2^j t_s$ minutes. For instance, with a 1-minute sampling interval, cD1 corresponds to a 2-minute interval, cD2 to a 4-minute interval, and so on. By the Nyquist sampling theorem, each of these time ranges can independently sample and restore frequency components up to $1/(2^{j+1} t_s)$.
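The timescale bookkeeping above reduces to simple arithmetic, sketched here with hypothetical helper names: level j coefficients have an effective interval of 2^j t_s, their Nyquist limit is half the reciprocal of that interval, and a W-sample window permits at most floor(log2 W) levels.

```python
def level_interval(level, ts_minutes):
    """Effective sampling interval (minutes) of level-j coefficients: 2**j * t_s."""
    return (2 ** level) * ts_minutes


def nyquist_limit(level, ts_minutes):
    """Highest frequency (cycles per minute) representable at level j."""
    return 1.0 / (2 * level_interval(level, ts_minutes))


def max_level(window_samples):
    """Largest decomposition level for a W-sample DWT window: floor(log2 W)."""
    return window_samples.bit_length() - 1


if __name__ == "__main__":
    for j in (1, 2, 3):
        print(j, level_interval(j, 1), nyquist_limit(j, 1))
    print(max_level(60))
```

For example, a 60-sample window admits at most five levels, since 2^5 = 32 <= 60 < 64.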

C. Detection Mechanism

The reconstructed signal is used for detecting anomalies. We take the following moving-window approach to achieve faster detection while reducing false alarms.

At each sampling instance $n$, we construct the correlation signal $C(n)$. We consider the current sample and the previous $W-1$ samples, $C(n-W+1), \ldots, C(n-1)$ and $C(n)$, for the computation of the DWT at the sampling point $n$. We call this the analysis (DWT) window; $W$ is the number of samples in the DWT window, which therefore spans $W t_s$, where $t_s$ is the sampling interval, as shown in Fig. 8. We also consider $D$ samples, $C(n-D+1), \ldots, C(n-1)$ and $C(n)$, for anomaly detection. We call this the detection (DET) window; $D$ is the number of samples in the DET window. To reduce false alarms due to instantaneous noise, we require multiple outliers over the detection window to detect anomalies. In general, when $\lambda D$ ($0 \le \lambda \le 1$) or more of the samples in the detection window are above the anomaly threshold, we consider that an anomaly is detected at the sampling point $n$. When $\lambda = 1/2$, such a detector employs a majority vote over the detection window. When $\lambda$ is large, we can keep false alarms low. However, a larger $\lambda$ also increases the latency of anomaly detection, since such a majority function delays attack detection by at least $\lambda D$ sampling periods. As a result, attacks shorter than $\lambda D$ sampling periods are likely not to be detected. We illustrate these observations in Fig. 8 and (6).

(6)

Based on Fig. 8, the attack duration should be greater thana half of the detection widow for detection, when is 1/2. Ifan attack is staged from the sampling instant through ,a detection signal can be generated for a duration of T, at mostfrom time (i.e., after a detection latency of from thebeginning of the attack) to (i.e., after a residualwindow from the end of the attack). According to theselected value, the duration of can vary between(if is and (if is 0).

The empirical results, however, show variable latency de-pending on the attack strength and the threshold levels. It isnoted that the correlation values at a position in the differentmoving windows, e.g., the at position in thepreceding window and the at position inthe current window, are not identical. The latest data in the win-dows affects the overall signal including the preceding pointsin DWT. This can lead to a prompt real-time detection againstsevere attacks.
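The DET-window rule can be sketched in Python as follows. The function name and the Boolean input (one flag per sampling instant, True when the reconstructed sample is beyond its threshold) are illustrative assumptions; the sketch only shows the "at least $rM$ of the last $M$ samples" decision.

```python
import math
from collections import deque


def det_window_alarms(exceeds, m, r):
    """Return one alarm flag per sampling instant.

    exceeds: booleans, True when the reconstructed sample is beyond the
             anomaly threshold at that instant.
    m:       number of samples in the DET window.
    r:       fraction of the window that must be outliers (0 < r <= 1).
    """
    need = math.ceil(r * m)          # at least r*M outliers
    window = deque(maxlen=m)         # the last m outcomes; oldest dropped
    alarms = []
    for e in exceeds:
        window.append(bool(e))
        alarms.append(sum(window) >= need)
    return alarms


# A 6-sample attack inside 16 samples, with M = 5 and r = 1/2 (majority).
trace = [False] * 5 + [True] * 6 + [False] * 5
alarms = det_window_alarms(trace, m=5, r=0.5)
print(alarms.index(True), sum(alarms))  # 7 6
```

In this example the alarm starts two sampling periods after the attack begins and persists for two sampling periods after it ends, so for $r = 1/2$ its total length equals the attack's. An attack lasting only two samples never collects the three required outliers and goes undetected.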


Fig. 8. The timing diagram in our detection mechanism.

D. Memory Complexity

Our work requires two sets of packet-header samples (source and destination IP addresses, respectively) for computing the correlation signal; the size of each sample is the number of flows it contains, and the number of samples retained is the DWT window size. We also maintain summary information (DWT coefficients, etc.) over a larger number of samples for statistical evaluation of the current data sample. In our scheme of address-domain analysis, the per-sample data size is reduced from its original value, and the DWT window size is typically 60 (60 samples of 2-minute duration in a 2-hour window) in real-time analysis. The space requirement can be computed as follows:

(7)

From (7), we obtain the total space requirement. Our approach can work with pcap or NetFlow-type records in postmortem mode, or with more aggregated data upon packet arrival in real time.

VI. DETECTION IN POSTMORTEM ANALYSIS

A. Selective Reconstruction in DWT

We simulate two classes of attacks based on persistence: the first three attacks are ON/OFF-style attacks, and the remaining six are persistent attacks. Our postmortem analysis allows the administrator to choose the timescales over which attack/anomaly detection is desired. The network operator can analyze the traffic successively at different sampling times or analyze it at multiple timescales at the same time. Because of the time-scaled decomposition of the wavelets, we are able to detect changes in the behavior of the network traffic that may appear at some resolutions but go unnoticed at others.

The first three attacks described in Table I have an ON/OFF timing of 3 minutes. With a 1-minute sampling period, this signal could be detected effectively using only the first-level detail coefficient (cD1) of the DWT. If the network administrator concentrates on 30-minute attack signals, he/she can select a higher-level coefficient (e.g., cD5) instead of cD1 for detecting the designated attacks.

Fig. 9. The distribution of the ambient traces without attacks. (a) and (c) show the distributions before the DWT; (b) and (d) show the distributions after the DWT.

The last six attacks described in Table I are persistent attacks, each lasting at least 1 hour. Therefore, with a 1-minute sampling interval, we could choose the cD5, cD6, and cD7 levels among all the coefficients for reconstruction, which correspond to 32 minutes, 1 hour 4 minutes, and 2 hours 8 minutes, respectively.

Our approach also gives the administrator the flexibility to analyze anomalies at multiple timescales simultaneously. For example, assume that we are only interested in detecting instantaneous high-bandwidth attacks and attacks lasting more than 30 minutes. To detect these attacks, we extract only the first ($2^1 = 2$ minutes for a 1-minute sampling interval), fifth ($2^5 = 32$ minutes), sixth, and seventh levels of the decomposition and reconstruct the signal from the coefficients at these levels only. As a result, the postmortem analysis looks for outliers in a single reconstructed signal that includes a subset of wavelet levels (i.e., detail signals).
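Selective reconstruction can be sketched as a multi-level decomposition in which the unwanted levels are zeroed before the inverse transform. This Python version again assumes the Haar wavelet and a repeat-last-sample padding rule; `selective_reconstruct` and its `keep_approx` flag are hypothetical names. By default the approximation is dropped, matching the detail-only reconstruction used in this section.

```python
import math

S = 1.0 / math.sqrt(2.0)


def haar_step(x):
    """One Haar DWT level; odd lengths are padded by repeating the last sample."""
    x = list(x)
    if len(x) % 2:
        x.append(x[-1])
    a = [S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    d = [S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return a, d


def haar_inverse_step(a, d, length):
    """Invert haar_step and trim any padding back to the original length."""
    x = []
    for ai, di in zip(a, d):
        x += [S * (ai + di), S * (ai - di)]
    return x[:length]


def selective_reconstruct(x, levels, keep, keep_approx=False):
    """Rebuild x from the detail levels in `keep` only (1 = cD1, ...).
    Other detail levels, and the approximation unless keep_approx is True,
    are set to zero before the inverse transform."""
    a, details, lengths = list(x), [], []
    for _ in range(levels):                 # forward: cD1, cD2, ..., cD<levels>
        lengths.append(len(a))
        a, d = haar_step(a)
        details.append(d)
    if not keep_approx:
        a = [0.0] * len(a)
    for j in range(levels, 0, -1):          # inverse: coarsest level first
        d = details[j - 1] if j in keep else [0.0] * len(details[j - 1])
        a = haar_inverse_step(a, d, lengths[j - 1])
    return a


# With 1-minute samples: keep cD1 (2 min) and cD5-cD7 (32-128 min).
signal = [float(v % 7) for v in range(300)]
fast_and_slow = selective_reconstruct(signal, 7, {1, 5, 6, 7})
print(len(fast_and_slow))  # 300
```

Because the transform is linear, reconstructions from disjoint sets of levels add up to the original signal; a constant (trend-only) input yields an all-zero detail-only reconstruction.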

B. Thresholds Setting Through Statistical Analysis

We develop a theoretical basis for deriving thresholds for anomaly detection. When a random variable $X$ has mean $\mu$ and variance $\sigma^2$, we can express Chebyshev's inequality in terms of the number $k$ of standard deviations from the mean:

$$P\bigl(|X - \mu| \ge k\sigma\bigr) \le \frac{1}{k^{2}} \qquad (8)$$

Chebyshev's inequality gives a lower bound on the confidence level; however, it does not take the actual distribution into account and is therefore often rather loose. If we assume that the reconstructed signal has a normal distribution, we can design suitable analysis and detection techniques that detect anomalies with high confidence while reducing false acceptances. A recent study has shown that a Gaussian approximation should work well for aggregated traffic if the level of aggregation, in both the number of traffic sources and the observed timescales, is high enough that individual sources are swallowed by the Central Limit Theorem [35]. Our datasets satisfy the necessary criteria for the minimal level of such an approximation.
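The gap between the distribution-free bound and the Gaussian case can be quantified directly. This short Python sketch compares the two coverage values for $k$ standard deviations; the function names are illustrative.

```python
import math


def chebyshev_coverage_bound(k):
    """Distribution-free lower bound on P(|X - mu| < k*sigma), from (8)."""
    return 1.0 - 1.0 / k ** 2


def gaussian_coverage(k):
    """Exact P(|X - mu| < k*sigma) when X is normally distributed."""
    return math.erf(k / math.sqrt(2.0))


for k in (2, 3):
    print(k, round(chebyshev_coverage_bound(k), 4), round(gaussian_coverage(k), 4))
# 2 0.75 0.9545
# 3 0.8889 0.9973
```

At $k = 3$, Chebyshev guarantees only about 88.9% coverage, whereas a normal signal is covered 99.73% of the time. This is why the normality of the reconstructed signal matters for setting tight thresholds.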

In Fig. 9(a) and (c), the original weighted correlation data have a slight positive skewness, i.e., the right tail is more pronounced than the left tail. Instead of applying a normalizing transformation to bring the data closer to a normal distribution, we exploit the flatness of the detail coefficients, which eliminates the diurnal cycles. To verify our methodology, we select only some levels of the DWT decomposition of the ambient trace free of attacks and reconstruct the signal from those levels. We then examine some statistical properties. Fig. 9(d) shows the histogram of the reconstructed signal of the ambient Auckland-IV traces in postmortem mode. The postmortem transformed data without attacks have mean 0 and standard deviation 3.38. We verify the normality of the Fri/Sun data in Table II through the Lilliefors test for goodness of fit to a normal distribution with unspecified mean and variance [18]. The postmortem transformed data follow a normal distribution at the 5% significance level, namely $X \sim N(0, 3.38^2)$. The original weighted correlation data do not clearly pass the null hypothesis of normality. However, by reconstructing the correlation time series with only the detail coefficients, excluding the approximation coefficients, the DWT transforms it into an approximately normal distribution. We find that a Gaussian approximation is possible in both selective and comprehensive reconstruction using only detail coefficients.

Fig. 10. The autocorrelation function of the DWT signal free of attacks in the sample paths. The autocorrelation shows the second-order stationarity condition for WSS: the autocorrelation $R_X(t, t+\tau)$ has an approximately similar distribution in each sample path and is therefore a function of the time difference $\tau$ only. Moreover, from the impulse-like concentration of the average power of $X(t)$ at zero lag, namely $R_X(0)$, the wavelet-transformed ambient signal could be regarded as white noise.

TABLE II STATISTICAL MEASURES IN THE SAMPLE PATHS

The wavelet-transformed signal obtained through selective reconstruction in postmortem mode shows characteristics close to first-order stationarity and ergodicity, as required for WSS: its mean is constant. It also has an approximately similar standard deviation in each sample path, which, because the mean is zero, equals the square root of the autocorrelation function at zero lag, $R_X(0)$.

We set two kinds of thresholds: a high threshold, indicating that the traffic is inordinately homogeneous, as in a DoS attack or during a high-bandwidth file transfer, and a low threshold, signifying that the traffic is abnormally heterogeneous, for example, source addresses in a DDoS attack or destination addresses in a worm attack. Setting the thresholds to 10.15 and −10.15, respectively, is equivalent to a $\mu \pm 3\sigma$ confidence interval for the random process $X(t)$:

$$P\bigl(\mu - 3\sigma \le X(t) \le \mu + 3\sigma\bigr) \approx 0.997 \qquad (9)$$

Under the normality assumption, this interval corresponds to a 99.7% confidence level; Chebyshev's bound (8) alone would guarantee only $1 - 1/3^2 \approx 88.9\%$. With such thresholds, we can detect attacks with an error rate of 0.3%, which can be taken as the target false alarm rate.

In an actual implementation, the detector needs to be trained for a short time on attack-free traffic traces before deployment, to gather statistical data and derive the anomaly thresholds. These statistical measures should continue to be updated only in the absence of network attacks, to preserve the sensitivity of the network attack detector.
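A minimal Python sketch of this training step follows, assuming the thresholds are $\mu \pm 3\sigma$ computed with the population standard deviation. The helper names are hypothetical. With $\sigma = 3.38$ this rule gives about ±10.14, in line with the ±10.15 thresholds above. The baseline grows only with windows that raised no alarm.

```python
import statistics


def train_thresholds(ambient, k=3.0):
    """Return (low, high) = mu -/+ k*sigma estimated from attack-free samples."""
    mu = statistics.fmean(ambient)
    sigma = statistics.pstdev(ambient, mu)
    return mu - k * sigma, mu + k * sigma


def classify(value, low, high):
    """'high': traffic inordinately homogeneous (e.g., DoS, bulk transfer);
    'low': traffic abnormally heterogeneous (e.g., DDoS sources, worm targets)."""
    if value > high:
        return "high"
    if value < low:
        return "low"
    return "normal"


def update_baseline(baseline, new_samples, anomaly_seen):
    """Extend the training set only with windows that raised no alarm, so
    attack traffic does not inflate mu and sigma."""
    if not anomaly_seen:
        baseline.extend(new_samples)
    return baseline


baseline = [-1.0, 1.0] * 50                 # stand-in ambient data (mean 0, sigma 1)
low, high = train_thresholds(baseline)      # (-3.0, 3.0)
print(classify(4.2, low, high), classify(-3.5, low, high), classify(0.4, low, high))
```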

1) Statistical Consideration of Threshold: Wide-Sense Stationarity: If statistical parameters of network traffic, such as the mean and standard deviation, are stationary under given traffic, the thresholds of a specific day could be applied to other days. It is generally known that traffic volume exhibits diurnal periodicity, which can be extracted by the approximation coefficients in (5). While the work in [36] exploited the smoothness of the approximation signals of traffic volume to show overall long-term trends, we focus on (i.e., reconstruct from) the detail coefficients of address correlation to discern the most significant energy. At normal times, the detail coefficients ideally show a flat signal if the traffic is free of instantaneous variance. Given attack-free ambient traffic, we investigate whether the signal reconstructed with only detail coefficients exhibits stationarity.

We gather 4 weeks of original Auckland traces without anomalies and analyze their statistical summary measures; Table II shows the distribution on other days. Suppose that the wavelet-reconstructed signal is a random process $X(t)$ and that the sequence of each week is a sample path. We can conclude that $X(t)$ of these traces is wide-sense stationary (WSS) in postmortem analysis for the following reasons: (i) $X(t)$ has a constant mean, i.e., the ensemble average does not depend on time, and it is practically mean-ergodic; (ii) $X(t)$ has an approximately similar standard deviation over time, as observed in each sample path in Table II; and (iii) the autocorrelation function $R_X(\tau)$ is a function of the time difference $\tau$ regardless of the sample path, as observed in Fig. 10. Moreover, $X(t)$ could be regarded as white noise, given the concentration of the average power of $X(t)$ at the zeroth lag in Fig. 10. While the wavelet-reconstructed signal of the ambient trace could be considered WSS Gaussian white noise, attacks and anomalies could be considered signals of interest. Suppose we know only the first week's data and use the $\mu \pm 3\sigma$ limits. The thresholds for postmortem detection are then ±9.9 at the 99.7% confidence level. When the same thresholds are applied to the second week's traffic, they correspond to a 99% confidence level. This illustrates that, by virtue of WSS, the thresholds could remain approximately the same over several days without being recomputed frequently. In addition, it is possible to recompute the statistical thresholds using new traffic when no anomalies are detected.
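The two checks used here, a white-noise-like autocorrelation and the coverage achieved by old thresholds on new data, can be sketched in plain Python. The synthetic Gaussian "weeks" below are stand-ins, not the Auckland traces, and the function names are hypothetical.

```python
import random


def autocorrelation(x, max_lag):
    """Normalized sample autocorrelation R(tau) / R(0) for tau = 0..max_lag,
    computed on the mean-removed series."""
    n = len(x)
    mu = sum(x) / n
    dev = [v - mu for v in x]
    r0 = sum(d * d for d in dev) / n
    return [sum(dev[i] * dev[i + lag] for i in range(n - lag)) / n / r0
            for lag in range(max_lag + 1)]


def coverage(x, low, high):
    """Fraction of samples inside [low, high]; applied to a new week, this is
    the confidence level that old thresholds actually achieve."""
    return sum(low <= v <= high for v in x) / len(x)


rng = random.Random(7)
week1 = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # synthetic stand-in data
week2 = [rng.gauss(0.0, 1.1) for _ in range(5000)]
acf = autocorrelation(week1, 5)
print([round(r, 3) for r in acf])    # 1.0 at lag 0, near 0 elsewhere
print(coverage(week2, -3.0, 3.0))    # week-1 thresholds applied to week 2
```

A spike at lag 0 with near-zero values at other lags is the white-noise signature described for Fig. 10; a slightly wider second week lowers the achieved coverage below 99.7%, mirroring the 99.7% to 99% shift reported above.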

2) Consideration of Selectable Parameters: The parameters for sampling and detection can be selected based on the detector's processing ability and detection mode, as shown in Table III. The longer the sampling interval, the longer the detection latency. The selection of the sampling interval is more critical in real-time mode. Fig. 11 shows the inter-sample mean square error (MSE) for various sampling intervals. When sampling intervals are small, the variation of traffic information in successive samples is not negligible. As the sampling intervals exceed 10 seconds, the MSE decreases significantly. However, larger sampling intervals lead to larger latencies for the analysis and detection of anomalies. Based on these two observations, we trade off stability against latency in choosing the sampling interval. In postmortem mode, the size of the DWT/DET windows depends on memory availability.

TABLE III SELECTED PARAMETERS FOR SAMPLING AND DETECTION IN THE POSTMORTEM AND REAL-TIME MODES

Fig. 11. The relationship between sample-by-sample MSE and sampling intervals. We choose 60–120-second samples to minimize MSE and latency.
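The stability side of this trade-off can be illustrated with a hypothetical Python sketch. It assumes that a longer sampling interval can be emulated by averaging fine-grained values, which is a simplification: in the paper, each sample's correlation is computed from packet headers. It then measures the mean squared difference between successive samples.

```python
import random


def aggregate(fine, factor):
    """Average consecutive blocks of `factor` fine-grained values, emulating a
    sampling interval `factor` times longer (hypothetical aggregation rule)."""
    return [sum(fine[i:i + factor]) / factor
            for i in range(0, len(fine) - factor + 1, factor)]


def intersample_mse(series):
    """Mean squared difference between consecutive samples."""
    diffs = [(b - a) ** 2 for a, b in zip(series, series[1:])]
    return sum(diffs) / len(diffs)


rng = random.Random(3)
fine = [10.0 + rng.gauss(0.0, 1.0) for _ in range(12000)]  # e.g., 1-s values
for factor in (1, 10, 60, 120):
    print(factor, round(intersample_mse(aggregate(fine, factor)), 4))
```

The MSE shrinks as the interval grows. The latency cost is not modeled here; it grows linearly with the interval.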

C. Detection of Anomalies Using the Real Attack Traces

Detection results of our composite approach on 7-day KREONet2 traces are shown in Fig. 12. Fig. 12(a) illustrates the weighted correlation signal of IP addresses, containing real attacks, that is used for the wavelet transform. Fig. 12(b) shows the wavelet-transformed and reconstructed signal in postmortem mode together with its detection results. The actual attacks occur between the vertical lines, and the detection signal is shown with dots at the bottom of the second subfigure.

A 7-day-wide DWT window and a 20-minute-wide DET window are used for analysis and detection. To evaluate the reconstructed signal, we use the statistical threshold indicated in Fig. 12(b). Overall, our results show that our approach may provide a good detector of attacks.

The first two attacks targeted a web server on port 80 using sequential source port numbers. A single source machine sent 48-byte packets to destination IP addresses within a /24 block, keeping the first 3 bytes of the IP address fixed and randomly changing the last byte. Because the first 3 bytes of the target addresses are fixed, the reconstructed signal at attack times shows higher correlation than at normal times. These attacks continued for about 2 to 4 hours. The last attack was the SQL Slammer worm, which generated random IP addresses at a specific port number. A few compromised machines sent a large number of 404-byte packets to randomly generated destination IP addresses on UDP port 1434. Because of the diversity of the target addresses, the reconstructed signal at attack times shows lower correlation than the ambient signal. This attack persisted for about 3 hours.

Fig. 12. IP address based detection results using KREONet2 real attack tracesin postmortem. The captions on (a) and (b) illustrate mode and selected param-eters. Three anomalously high positive/negative correlations are observed.

Fig. 13. Detection results using USC real attack traces in postmortem.

Most of the attacks in real traces consisted of many flows withrandomized addresses/ports, with one or few packets per flow.This resulted in a large number of flows and low traffic volume.

Figs. 12(c) and (d) show traffic volume, namely byte counts and packet counts. As the figures show, apart from the first attack, the remaining two attacks did not produce any distinguishable variation in traffic volume. This shows that such scan attacks may be hard to detect from the traffic volume signal. Fig. 12(e) shows that the flow-counting signal can efficiently detect network scans; however, it may not detect high-bandwidth attacks with a small number of target addresses/ports, as explained in Section VI-D.

Fig. 13 shows another postmortem result with USC traces. The left subfigure illustrates the correlation signal of IP addresses used for the wavelet transform, and the right subfigure shows the wavelet-transformed and reconstructed signal in postmortem mode with its detection results. Note that the correlation signal varies markedly at the beginning and end of attacks. At these instants, the low values reflect the difference between the distributions of normal and attack traffic, i.e., the emergence of new uncorrelated traffic. During the attacks, the high values show that the attack traffic is correlated with itself.

D. Detection of Anomalies Using the Simulated Attack Traces

Detection results on Auckland-IV traces including simulated attacks are shown in Fig. 14. Fig. 14(a) illustrates the weighted correlation signal of IP addresses, with attacks, that is used for the wavelet transform. The simulated attacks are staged between the vertical lines shown in the figure. The first attack of every group of three (i.e., the first, fourth, and seventh attacks) shows very high correlation because it has a single destination IP address; on the other hand, the remaining attacks (the second, third, fifth, sixth, eighth, and ninth) show very low values because of their randomness. Fig. 14(b) shows the wavelet-transformed and reconstructed signal in postmortem mode and its detection results.

Fig. 14. IP address based detection results using simulated attack traces in postmortem.

TABLE IV DETECTABILITY OF THE IP CORRELATION SIGNAL AND THE DWT SIGNAL

a. IP means the original IP address weighted correlation signal without applying the DWT.
b. DWT means the DWT-transformed signal.
c. A dot means a detection.
d. x means a non-detection.
e. A series of consecutive false detections is counted as one false positive.

We employ 3-day traces of addresses collected over a campus access link for these experiments. The sampling interval is 1 minute, and the sampling duration is 30 seconds. The nine simulated attacks are staged between the vertical lines shown in the figure. The postmortem analysis uses the whole 3-day correlation data at once. The reconstructed signals of the first three attacks show an oscillatory shape because of their intermittent attack patterns, while the remaining six attacks take the shape of a hill or a dale at attack times because of their persistence.

The attacks on a single machine (the first, fourth, and seventh attacks) result in high correlation signals. Detection signals in the form of dots show that these attacks can be detected effectively. On the other hand, the semi-random and random attacks result in low correlations due to the diversity of addresses from one sample to the next during these attacks. Consecutive detection signals indicate the length of attacks. Moreover, the detections at sampling points near 1450 and 2900 turned out to be regular flash crowds observed at the beginning of business hours, caused by discontinuities in traffic usage patterns.

While volume signals (byte/packet counts) perform relatively acceptably during bandwidth attacks, they are not sensitive to low-rate attacks. The signal based on the number of flows shows considerable weakness against attacks consisting of a large volume of traffic with a small number of flows, such as the first, fourth, and seventh of our synthesized attacks.

This leads us to observe that the performance of volume/flowsignals depends significantly on the nature of attacks.

E. Effectiveness of DWT

To evaluate the effectiveness of employing the DWT, we compare the detection results of our DWT-based scheme with those of the weighted correlation signal without the DWT. The anomaly detection results are shown in Table IV. At low confidence levels (below 90%), the DWT does not offer any advantage. However, at the confidence levels of most interest (90%–99.7%), the DWT provides significantly better detection results than the simpler statistical analysis, indicating its advantage in detecting anomalies.

VII. DETECTION IN REAL-TIME ANALYSIS

A. Individual Reconstruction in DWT

In real-time analysis, the administrator may not have the luxury of analyzing the traffic selectively at different timescales, since anomalies need to be detected as they occur. Because data must be analyzed at all timescales and attack/anomaly detection latency must be kept low, real-time analysis is much more challenging. To complete the analysis in a short time, real-time analysis can only focus on small, recent data sets. Because the number of transformable samples is tied to the size of the DWT window, the maximum allowable level is restricted by the number of samples $N$ in the window; investigating a specific level $j$ requires on the order of $2^j$ samples for reconstruction. In our analysis here, we employ the most recent 2 hours of traffic data. Detecting anomalies through multiple levels has a number of advantages: (i) by setting a high threshold at each level, anomalies can be detected with high confidence; (ii) depending on the operator's filtering criteria, he/she can tune the threshold to trade off accuracy against flexibility, as shown in Table V; and (iii) the attributes of attacks, such as their frequency and pattern, can be determined.

B. Thresholds Setting Through Statistical Analysis

We establish a statistical baseline for ambient traffic as a means of deriving thresholds for anomaly detection. We separate each level of the DWT decomposition of the ambient trace and reconstruct the signal from each level individually. The statistics of the reconstructed signal at each level are calculated independently, and these parameters are updated periodically by combining current and old values. Setting a $\mu \pm 3\sigma$ confidence interval at each level, as shown in Fig. 16, corresponds to a 99.7% confidence level and an error rate of 0.3%.
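The text does not specify how current and old values are combined. The Python sketch below assumes a simple weighted blend per level, with a hypothetical weight `alpha` for the newest batch of attack-free samples, followed by per-level $\mu \pm 3\sigma$ thresholds. Blending the variances directly ignores any shift between the old and new means, which keeps the sketch simple but is only an approximation.

```python
def blend_stats(old_mean, old_var, batch, alpha=0.1):
    """Combine old per-level statistics with those of the newest batch.
    alpha is the weight of the new batch (hypothetical choice)."""
    n = len(batch)
    new_mean = sum(batch) / n
    new_var = sum((x - new_mean) ** 2 for x in batch) / n
    mean = (1 - alpha) * old_mean + alpha * new_mean
    var = (1 - alpha) * old_var + alpha * new_var
    return mean, var


def level_thresholds(mean, var, k=3.0):
    """mu -/+ k*sigma thresholds for one DWT level (k = 3 -> 99.7% if Gaussian)."""
    sigma = var ** 0.5
    return mean - k * sigma, mean + k * sigma


# One (mean, variance) pair per level cD1..cD6, refreshed after a quiet period.
stats = {level: (0.0, 1.0) for level in range(1, 7)}
quiet_batch = {level: [0.5, -0.5, 1.5, -1.5] for level in range(1, 7)}
for level, batch in quiet_batch.items():
    stats[level] = blend_stats(*stats[level], batch, alpha=0.2)
print(level_thresholds(*stats[1]))  # roughly (-3.07, 3.07)
```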

C. Detection of Anomalies Using the Real Attack Trace

The reconstructed signal of each level is used for detectinganomalies. For the real-time detection, a 2-hour DWT windowand a 10-minute DET window are used as listed in Table III.A sampling interval of 2 minutes and a sampling duration of60 seconds are employed. The following approach accommo-dates swifter detection while diminishing the false positives.

First, at each sampling instant, the DWT of the samples over the last 2-hour window (60 samples at a 2-minute sampling interval) is computed. We carry out a statistical analysis of each level of the DWT signal separately to analyze the signal over all timescales. At each level of the DWT signal, we employ a 10-minute detection window. Second, the detection mechanism operates in two dimensions: horizontal and vertical. The horizontal dimension checks for anomalies in successive time samples at the same wavelet signal level; when a specific attack continues in a regular pattern, it has a strong probability of being captured at a specific level. The vertical dimension checks for anomalies at multiple wavelet signal levels at the same time, which allows attacks at a specific sampling instant to be analyzed across different timescales. As a result, in real-time analysis, each level is examined individually for outliers, while anomalies are checked both horizontally within a level and vertically across multiple levels.

Fig. 15. Address-based detection results using real attack traces in real time. The correlation signal in the topmost subfigure is input into the 2-D real-time detection window. cD1 through cD6 show the intermediate horizontal detection results at each DWT coefficient level. The real-time indicator in the bottommost subfigure shows the final detection results and latencies using both the vertical and horizontal dimensions.

Fig. 16. Address-based detection results using simulated attack traces in real time. The real-time indicator in the bottommost subfigure detects the anomalies originally contained in the traces as well as all kinds of simulated attacks.

The combination of the horizontal and the vertical evalua-tions is used for attack detection. The number of the outliers iscounted in the 2-D detection window and an alert is triggeredwhen the number of outliers exceeds a threshold. The alertingalgorithm is expressed in pseudo-code as follows.

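t: current sampling point
w: horizontal detection window size
L: number of DWT levels
D_j(i): detail coefficient at time i in vertical level j
T_j(i): threshold at time i in vertical level j
θ: threshold on the outlier count in the 2-D window

count ← 0
for i ← t − w + 1 to t
    for j ← 1 to L
        if |D_j(i)| > T_j(i) then count ← count + 1
    end
end
if count > θ
    then announce alert
    else announce normal
end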
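A runnable Python rendering of this alerting algorithm is given below. It assumes that each level's coefficients and thresholds are stored as lists indexed by sampling point, and that an outlier means an absolute value above a symmetric threshold; both are assumptions, since the paper sets separate high and low thresholds, which coincide in magnitude for a zero-mean level.

```python
def two_d_alert(coeffs, thresholds, t, w, theta):
    """Count outliers in the 2-D window and decide whether to raise an alert.

    coeffs[j][i]:     reconstructed detail signal of level j + 1 at time i
    thresholds[j][i]: threshold of level j + 1 at time i (applied to |value|)
    t: current sampling point, w: horizontal window size,
    theta: maximum number of outliers tolerated in the window
    """
    count = 0
    for i in range(t - w + 1, t + 1):      # horizontal: last w sampling points
        for j in range(len(coeffs)):       # vertical: every DWT level
            if abs(coeffs[j][i]) > thresholds[j][i]:
                count += 1
    return "alert" if count > theta else "normal"


coeffs = [[0.0, 0.0, 2.0, 0.0, -2.0],     # cD1
          [0.0, 0.0, 0.0, 5.0, 0.0]]      # cD2
thresholds = [[1.0] * 5, [1.0] * 5]
print(two_d_alert(coeffs, thresholds, t=4, w=3, theta=2))  # alert
```

Three outliers fall inside the window (two at cD1, one at cD2), which exceeds θ = 2. Shrinking the window to the current sample alone leaves a single outlier and no alert.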

Because we employ a 2-hour DWT window with a 2-minute sampling interval, the signal can be decomposed up to level 6. The results of our real-time analysis on KREONet2 traces are shown in Fig. 15. The intermediate detection results at each level are shown in the upper subfigures, and the final detection result using the 2-D window is shown in the bottommost subfigure. Our detector achieves acceptable attack detection performance in both online and offline analysis.

D. Detection of Anomalies Using the Simulated Attack Trace

The DWT data consist of seven levels, and the thresholds are set for each level independently. The results of our real-time analysis are shown in Fig. 16. The DWT signal at each timescale is shown along with the horizontal detector (an anomaly detected over successive samples at the same level). The bottommost subfigure shows the composite detector that employs the 2-D mechanism. The results indicate that the real-time analysis detects all the attacks along with a few usage peaks.

The results for the 1:2 mixture ratio in Table V show the overall relationship between detection latency and the confidence level setting for our attacks in real-time mode. As expected, the higher the confidence level, the higher the detection latency. When the confidence level is low, more false alarms are incurred; on the other hand, almost all of the attacks can be detected without false negatives. Zero latency means that the attack is detected at the very sampling instant at which it is staged. As the threshold is increased, false acceptances diminish; however, false rejections sometimes occur. The appropriate confidence level can be chosen according to the network administrator's security standard.


TABLE V DETECTION LATENCY OF THE VARIOUS MIXTURE RATIOS IN REAL-TIME MODE

a. Latency is measured in minutes.
b. X means a non-detection.

Our real-time analysis results are promising: attacks may be detected within a few sampling instants.

E. Attack Volume

We studied the sensitivity of our address-correlation-based detectors to the relative volume of attack traffic in the Auckland-IV traces. We varied the ratio of attack traffic to normal traffic volume from 1:2 to 1:5 to 1:10 and evaluated the detection ability (i.e., false negatives) and latency against nine kinds of simulated attacks. The results of this study are shown in Table V. As the attack volume decreases at a fixed threshold, or as the threshold increases at a fixed attack volume, false negative rates generally get worse, as expected. However, the attacks are still detected at traffic volume ratios as low as 1:10 when sufficiently low thresholds (see Table V) are employed.

The results show that the proposed schemes are effective even when the attack traffic volume is as low as 10% of the normal traffic. As expected, real-time detection latencies grow as the attack traffic volume shrinks. The results indicate that DWT analysis of the address correlation signal is useful over a wide range of attack traffic volumes.

VIII. MULTIDIMENSIONAL INDICATORS

A. Analysis of Network Traffic by Port Numbers

It seems feasible to carry out a similar correlation- and wavelet-based analysis of network packets based on their port numbers. This is particularly motivated by the recent large-scale attacks Code Red and SQL Slammer. Both attacks were spawned on particular ports, exploiting specific weaknesses of end applications, and in both cases an unusually large number of packets was observed on these ports during the attack. We simulate the attacks in Table I and repeat the same procedure in postmortem mode as earlier, but now with port numbers as the traffic signal. In what follows, X denotes the IP address variable, Y the port number variable, and Z the variable indicating the number of flows.

Our correlation-based analysis shows marked variationsduring an attack when we consider port numbers of packets asdata. Detection results of our approach are shown in Fig. 17.

The topmost subfigure illustrates the weighted correlation signal of port numbers used for the wavelet transform. The second subfigure shows the wavelet-transformed signal in postmortem analysis and its detection results. The transformed data without attacks show an approximately normal distribution. Using the values 9 and −9 as thresholds is equivalent to a $\mu \pm 3\sigma$ confidence interval, which corresponds to a 99.7% confidence level. The results indicate that the correlation of port numbers over samples of network traffic could provide a reliable signal for analyzing and detecting traffic anomalies. When attacks are staged on a particular port, we find high correlation, and when attacks are staged on random ports, we find the correlation to be low.

Fig. 17. Port numbers based detection results in postmortem.

Fig. 18. Cross-correlation function of IP address (X), port numbers (Y), and number of flows (Z). The number of flows and either IP address or port numbers have a nearly zero correlation coefficient, i.e., they are uncorrelated.

B. Number of Flows

The number of flows may deviate from the norm during network attacks, so monitoring changes in the number of flows makes it feasible to perceive anomalies. We used the triple destination-address/destination-port/protocol as the definition of a flow. The traces from NLANR and KREONet2 contained packets in both directions, so we selected only packets in the outgoing direction. Our flow-number-based analysis shows marked variations during an attack when we use the changes in the number of flows as data, as shown in Fig. 12(e).
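Counting flows under this definition amounts to collecting distinct triples among outgoing packets in each sample. In the Python sketch below, the packet representation (dicts with keys `src`, `dst`, `dport`, `proto`) and the rule "outgoing means the source lies in the local network" are illustrative assumptions; real input would come from pcap or NetFlow records.

```python
import ipaddress


def count_outgoing_flows(packets, local_net):
    """Number of distinct (dst address, dst port, protocol) triples among the
    outgoing packets of one sample. A packet is treated as outgoing if its
    source lies in local_net. Packet dict keys are hypothetical."""
    net = ipaddress.ip_network(local_net)
    flows = set()
    for p in packets:
        if ipaddress.ip_address(p["src"]) in net:
            flows.add((p["dst"], p["dport"], p["proto"]))
    return len(flows)


sample = [
    {"src": "10.0.0.5", "dst": "192.0.2.1", "dport": 80, "proto": "tcp"},
    {"src": "10.0.0.6", "dst": "192.0.2.1", "dport": 80, "proto": "tcp"},
    {"src": "10.0.0.5", "dst": "198.51.100.7", "dport": 1434, "proto": "udp"},
    {"src": "203.0.113.9", "dst": "10.0.0.5", "dport": 22, "proto": "tcp"},
]
print(count_outgoing_flows(sample, "10.0.0.0/8"))  # 2
```

Two internal hosts sending to the same web server count as one flow under this definition, which is why scans with randomized destinations inflate the count sharply while a single-target flood does not.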

The wavelet-transformed data without attacks are close to a normal distribution, with slightly heavy tails.

Moreover, the flow number is uncorrelated with the address and port number signals we have investigated so far. Fig. 18 illustrates the cross-covariance function among these variables. The cross-correlation coefficients $\rho_{XZ}$ and $\rho_{YZ}$, which are the normalized cross-covariance function values at zero lag, are close to zero. Based on normality and this lack of correlation, and treating the variables as jointly Gaussian, we can consider Z to be independent of X and Y.
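The zero-lag coefficient can be computed as follows. This Python sketch is hypothetical and supports only non-negative lags. Note that a near-zero coefficient implies independence only under the joint-Gaussian assumption; for other joint distributions, uncorrelated variables can still be dependent.

```python
def cross_correlation(x, z, lag=0):
    """Normalized cross-covariance of x(t) and z(t + lag), 0 <= lag < len(x);
    lag 0 gives the correlation coefficient rho_XZ."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sz = (sum((b - mz) ** 2 for b in z) / n) ** 0.5
    cov = sum((x[i] - mx) * (z[i + lag] - mz) for i in range(n - lag)) / n
    return cov / (sx * sz)


print(cross_correlation([1, 2, 3, 4], [2, 4, 6, 8]))   # approximately 1.0
print(cross_correlation([1, 2, 3, 4], [4, 3, 2, 1]))   # approximately -1.0
```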

C. Comprehensive Traffic Analysis

Up to now, our work has shown that analysis of addresses, port numbers, and flow numbers may individually provide indicators of traffic anomalies. Is it possible to combine several indicators to build a more robust anomaly detector that is less prone to false alarms? We consider three combinations.


Fig. 19. The multidimensional detection using IP address and port numbers.

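First, we design an anomaly detector based on a combination of addresses and port numbers. With $X \sim N(\mu_X, \sigma_X^2)$, $Y \sim N(\mu_Y, \sigma_Y^2)$, and correlation coefficient $\rho_{XY}$, the 2-D joint (bivariate) Gaussian density can be determined [34] as follows:

$$f_{XY}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho_{XY}^2}} \exp\left\{-\frac{1}{2(1-\rho_{XY}^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho_{XY}(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right\} \qquad (10)$$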

Fig. 19 shows the comprehensive detection results when each detector uses its individual $\mu \pm 3\sigma$ thresholds. The two kinds of dots at the bottom of the figure show the detection results. The upper dots are marked when both the address and port methods detect anomalies simultaneously. The error rate of such a detector is

$$P_{e,\wedge} = P\bigl(|X - \mu_X| > 3\sigma_X,\ |Y - \mu_Y| > 3\sigma_Y\bigr) \qquad (11)$$

The lower dots are displayed when only one of the two detection methods detects anomalies. A detector that fires on either method has the error rate

$$P_{e,\vee} = P\bigl(|X - \mu_X| > 3\sigma_X\bigr) + P\bigl(|Y - \mu_Y| > 3\sigma_Y\bigr) - P_{e,\wedge} \qquad (12)$$

These detectors thus correspond to confidence levels of 99.9875% and 99.51%, respectively.
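Evaluating (11) and (12) requires the correlation $\rho_{XY}$, which is not restated here. The Python sketch below therefore treats it as an input. It assumes standardized, jointly Gaussian X and Y with thresholds expressed in standard deviations, and it computes both error rates by one-dimensional numerical integration (Simpson's rule) over the conditional distribution of Y given X. The function name is hypothetical, and the sketch is not intended to reproduce the figures above.

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def and_or_error_rates(k_x, k_y, rho, upper=12.0, steps=4000):
    """False-alarm rates for standardized, jointly Gaussian X and Y with
    correlation rho; the detectors fire when |X| > k_x and |Y| > k_y.
    Returns (P_and, P_or): both fire / at least one fires."""
    s = math.sqrt(1.0 - rho * rho)

    def p_y_outside(x):
        # Given X = x, Y ~ N(rho * x, 1 - rho**2).
        return (std_normal_cdf((-k_y - rho * x) / s)
                + std_normal_cdf((rho * x - k_y) / s))

    # Simpson's rule on [k_x, upper] (steps must be even); the region
    # x < -k_x contributes the same amount by symmetry, hence the factor 2.
    h = (upper - k_x) / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        x = k_x + i * h
        total += weight * std_normal_pdf(x) * p_y_outside(x)
    p_and = 2.0 * total * h / 3.0
    p_x = math.erfc(k_x / math.sqrt(2.0))
    p_y = math.erfc(k_y / math.sqrt(2.0))
    return p_and, p_x + p_y - p_and


for rho in (0.0, 0.5):
    p_and, p_or = and_or_error_rates(3.0, 3.0, rho)
    print(rho, f"{1 - p_and:.6%}", f"{1 - p_or:.4%}")
```

Positive correlation between the indicators raises $P_{e,\wedge}$ above the independent value $p_X p_Y$, so the confidence of the combined detector depends on how correlated the address and port signals are.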

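Second, we can design another 2-D anomaly detector based on a combination of addresses and flow numbers. With $X \sim N(\mu_X, \sigma_X^2)$, $Z \sim N(\mu_Z, \sigma_Z^2)$, and $\rho_{XZ} \approx 0$, the joint density is defined by the marginal densities of $X$ and $Z$ as follows:

$$f_{XZ}(x,z) = f_X(x)\, f_Z(z) \qquad (13)$$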

From the probability density function (13), we can compute the theoretical false alarm rates of the 2-D analysis using IP addresses and flow numbers, just as in (11) and (12). With a $\mu \pm 3\sigma$ setting, a composite IP address and flow number detector that requires both indicators to fire can identify attacks with an error rate of $(0.3\%)^2$.
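As a worked check of the independent case, assume that each single-indicator detector has a false alarm rate of 0.3% at its $\mu \pm 3\sigma$ thresholds. The product form (13) then gives, for the detector requiring both indicators and the detector accepting either one:

$$P_{\text{both}} = p_X\,p_Z = (0.003)^2 = 9 \times 10^{-6} = 0.0009\%$$

$$P_{\text{either}} = p_X + p_Z - p_X\,p_Z = 0.006 - 0.000009 = 0.005991 \approx 0.599\%$$

These correspond to confidence levels of 99.9991% and about 99.40%, respectively.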

Third, we can further design a 3-D anomaly detector based on a combination of addresses, port numbers, and flow numbers. Using the means $\mu_X, \mu_Y, \mu_Z$, the standard deviations $\sigma_X, \sigma_Y, \sigma_Z$, and the correlation coefficients $\rho_{XY}, \rho_{XZ}, \rho_{YZ}$, the joint (trivariate) density and error rate can be defined similarly.² Using X, Y, and Z simultaneously as signals can further improve the accuracy of the detector.

IX. COMPARISON WITH IDS

Intrusion detection systems (IDSs) are an important part of network security and are widely deployed. Rule-based matching mechanisms require a complete analysis of attack patterns and the availability of established remedies beforehand. To cope with new attacks, IDS tools need to be continuously updated with the latest rules. Currently, a few freeware/shareware and commercial IDS tools are available.

We review Snort [41] as a representative IDS and compare its properties with those of our approach. We perform this comparison by running both systems on a live production network, and we report results from a period that contained a large number of traffic anomalies. For our experiments, we installed Snort in the Texas A&M University network and collected its detection results by capturing 24 hours of data from the Analysis Console for Intrusion Databases (ACID). We evaluate our approach on the same network traffic trace analyzed by Snort, with nearly all Snort rules turned on.

Snort reported 13,257 packet-level alerts distributed over the experiment period. We compare these with the results of our approach based on the DWT-based signal. The trace shows continuous anomalies over almost the entire period. The detection results of our approach agreed with those of Snort: both detected suspicious anomalies throughout the trace capture, so their detection performance can be considered comparable.

However, Snort’s identification mechanism offers finer granularity. When coupled with a tool such as ACID, Snort can more readily identify the source of malicious activity and exactly what that activity consists of, and it provides an easily managed display of the IP addresses and port numbers involved in any suspicious activity. Our approach, in contrast, reports the pattern of abnormality in an aggregated fashion.

During our evaluation, Snort failed to identify many heavy traffic sources. Some flows generated by BitTorrent software run by one of the network's users accounted for about 30% to 60% of the traffic over certain periods. Because no rule covered this behavior, Snort did not detect these flows, whereas our approach identified them as an anomalous event. This demonstrates the utility of the proposed technique in detecting previously unknown or undocumented anomalous behavior.

Regarding computational complexity, Snort inspects the payload of each packet as well as its header, and more than 2400 filter rules are currently in place. Our approach, in contrast, works on aggregated information from traffic samples. Snort's memory-intensive approach would therefore require more computing resources to match the performance of our approach under heavy traffic.
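The difference in processing cost comes from what each system inspects per packet. As a hedged illustration of header-only aggregation, the Python sketch below reduces a stream of header records to one number per sampling interval, the count of distinct flows. The `Header` record, the 5-tuple flow definition, and the interval length are assumptions made for this example; the exact signals used by the proposed approach are defined in the earlier sections.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Header(NamedTuple):
    """Hypothetical minimal packet-header record (no payload is kept)."""
    ts: float      # timestamp in seconds
    src: str       # source IP address
    dst: str       # destination IP address
    sport: int     # source port
    dport: int     # destination port
    proto: int     # IP protocol number


def flows_per_interval(headers: Iterable[Header], interval: float) -> Dict[int, int]:
    """Count distinct 5-tuple flows in each sampling interval of length `interval`."""
    seen = defaultdict(set)
    for h in headers:
        slot = int(h.ts // interval)   # index of the sampling interval
        seen[slot].add((h.src, h.dst, h.sport, h.dport, h.proto))
    return {slot: len(flows) for slot, flows in sorted(seen.items())}


if __name__ == "__main__":
    trace = [
        Header(0.1, "10.0.0.1", "192.0.2.4", 1234, 80, 6),
        Header(0.2, "10.0.0.1", "192.0.2.4", 1234, 80, 6),
        Header(0.5, "10.0.0.2", "192.0.2.4", 2222, 80, 6),
        Header(1.2, "10.0.0.1", "198.51.100.8", 1234, 53, 17),
    ]
    print(flows_per_interval(trace, 1.0))
```

This batch version keeps every interval in memory for simplicity; a streaming variant would emit each count and discard the interval's flow set once the interval closes, so its state would be bounded by the number of distinct flows in a single interval.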

These observations suggest that the two methods could be combined to provide a more complete detection system capable of detecting a wide array of network security violations.

²The generalization of the multivariate probability density function (PDF) is outside the scope of this paper.


X. CONCLUSION

We studied the feasibility of analyzing packet header data through wavelet analysis for detecting traffic anomalies. Specifically, we proposed the use of the correlation of destination IP addresses, port numbers, and the number of flows in the outgoing traffic at an egress router. Our results show that statistical analysis of aggregate traffic header data may provide an effective mechanism for detecting anomalies within a campus or edge network. We studied the effectiveness of our approach in both postmortem and real-time analysis of network traffic. The results of our analysis are encouraging and point to a number of interesting directions for future research.

REFERENCES

[1] A. Ramanathan, “WADeS: A tool for distributed denial of service attack detection,” M.S. thesis, TAMU-ECE-2002-02, Aug. 2002.

[2] NLANR Measurement and Operations Analysis Team, NLANR Network Traffic Packet Header Traces, Aug. 2002 [Online]. Available: http://www.pma.nlanr.net/Traces/

[3] P. Barford et al., “A signal analysis of network traffic anomalies,” in ACM SIGCOMM Internet Measurement Workshop, Nov. 2002.

[4] T. M. Gil and M. Poletto, “MULTOPS: A data-structure for bandwidth attack detection,” in USENIX Security Symp., Aug. 2001.

[5] J. Mirkovic, G. Prier, and P. Reiher, “Attacking DDoS at the source,” in IEEE Int. Conf. Network Protocols, Nov. 2002.

[6] E. Kohler, J. Li, V. Paxson, and S. Shenker, “Observed structure of addresses in IP traffic,” in Proc. ACM IMW, Nov. 2002.

[7] A. Garg and A. L. N. Reddy, “Mitigation of DoS attacks through QoS regulation,” in Proc. IWQOS, May 2002.

[8] Smitha, I. Kim, and A. L. N. Reddy, “Identifying long term high rate flows at a router,” in Proc. High Performance Computing, Dec. 2001.

[9] I. Kim, “Analyzing network traces to identify long-term high rate flows,” M.S. thesis, TAMU-ECE-2001-02, May 2001.

[10] Y. Zhang, L. Breslau, V. Paxson, and S. Shenker, “On the characteristics and origins of Internet flow rates,” in ACM SIGCOMM, Aug. 2002.

[11] R. Mahajan et al., “Controlling high bandwidth aggregates in the network,” ACM Comput. Commun. Rev., vol. 32, no. 3, Jul. 2002.

[12] J. Ioannidis and S. M. Bellovin, “Implementing pushback: Router-based defense against DDoS attacks,” in Proc. Network and Distributed System Security Symp., Feb. 2002.

[13] C. Estan and G. Varghese, “New directions in traffic measurement and accounting,” in ACM SIGCOMM, Aug. 2002.

[14] A. Medina et al., “Traffic matrix estimation: Existing techniques and new directions,” in ACM SIGCOMM, Aug. 2002.

[15] K. C. Claffy, H. Braun, and G. Polyzos, “A parameterizable methodology for Internet traffic flow profiling,” IEEE J. Sel. Areas Commun., vol. 13, no. 8, pp. 1481–1494, Oct. 1995.

[16] C. S. Burrus, R. A. Gopinath, and H. Guo, Introduction to Wavelets and Wavelet Transforms. Englewood Cliffs, NJ: Prentice Hall, 1998.

[17] I. H. Witten, A. Moffat, and T. C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd ed. San Mateo, CA: Morgan Kaufmann, 1999, pp. 129–141.

[18] MATLAB Software, Release 12.1. The MathWorks, Inc., 2001.

[19] “CERT Advisory CA-2003-04 MS-SQL Server Worm,” CERT Coordination Ctr. (CERT/CC), Jan. 2003 [Online]. Available: http://www.cert.org/advisories/CA-2003-04.html

[20] I. Daubechies, “Ten lectures on wavelets,” in CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61. Philadelphia, PA: SIAM, 1992.

[21] S. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation,” IEEE Trans. Pattern Anal. Machine Intell., vol. 11, no. 7, pp. 674–693, 1989.

[22] G. W. Wornell, Signal Processing With Fractals: A Wavelet-Based Approach. Englewood Cliffs, NJ: Prentice Hall, 1996.

[23] A. Feldmann, A. Gilbert, P. Huang, and W. Willinger, “Dynamics of IP traffic: A study of the role of variability and the impact of control,” ACM Comput. Commun. Rev., vol. 29, no. 4, pp. 301–313, 1999.

[24] D. Moore et al., “Internet quarantine: Requirements for containing self-propagating code,” in IEEE INFOCOM, Apr. 2003.

[25] Packeteer, “PacketShaper Express,” white paper, 2003 [Online]. Available: http://www.packeteer.com/resources/prod-sol/Xpress_Whitepaper.pdf

[26] S. Floyd, S. Bellovin, J. Ioannidis, K. Kompella, R. Mahajan, and V. Paxson, “Pushback messages for controlling aggregates in the network,” IETF Internet draft, work in progress, Jul. 2001.

[27] S. Savage, D. Wetherall, A. Karlin, and T. Anderson, “Practical network support for IP traceback,” in ACM SIGCOMM, 2000.

[28] P. Huang, A. Feldmann, and W. Willinger, “A non-intrusive, wavelet-based approach to detecting network performance problems,” in ACM Internet Measurement Workshop, Nov. 2001.

[29] D. B. Percival and A. T. Walden, Wavelet Methods for Time Series Analysis. Cambridge, U.K.: Cambridge Univ. Press, 2000, ch. 4.

[30] C.-M. Cheng, H. T. Kung, and K.-S. Tan, “Use of spectral analysis in defense against DoS attacks,” in IEEE Globecom, 2002.

[31] KREONet2 (Korea Research Environment Open NETwork2). [Online]. Available: http://www.kreonet2.net

[32] S. S. Kim, A. L. N. Reddy, and M. Vannucci, “Detecting traffic anomalies using discrete wavelet transform,” in Proc. Int. Conf. Information Networking, 2004, pp. 1375–1384.

[33] S. S. Kim, A. L. N. Reddy, and M. Vannucci, “Detecting traffic anomalies through aggregate analysis of packet header data,” in Proc. Networking 2004, May 2004, pp. 1047–1059, LNCS 3042.

[34] E. R. Dougherty, Random Processes for Image and Signal Processing. New York: SPIE/IEEE Press, 1999, p. 61.

[35] J. Kilpi and I. Norros, “Testing the Gaussian approximation of aggregate traffic,” in ACM Internet Measurement Workshop, Nov. 2002.

[36] K. Papagiannaki et al., “Long-term forecasting of Internet backbone traffic: Observations and initial models,” in IEEE INFOCOM, 2003.

[37] A. Kuzmanovic and E. Knightly, “Low-rate TCP-targeted denial of service attacks,” in ACM SIGCOMM, Karlsruhe, Germany, Aug. 2003.

[38] A. Hussain, J. Heidemann, and C. Papadopoulos, “A framework for classifying denial of service attacks,” in ACM SIGCOMM, Aug. 2003.

[39] C. Estan, S. Savage, and G. Varghese, “Automatically inferring patterns of resource consumption in network traffic,” in ACM SIGCOMM, 2003.

[40] D. Plonka, “FlowScan: A network traffic flow reporting and visualization tool,” in USENIX LISA 2000, New Orleans, LA, Dec. 2000.

[41] M. Roesch, “Snort: Lightweight intrusion detection for networks,” in USENIX LISA 1999, Seattle, WA, Nov. 1999.

[42] D. Tong and A. L. N. Reddy, “QoS enhancement with partial state,” in Proc. IWQOS, Jun. 1999.

[43] A. Lakhina, M. Crovella, and C. Diot, “Diagnosing network-wide traffic anomalies,” in ACM SIGCOMM, Sep. 2004.

Seong Soo Kim received the B.S. and M.S. degrees in electrical engineering from Yonsei University, Seoul, Korea, in 1989 and 1991, respectively, and the Ph.D. degree in electrical and computer engineering from Texas A&M University in 2005.

He worked in the areas of analog/digital consumer electronics and home networking as a research engineer at LG Electronics Co., Ltd., Korea, from January 1991 to August 2001. After a three-month postdoctoral appointment, he joined Samsung Electronics Co., Ltd., Korea, where he is currently a Principal Engineer.

His research interests are in computer network security, wireless networking, multimedia including image and signal processing, and stochastic processing.

Dr. Kim received an Outstanding Research Engineer Award at LG in 1995 and a Patent Technology Award from the national patent office in 1996. He holds 31 domestic (registered or pending) patents and 15 international patents.

A. L. Narasimha Reddy (M’87–SM’97) received the B.Tech. degree in electronics and electrical engineering from the Indian Institute of Technology, Kharagpur, in 1985, and the M.S. and Ph.D. degrees in computer engineering from the University of Illinois at Urbana-Champaign (UIUC) in 1987 and 1990, respectively. At UIUC, he was supported by an IBM Fellowship.

He is currently a Professor in the Department of Electrical and Computer Engineering at Texas A&M University. He was a research staff member at IBM Almaden Research Center from 1990 to 1995. His research interests are in network security, network QoS, multimedia, I/O systems, and computer architecture. Currently, he is leading projects on building scalable network security solutions and wide-area storage systems.

Prof. Reddy is a member of ACM SIGARCH and a senior member of the IEEE Computer Society. He received an NSF CAREER Award in 1996 and Outstanding Professor Awards at Texas A&M during 1997–1998 and 2003–2004.
