
Determining the Scale of Impact from Denial-of-Service Attacks in Real Time Using Twitter

Chi Zhang
University of Maryland, Baltimore County
Department of Computer Science and Electrical Engineering
Baltimore, MD
[email protected]

Bryan Wilkinson
University of Maryland, Baltimore County
Department of Computer Science and Electrical Engineering
Baltimore, MD
[email protected]

Ashwinkumar Ganesan
University of Maryland, Baltimore County
Department of Computer Science and Electrical Engineering
Baltimore, MD
[email protected]

Tim Oates
University of Maryland, Baltimore County
Department of Computer Science and Electrical Engineering
Baltimore, MD
[email protected]

ABSTRACT
Denial of Service (DoS) attacks are common in on-line and mobile services such as Twitter, Facebook and banking. As the scale and frequency of Distributed Denial of Service (DDoS) attacks increase, there is an urgent need to determine the impact of an attack. Two central challenges of the task are to get feedback from a large number of users and to get it in a timely manner. In this paper, we present a weakly-supervised model that does not need annotated data to measure the impact of DoS issues, by applying Latent Dirichlet Allocation and symmetric Kullback-Leibler divergence to tweets. The weakly-supervised module has a limitation: it assumes that the event detected in a time window is a DoS attack event. This becomes less of a problem as more non-attack tweets are collected, since a non-attack event is then less likely to be identified as a new event. Alternatively, an optional classification layer, trained on manually annotated DoS attack tweets, can be used to filter out non-attack tweets, increasing precision at the expense of recall. Experimental results show that we can learn weakly-supervised models that achieve precision comparable to supervised ones and that generalize across entities in the same industry.

CCS CONCEPTS
• Computing methodologies → Information extraction;

KEYWORDS
Event Detection; Topic Modeling; Weakly-supervised Learning

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
DYNAMICS '18, December 3–4, 2018, San Juan, PR, USA
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-6218-4/18/12 ... $15.00
https://doi.org/10.1145/3306195.3306199

1 INTRODUCTION
Denial of Service attacks are explicit attempts to stop legitimate users from accessing specific network systems [10]. Attackers try to exhaust network resources like bandwidth, or server resources like CPU and memory. As a result, the targeted system slows down or becomes unusable [21]. On-line service providers like Bank of America, Facebook and Reddit are often the target of such attacks, and the frequency and scale of those attacks have increased rapidly in recent years [32].

To address this problem, there is ample previous work on methods to detect and handle Denial of Service attacks, especially Distributed Denial of Service attacks. D-WARD [20] is a scheme that tries to locate a DDoS attack at the source by monitoring inbound and outbound traffic of a network and comparing it with predefined "normal" values. Some IP Traceback mechanisms [15] were developed to trace back to the attack source from the victim's end. Still other methods deploy a defensive scheme across an entire network to detect and respond to an attack at intermediate sub-networks. Watchers [5] is an example of this approach.

Despite all the new models and techniques to prevent or handle cyber attacks, DDoS attacks keep evolving. Services are still attacked frequently and brought down from time to time. After a service is disrupted, it is crucial for the provider to assess the scale of the outage impact.

In this paper, we present a novel approach to solve this problem. No matter how complex the network becomes or what methods the attackers use, a denial of service attack always results in legitimate users being unable to access the network system, or in their access slowing down, and users are usually willing to reveal this information on social media platforms. Legitimate user feedback can therefore be a reliable indicator of the severity of the service outage. We thus split this problem into two parts: first isolating the tweet stream that is likely related to a DoS attack, and then measuring the impact of the attack by analyzing the extracted tweets.

A central challenge in measuring the impact is how to figure out the scale of the effect on users as soon as possible so that appropriate action can be taken. Another difficulty is, given the huge number of users of a service, how to effectively collect and process the user feedback. With the development of social networks, especially micro-blogs like Twitter, users post many life events in real time, which can help with generating a fast response. Another advantage of social networks is that they are widely used. Twitter claims that it had 313 million monthly active users in the second quarter of 2016 [30]. This characteristic enlarges the scope of detection and is extremely helpful when dealing with cross-domain attacks, because tweets from multiple places can be leveraged. The large number of users of social networks will also guarantee the sensitivity of the model. However, because of the large number of users, a huge quantity of tweets will be generated in a short time, making it difficult to manually annotate the tweets, which makes unsupervised or weakly-supervised models much more desirable.

arXiv:1909.05890v1 [cs.SI] 12 Sep 2019

In the Twitter data that we collected there are three kinds of tweets. First are tweets that are actually about a cyberattack. For example, someone tweeted "Can't sign into my account for bank of America after hackers infiltrated some accounts." on September 19, 2012, when an attack on the website happened. Second are tweets with random complaints about an entity, like "Death to Bank of America!!!! RIP my Hello Kitty card..." which also appeared on that day. Last are tweets about other things related to the bank. For example, another tweet on the same day is "Should i get an account with bank of america or welsfargo?".

To find out the scale of impact from an attack, we must first pick out the tweets that are about the attack. Then, using the ratio and number of attack tweets, an estimate of severity can be generated. To solve the problem of detecting Denial of Service attacks from tweets, we constructed a weakly-supervised Natural Language Processing (NLP) based model to process the feeds. More generally, this is a new event detection model. We hypothesize that new topics are attack topics. This hypothesis does not always hold, and that issue is handled by a later module. The first step of the model is to detect topics in one time window of the tweets using Latent Dirichlet Allocation [4]. Then, in order to get a score for each of the topics, the topics in the current time window are compared with the topics in the previous time window using Symmetric Kullback-Leibler Divergence (KL Divergence) [16]. After that, a score for each tweet in the time window is computed using the distribution of topics for the tweet and the scores of the topics. In effect, we are looking for tweets on new topics through time. While the experiments show promising results, precision can be further increased by adding a layer of a supervised classifier trained with attack data, at the expense of recall.

The contributions of this paper are as follows:

(1) A dataset of annotated tweets extracted from Twitter during DoS attacks on a variety of organizations from differing domains such as banking (like Bank of America) and technology.

(2) A weakly-supervised approach to detecting likely DoS-related events on Twitter in real time.

(3) A score to measure the impact of a DoS attack based on the frequency of user complaints about the event.

The rest of this paper is organized as follows: In section 2, previous work regarding DDoS attack detection and new event detection is discussed. In section 3, we describe how the data was collected. We also present the model we created to estimate the impact of DDoS attacks from Twitter feeds. In section 4, the experiments are described and the results are provided. In section 5 we discuss some additional questions. Finally, section 6 concludes our paper and describes future work.

2 RELATED WORK
Denial of Service (DoS) attacks are a major threat to Internet security, and detecting them has been a core task of the security community for more than a decade. There is a significant amount of prior work in this domain; [6, 13, 17] all introduced different methods to tackle this problem. The major difference between this work and previous ones is that, instead of working on the data of the network itself, we use the reactions of users on social networks to identify an intrusion.

Due to their widespread use, social networks have become an important platform for real-world event detection in recent years [14]. Dou et al. [12] defined the task of new event detection as "identifying the first story on topics of interest through constantly monitoring news streams". Atefeh et al. [1] provided a comprehensive overview of event detection methods that have been applied to Twitter data. We discuss some of the approaches that are closely related to our work. Weng et al. [31] used a wavelet-signal clustering method to build a signal for individual words in the tweets that was dependent on high-frequency words that repeated themselves. The signals were clustered to detect events. Sankaranarayanan et al. [28] presented an unsupervised news detection method based on naive Bayes classifiers and on-line clustering. Long et al. [18] described an unsupervised method for general new event detection using hierarchical divisive clustering. Phuvipadawat et al. [26] discussed a pipeline to collect, cluster, rank tweets and ultimately track events. They computed the similarity between tweets using TF-IDF. The Stanford Named Entity Recognizer was used to identify nouns in the tweets, providing additional features while computing the TF-IDF score. Petrović et al. [25] tried to detect events in a large web corpus by applying a modified locality sensitive hashing technique and clustering documents (tweets) together. Benson et al. [3] created a graphical model that learned a latent representation for Twitter messages, ultimately generating a canonical value for each event. Tweet-scan [9] was a method to detect events in a specific geo-location. After extracting features such as name, time and location from the tweets, the method used DBSCAN to cluster the tweets and a Hierarchical Dirichlet Process to model the topics in the tweets. Badjatiya et al. [2] applied deep neural networks to detect events. They showed that different architectures such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (LSTM based) and FastText outperform standard n-gram and TF-IDF models. Burel et al. [8] created a Dual-CNN that had an additional channel to model the named entities in tweets, apart from the pretrained word vectors from GloVe [23] or Word2Vec [19].

Thus most event detection models can be grouped into three main categories: TF-IDF based methods, approaches that model topics in tweets, and deep neural network based algorithms. One of the main challenges in applying a neural network model is the requirement of a large annotated corpus of tweets. Our corpus of tweets is comparatively small. Hence we build our pipeline by modeling the topics learned from tweets.

The previous work that is most similar to ours is Cordeiro [11]. Both use Latent Dirichlet Allocation (LDA) to get the topics of the document; the difference is that they only run LDA on the hash-tags of the tweets, while we try to get the topics in the tweets by running it on the whole document.

Latent Dirichlet Allocation [4] is a method to extract topics from a corpus. In our work, we used the technique to acquire the values of some of the variables in our equations. A variation of it, Hierarchically Supervised Latent Dirichlet Allocation [24], was used in the evaluation.

3 APPROACH
Figure 1 outlines the entire pipeline of the model, from preprocessing tweets, to modeling them, and finally to detecting / ranking future tweets that are related to a DoS issue and measuring its severity.

3.1 Data Collection
To collect the tweets, we first gathered a list of big DDoS attacks that happened from 2012 to 2014. Then, for each attack on the list, we collected all the tweets from one week before the attack to the attack day that contain the name of the entity attacked.
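This collection step amounts to a filter over a tweet stream by entity name and date window. A minimal sketch follows; the `collect_window` helper and the toy tweet records are our own illustrative assumptions, not the authors' actual crawler:

```python
from datetime import date, timedelta

def collect_window(tweets, entity, attack_day):
    """Keep tweets mentioning `entity` posted from one week before
    `attack_day` up to and including the attack day itself."""
    start = attack_day - timedelta(days=7)
    return [t for t in tweets
            if entity.lower() in t["text"].lower()
            and start <= t["date"] <= attack_day]

# Toy records for illustration only
tweets = [
    {"text": "bank of america site is down", "date": date(2012, 9, 19)},
    {"text": "lovely weather today", "date": date(2012, 9, 19)},
    {"text": "bank of america opened a new branch", "date": date(2012, 9, 1)},
]
window = collect_window(tweets, "Bank of America", date(2012, 9, 19))
# only the first tweet both names the entity and falls in the window
```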

3.2 Preprocessing
The following preprocessing procedures were applied to the corpus of tweets:

• Remove all the meta-data like time stamp, author, and so on. This meta-data could provide useful information, but only the content of the tweet is used for now.
• Lowercase all the text.
• Use an English stop word list to filter out stop words.

The last two steps are commonly used techniques when preprocessing text.
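The three steps above can be sketched in a few lines; the tiny stop-word set below is an invented sample, not the full English list the paper uses:

```python
def preprocess(tweet_text, stop_words):
    """Keep only the tweet body: lowercase it and drop stop words.
    Metadata (timestamp, author, ...) is assumed stripped upstream."""
    tokens = tweet_text.lower().split()
    return [tok for tok in tokens if tok not in stop_words]

# Invented sample stop words for illustration
stop_words = {"the", "a", "an", "is", "to", "my", "into", "for", "of"}
tokens = preprocess("Can't sign into my account for bank of America", stop_words)
# -> ["can't", 'sign', 'account', 'bank', 'america']
```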

3.3 Create LDA Models
Now we try to find a quantitative representation of the corpus. To do that, the preprocessed tweets about one attack are divided into two groups: one from the attack day, and the other from the week before it. The first set will be called Da and the other Db. This step creates two separate LDA models for Da and Db using the Gensim library [27]. The first model will be called Ma and the other Mb.

Latent Dirichlet Allocation (LDA) is a generative probabilistic topic modeling method. Figure 2 is its plate notation. The meaning of the different parameters M, N, α, β, θ, z and w is also described there.

We used the LDA algorithm implemented by the Gensim library. One of the most important parameters of the LDA algorithm is the number of topics Nt in the corpus. To determine it we introduce the following formula:

Nt = ⌊α · log Nd⌋   (1)

where Nd is the number of tweets in the corpus and α is a constant; we used α = 10 in our experiments. The logic behind the equation is discussed in section 5.
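Equation (1) translates directly into code. One caveat: the paper does not state the base of the logarithm, so the natural log below is our assumption, as is the example corpus size:

```python
import math

def num_topics(num_docs, alpha=10):
    """Equation (1): Nt = floor(alpha * log(Nd)), with alpha = 10
    as in the paper's experiments. Natural log is assumed here."""
    return math.floor(alpha * math.log(num_docs))

# The two models Ma and Mb would then be trained separately on Da and Db,
# e.g. with Gensim:
#   lda = gensim.models.LdaModel(bow_corpus,
#                                num_topics=num_topics(len(docs)),
#                                id2word=dictionary)
print(num_topics(1180))  # 70 topics for a 1180-tweet corpus
```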

3.4 The Attack Topics
We then want to find out how the new topics differ from the historical topics or, in other words, how topics in Ma differ from topics in Mb. We define the Symmetric Kullback-Leibler divergence for topic Tj in model Ma as:

SKLj = min_{1 ≤ m ≤ n} ( Dkl(Tj, T′m) + Dkl(T′m, Tj) )   (2)

where n is the number of topics in model Mb, T′m is the mth topic in model Mb, and Dkl(X, Y) is the original Kullback-Leibler divergence for discrete probability distributions, defined as:

Dkl(X, Y) = Σ_i X(i) · log( X(i) / Y(i) )   (3)

where X(i) and Y(i) are the probabilities of token i in topics X and Y respectively. This is similar to the Jensen-Shannon divergence.

So for each topic Tj in model Ma, its difference from the topics in Mb is determined by its most similar topic in Mb.

The topics from the attack day model Ma are ranked by their Symmetric Kullback-Leibler divergence to the topics from the non-attack day model Mb. An example of selected attack topics is provided in section 4.3.
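Equations (2) and (3) can be computed directly over topic-word distributions. The three-token toy distributions below are invented for illustration; the zero-probability guard is a simplifying assumption (LDA topic distributions are strictly positive in practice):

```python
import math

def kl(x, y):
    """Equation (3): Dkl(X, Y) = sum_i X(i) * log(X(i) / Y(i)).
    Terms with X(i) == 0 contribute 0 (simplifying assumption)."""
    return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y) if xi > 0)

def skl(topic, prev_topics):
    """Equation (2): the SKL score of a topic from Ma is its symmetric
    KL divergence to the most similar topic in Mb (the minimum)."""
    return min(kl(topic, t) + kl(t, topic) for t in prev_topics)

# Toy topic-word distributions over a 3-token vocabulary
new_topic = [0.7, 0.2, 0.1]
prev_topics = [[0.6, 0.3, 0.1],   # similar -> small divergence
               [0.1, 0.1, 0.8]]   # dissimilar -> large divergence
score = skl(new_topic, prev_topics)  # picks the similar topic
```

A high `score` means the topic has no close match in the previous window, i.e. it is likely a new (candidate attack) topic.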

3.5 The Attack Tweets
This subsection is about how to find specific tweets that are about a network attack. The tweets are selected based on the relative score S. The score for tweet ti is defined as:

S = Σ_{j=1}^{n} Pi,j · SKLj   (4)

where n is the number of topics on the attack day, Pi,j is the probability that topic j appears in tweet ti in the attack day LDA model, and SKLj is the Symmetric Kullback-Leibler divergence for topic j. The higher the score, the more likely the tweet is related to an attack event.
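Equation (4) is a dot product between a tweet's topic distribution and the per-topic SKL scores. A sketch, with two topics and invented probabilities:

```python
def tweet_score(topic_probs, skl_scores):
    """Equation (4): S = sum_j P[i,j] * SKL[j], where topic_probs[j] is
    the probability of topic j for this tweet under the attack-day model."""
    return sum(p * s for p, s in zip(topic_probs, skl_scores))

skl_scores = [9.729, 4.156]      # per-topic SKL values (cf. Tables 1-2)
attack_like = [0.9, 0.1]         # tweet dominated by the high-SKL topic
other_like  = [0.1, 0.9]         # tweet dominated by the low-SKL topic

s_attack = tweet_score(attack_like, skl_scores)
s_other = tweet_score(other_like, skl_scores)
# the attack-like tweet ranks higher
```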

3.6 Optional Classifier Layer
Because annotated data is not needed, the model described above can be regarded as a weakly-supervised model to detect new events on Twitter in a given time period. To label tweets as attack tweets, one assumption must be true: that the new event in that time period is a cyber attack. Unfortunately, that is usually not true. Thus, an optional classifier layer can be used to prevent false positives.

By using a decision tree model, we want to find out whether the weakly-supervised part of the model can simplify the problem enough that a simple classification algorithm like a decision tree can achieve a good result. Additionally, it is easy to inspect the reasoning underlying a decision tree model, so we will know what the most important features are.

The decision tree classifier is trained on the bag of words of the collected tweets, with manually annotated labels. We limit the minimum number of samples in each leaf to no fewer than 4 so that the tree does not overfit. Other than that, a standard Classification and Regression Tree (CART) [7] implemented by scikit-learn [22] was used. The classifier was only trained on the training set (tweets about Bank of America on 09/19/2012), so that the test results do not overestimate accuracy.

Figure 1: Workflow to process gathered tweets and build a model to rank future tweets that are likely to be related to a DoS attack. The ranked tweets are used to measure the severity of the attack.

Figure 2: Plate notation of LDA [4]. The outer box denotes documents in the corpus, and M is the number of documents. The inner box denotes the repeated choice of topics and words within a document, where N is the number of words in a document. α is the parameter of the Dirichlet prior on the per-document topic distributions. β is the parameter of the Dirichlet prior on the per-topic word distribution. θ is the topic distribution. z is the topic of word w in the document.
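A minimal sketch of this optional layer with scikit-learn's CART implementation; the eight toy tweets and their labels below are invented for illustration, not the paper's annotated data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

# Invented toy training data: 1 = attack tweet, 0 = not
train_texts = [
    "site down cant sign in", "website down hackers",
    "outage down cant login", "servers down for hours",
    "rip my hello kitty card", "should i get an account",
    "love my new debit card", "great customer service today",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

vectorizer = CountVectorizer()          # bag-of-words features
X = vectorizer.fit_transform(train_texts)

# CART with at least 4 samples per leaf, as in the paper
clf = DecisionTreeClassifier(min_samples_leaf=4, random_state=0)
clf.fit(X, train_labels)

pred = clf.predict(vectorizer.transform(["bank website down cant sign in"]))
```

On this toy data the tree learns a single pure split (here on the word "down"), which illustrates why the authors find the learned rules easy to inspect.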

3.7 Measure the Severity
The definition of severity varies across network services and should be studied case by case.

For the sake of completeness, we propose this general formula:

SeverityLevel = β · (Nattack / Nall) + (1 − β) · (Nattack / Nuser)   (5)

In the equation above, β is a parameter from 0 to 1 which determines the weight of the two parts, Nattack is the number of attack tweets found, Nall is the number of all tweets collected in the time period, and Nuser is the number of Twitter followers of the network service.

An interesting direction for future work is to find the quantitative relation between the SeverityLevel score and the size of the actual DDoS attack.
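Equation (5) is straightforward to compute; the follower count and the β = 0.5 default below are illustrative choices of ours, not values from the paper:

```python
def severity_level(n_attack, n_all, n_user, beta=0.5):
    """Equation (5): beta weights the attack-tweet ratio against the
    attack-tweet count normalized by the service's follower count."""
    return beta * (n_attack / n_all) + (1 - beta) * (n_attack / n_user)

# e.g. 40 attack tweets out of 1180 collected, for a service with
# one million followers (an assumed figure)
level = severity_level(40, 1180, 1_000_000)
```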

4 EXPERIMENTS
In this section we experimentally study the proposed attack tweet detection models and report the evaluation results.

4.1 Term Definition
We used precision and recall for evaluation:

• Precision: Out of all of the tweets that are marked as attack tweets, the percentage that are actually attack tweets; i.e., true positives over true positives plus false positives.
• Recall: Out of all of the actual attack tweets, the percentage that are labeled as attack tweets; i.e., true positives over true positives plus false negatives.
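These two definitions can be sketched over sets of tweet ids; the ids below are arbitrary:

```python
def precision_recall(labeled, actual):
    """labeled: ids of tweets the model marked as attack tweets;
    actual: ids of tweets that truly are attack tweets."""
    tp = len(labeled & actual)               # true positives
    return tp / len(labeled), tp / len(actual)

# 4 tweets labeled as attacks, 5 true attack tweets, 3 in common
precision, recall = precision_recall({1, 2, 3, 4}, {2, 3, 4, 5, 6})
# -> precision 0.75, recall 0.6
```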

4.2 Experiment Dataset
We collected tweets related to five different DDoS attacks on three different American banks. For each attack, all the tweets containing the bank's name posted from one week before the attack until the attack day were collected. There are in total 35,214 tweets in the dataset. The collected tweets were then preprocessed as described in the preprocessing section.

The following attacks were used in the dataset:

• Bank of America attack on 09/19/2012.
• Wells Fargo Bank attack on 09/19/2012.
• Wells Fargo Bank attack on 09/25/2012.
• PNC Bank attack on 09/19/2012.
• PNC Bank attack on 09/26/2012.


4.3 The Attack Topics
Only the tweets from the Bank of America attack on 09/19/2012 were used in this experiment. The tweets before the attack day and on the attack day were used to train the two LDA models mentioned in the approach section.

The top 4 and bottom 4 attack topics, with their top 10 words, are shown in Tables 1 and 2.

Topic 1 (SKL 9.729): bank, america's, prolonged, site, slowdown, stuck, website, tech, slow, entirely
Topic 2 (SKL 9.205): hackers, bank, nyse, america, angered, target, sacrilegious, website, movie, .
Topic 3 (SKL 9.099): america's, bank, katherine, fannie, mae, mangu-ward, examines, contract, reason's, account
Topic 4 (SKL 9.055): bank, site, america's, outage, prolonged, sept, users, 18, said, reported

Table 1: Top 4 attack topics from the Bank of America data with their Symmetric Kullback-Leibler divergence

Topic 1 (SKL 4.15656803208709): bank, america, follows, light, central, sales, policy, check, rt, cashed
Topic 2 (SKL 4.16785261141118): america, bank, bad, great, claiming, keep, work, post, can, feedback
Topic 3 (SKL 4.30044067526549): bank, america, capital, ..., @abc, deon, pitsor, names, >, annual
Topic 4 (SKL 4.33914718404024): can, work, america, bank, help, happened, anything, jh, ma, s17

Table 2: Bottom 4 attack topics from the Bank of America data with their Symmetric Kullback-Leibler divergence

As shown in Table 1, there are roughly 4 kinds of words in the attack topics. First is the name of the entity we are watching, in this case Bank of America. Those words are in every tweet, so they get very high weight in the topics while not providing useful information; they can be safely discarded or added to the stop word list. The second type of words are general cybersecurity words like website, outage, hackers, slowdown and so on. Those words have the potential to become an indicator: when topics with those words appear, it is likely that there is an attack. The third kind are words related to the specific attack but not to attacks in general. Those words can provide details about the attack, but it is hard to identify them without reading the full tweets. In our example, the words movie and sacrilegious are in this group, because the DDoS attack on Bank of America was in response to the release of a controversial, sacrilegious film. The remaining words are unrelated words. The higher their weights in a topic, the less likely the topic is actually about a DDoS attack.

The results show that, except for the 3rd topic, the top 4 topics have high weight on related words, and the number of words of the fourth type is smaller than of the first three types. There are no high-weight words related to security in the bottom 4 topics. We can say that the high-SKL topics are about cyber attacks.

4.4 The Attack Tweets
In this subsection we discuss the experiment on the attack tweets found in the whole dataset. As stated in section 3.3, the whole dataset was divided into two parts: Da contained all of the tweets collected on the attack days of the five attacks mentioned in section 4.2, and Db contained all of the tweets collected before the five attacks. There are 1180 tweets in Da and 7979 tweets in Db. The tweets on the attack days (Da) were manually annotated, and only 50 percent of those tweets are actually about a DDoS attack.

The 5 tweets that have the highest relative score in the datasetare:

• jiwa mines and miner u.s. bancorp, pnc latest bank websites to face access issues: (reuters) - some u.s. bancorp... http://bit.ly/p5xpmz

• u.s. bancorp, pnc latest bank websites to face access issues:(reuters) - some u.s. bancorp and pnc financial...

• @pncvwallet nothing pnc sucks fat d ur lucky there’s 3 pnc’saround me or your bitchassness wouldnt have my money

• business us bancorp, pnc latest bank websites to face accessissues - reuters news

• forex business u.s. bancorp, pnc latest bank websites to faceaccess issues http://dlvr.it/2d9ths

The precision when labeling the first x ranked tweets as attack tweets is shown in Figure 3. The x-axis is the number of ranked tweets treated as attack tweets, and the y-axis is the corresponding precision. The straight line in Figures 3, 6 and 11 is the result of a supervised LDA algorithm, which is used as a baseline. Supervised LDA achieved 96.44 percent precision with 10-fold cross validation.

The results show that if the model is set to be more cautious about labeling a tweet as an attack tweet (a small x value), higher precision, even comparable to the supervised model, can be achieved. However, as the x value increases, the precision eventually drops.

Figure 4 shows the recall in the same setting. We can see that the recall increases as the model becomes bolder, at the expense of precision.

Figure 5 is the detection error trade-off graph, which shows the relation between precision and recall more clearly (the missed detection rate corresponds to 1 − recall).

4.5 GeneralizationIn this subsection we evaluate how good the model generalizes. Toachieve that, the dataset is divided into two groups, one is about theattacks on Bank of America and the other group is about PNC andWells Fargo. The only difference between this experiment and theexperiment in section 4.4 is the dataset. In this experiment settingDa contains only the tweets collected on the days of attack on PNC

Page 6: Determining the Scale of Impact from Denial-of-Service Attacks in Real Time Using Twitter · 2019. 9. 16. · Determining the Scale of Impact from Denial-of-Service Attacks in Real

Figure 3: Precision, positive predictive value, of the modelwhen labeling the first x ranked tweets as attack tweet usingall of the tweets collected. The straight line is the result of asupervised LDA model as a baseline.

Figure 4: Recall, true positive rate, of the model when label-ing the first x ranked tweets as attack tweet using all of thetweets collected.

and Wells Fargo. Db only contains the tweets collected before the Bank of America attack. There are 590 tweets in Da and 5229 tweets in Db. In this experiment, we want to find out whether a model trained on Bank of America data can classify PNC and Wells Fargo data well.

Figures 6 and 7 show the precision and recall of the model in this experimental setting. A detection error trade-off graph (Figure 8) is also provided. The result is similar to the whole-dataset setting from the previous section: the smaller the x value, the higher the precision and the lower the recall, and vice versa. The precision is also

Figure 5: Detection error trade-off graph when labeling different numbers of ranked tweets as attack tweets, using all of the tweets collected.

Figure 6: Precision, positive predictive value, of the model when labeling the first x ranked tweets as attack tweets. The model was trained on Bank of America data and tested on PNC and Wells Fargo data. The straight line is the result of a supervised LDA model as a baseline.

comparable to the supervised model when a small x is chosen. This shows that the model generalized well.

4.6 Impact Estimation
Using the result from the last section, we choose to label the first 40 tweets as attack tweets. The number 40 can be decided by either the number of tweets labeled as attack tweets by the decision tree classifier or the number of tweets that have a relative score S higher


Figure 7: Recall, true positive rate, of the model when labeling the first x ranked tweets as attack tweets. The model was trained on Bank of America data and tested on PNC and Wells Fargo data.

Figure 8: Detection error trade-off graph when labeling different numbers of ranked tweets as attack tweets. The model was trained on Bank of America data and tested on PNC and Wells Fargo data.

than a threshold. PNC and Wells Fargo have 308.3k followers combined as of July 2018. According to equation (5) from section 3.6, the severity level can be computed.

SeverityLevel = β × (40 / 590) + (1 − β) × (40 / 308300)    (6)

The score ranges from 6.78 × 10^-2 to 1.30 × 10^-4, depending on the value of β. This means that it could be a fairly important event, because more than six percent of the tweets mentioning the banks are talking about the DDoS attack. However, it could also be a minor attack, because only a tiny portion of the people following those banks are complaining about the outage. The value of β should depend on the provider's own definition of severity.

4.7 Parameter Tuning
This model has two parameters that need to be provided. One is α, which is needed to determine the number-of-topics parameter Nt, and the other is whether to use the optional decision tree filter.

Figures 9 and 10 provide experimental results for the model with different combinations of parameters. We selected the four combinations with the best and worst performance; all of the results can be found in the appendix. The model was trained on Bank of America tweets and tested on PNC and Wells Fargo tweets, as in section 4.5. In the figures, different lines correspond to different values of α, ranging from 5 to 14. The x-axis is the number of ranked tweets labeled as attack tweets, ranging from 1 to 100, and the y-axis is the precision or recall of the algorithm, a number from 0 to 1.

The results show that the decision tree layer increases precision at the cost of recall. The model's performance differs greatly with different α values, and there is no principled way to find the optimal one.

5 DISCUSSION
In this section, we discuss two questions.

First, we briefly discuss how well humans do on this task. We found that although humans perform well on most of the tweets, some tweets have proven challenging without additional information. In this experiment, we asked 18 members of our lab to classify 34 tweets picked from the human-annotated ones. There were only two tweets on which all 18 annotators agreed, and two tweets that received exactly the same number of votes on both sides. The latter two tweets are "if these shoes get sold out before i can purchase them, i'ma be so mad that i might just switch banks! @bankofamerica fix yourself!" and "nothing's for sure, but if i were a pnc accountholder, i'd get my online banking business done today: http://lat.ms/uv3qlo".

The second question is how to find the optimal number of topics in each of the two LDA models. As shown in the parameter tuning section, the number-of-topics parameter greatly affects the performance of the model. We tried several ways to determine the number of topics. The first was a fixed number of topics across corpora: we tried 30 different topic numbers on the Bank of America dataset, chose the best one, and then tested it on the PNC data. The result shows that this method does not transfer well across datasets. We think this is because the number of topics should be a function of the number of documents or the number of words in the corpus. We then tried letting the model itself determine the parameter. There are LDA variations that can infer the number of topics automatically. The one we chose is the Hierarchical Dirichlet Process (HDP) mixture model, which is a nonparametric Bayesian approach to clustering grouped data and a natural nonparametric generalization of Latent Dirichlet Allocation [29]. However, it does not perform very well; its precision is shown in Figure 11 and its recall in Figure 12. We think the reason for this performance might be that tweets, with the restriction of


Figure 9: Selected precision, positive predictive value, of the models with different parameter combinations. α is a parameter used to find the number of topics in the corpus. The model was trained on Bank of America data and tested on PNC and Wells Fargo data.

Figure 10: Selected recall, true positive rate, of the models with different parameter combinations. α is a parameter used to find the number of topics in the corpus. The model was trained on Bank of America data and tested on PNC and Wells Fargo data.


Figure 11: Precision, positive predictive value, of the Hierarchical Dirichlet Process model when labeling the first x ranked tweets as attack tweets, using all of the tweets collected. The straight line is the result of a supervised LDA model as a baseline.

Figure 12: Recall, true positive rate, of the Hierarchical Dirichlet Process model when labeling the first x ranked tweets as attack tweets, using all of the tweets collected.

140 characters, have very different properties than longer documents such as news articles. The last method is the one proposed in this paper. We chose α = 10, which performed well in our experiments, but this is only an empirical result.

6 CONCLUSION
In this paper, we proposed a novel weakly-supervised model with an optional supervised classifier layer to determine the impact of a Denial-of-Service attack in real time using Twitter. The approach computes an anomaly score based on the distribution of new topics and their KL divergence from the historical topics. We then tested the model on the same and on different entities to check its performance and how well it generalizes. Our experimental results show that the model achieves decent results in finding tweets related to a DDoS attack, even comparable to a supervised baseline, and that it generalizes to different entities within the same domain. Using the attack tweets, we can estimate the impact of the attack with the proposed formula.
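The core anomaly signal, symmetric KL divergence between a newly detected topic and historical topics, can be sketched as follows. Scoring a new topic by its minimum divergence to any historical topic is our illustrative assumption; the paper's full scoring uses the relative score S:

```python
import math

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric Kullback-Leibler divergence KL(p||q) + KL(q||p) between two
    discrete distributions (e.g. topic-word distributions from two LDA models).
    eps guards against zero probabilities in the log ratio."""
    def kl(a, b):
        return sum(ai * math.log((ai + eps) / (bi + eps)) for ai, bi in zip(a, b))
    return kl(p, q) + kl(q, p)

# A new topic far from every historical topic gets a high anomaly score.
new_topic  = [0.7, 0.1, 0.1, 0.1]
historical = [[0.25, 0.25, 0.25, 0.25], [0.1, 0.1, 0.1, 0.7]]
score = min(symmetric_kl(new_topic, h) for h in historical)
```

The divergence is zero for identical distributions and grows as the new topic's word distribution drifts away from everything seen in the historical window, which is what flags a burst of attack-related tweets as a new event.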

There remain some interesting open questions for future research. For example, it is important to find a way to determine the optimal number of topics in the dataset. We would also be interested to see how well this model performs on other kinds of event detection tasks if the optional classifier layer is changed accordingly.

A ADDITIONAL RESULTS FOR PARAMETER TUNING

Figures 13 and 14 provide all of the experimental results for the model with different combinations of parameters.

Figure 13: Precision, positive predictive value, of the models with different parameter combinations. α is a parameter used to find the number of topics in the corpus. The model was trained on Bank of America data and tested on PNC and Wells Fargo data.

Figure 14: Recall, true positive rate, of the models with different parameter combinations. α is a parameter used to find the number of topics in the corpus. The model was trained on Bank of America data and tested on PNC and Wells Fargo data.

REFERENCES
[1] Farzindar Atefeh and Wael Khreich. 2015. A survey of techniques for event detection in Twitter. Computational Intelligence 31, 1 (2015), 132–164.
[2] Pinkesh Badjatiya, Shashank Gupta, Manish Gupta, and Vasudeva Varma. 2017. Deep learning for hate speech detection in tweets. In Proceedings of the 26th International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee, 759–760.
[3] Edward Benson, Aria Haghighi, and Regina Barzilay. 2011. Event discovery in social media feeds. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Association for Computational Linguistics, 389–398.
[4] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3, Jan (2003), 993–1022.
[5] Kirk A. Bradley, Steven Cheung, Nicholas Puketza, Biswanath Mukherjee, and Ronald A. Olsson. 1998. Detecting disruptive routers: A distributed network monitoring approach. IEEE Network 12, 5 (1998), 50–60.
[6] Rodrigo Braga, Edjard Mota, and Alexandre Passito. 2010. Lightweight DDoS flooding attack detection using NOX/OpenFlow. In Local Computer Networks (LCN), 2010 IEEE 35th Conference on. IEEE, 408–415.
[7] Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. 1984. Classification and Regression Trees. Chapman & Hall, New York.
[8] Grégoire Burel, Hassan Saif, Miriam Fernandez, and Harith Alani. 2017. On semantics and deep learning for event detection in crisis situations. (2017).
[9] Joan Capdevila, Jesús Cerquides, Jordi Nin, and Jordi Torres. 2017. Tweet-SCAN: An event discovery technique for geo-located tweets. Pattern Recognition Letters 93 (2017), 58–68.
[10] Glenn Carl, George Kesidis, Richard R. Brooks, and Suresh Rai. 2006. Denial-of-service attack-detection techniques. IEEE Internet Computing 10, 1 (2006), 82–89.
[11] Mário Cordeiro. 2012. Twitter event detection: combining wavelet analysis and topic inference summarization. In Doctoral Symposium on Informatics Engineering. 11–16.
[12] Wenwen Dou, K. Wang, William Ribarsky, and Michelle Zhou. 2012. Event detection in social media data. In IEEE VisWeek Workshop on Interactive Visual Text Analytics - Task Driven Analytics of Social Media Content. 971–980.
[13] Laura Feinstein, Dan Schnackenberg, Ravindra Balupari, and Darrell Kindred. 2003. Statistical approaches to DDoS attack detection and response. In DARPA Information Survivability Conference and Exposition, 2003. Proceedings, Vol. 1. IEEE, 303–314.
[14] Anuradha Goswami and Ajey Kumar. 2016. A survey of event detection techniques in online social networks. Social Network Analysis and Mining 6, 1 (17 Nov 2016), 107. https://doi.org/10.1007/s13278-016-0414-1
[15] A. John and T. Sivakumar. 2009. DDoS: Survey of traceback methods. International Journal of Recent Trends in Engineering, ACEEE 1, 2 (2009).
[16] Solomon Kullback. 1997. Information Theory and Statistics. Courier Corporation.
[17] Keunsoo Lee, Juhyun Kim, Ki Hoon Kwon, Younggoo Han, and Sehun Kim. 2008. DDoS attack detection method using cluster analysis. Expert Systems with Applications 34, 3 (2008), 1659–1665.
[18] Rui Long, Haofen Wang, Yuqiang Chen, Ou Jin, and Yong Yu. 2011. Towards effective event detection, tracking and summarization on microblog data. In WAIM. Springer, 652–663.
[19] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[20] Jelena Mirkovic, Gregory Prier, and Peter Reiher. 2003. Source-end DDoS defense. In Network Computing and Applications, 2003. NCA 2003. Second IEEE International Symposium on. IEEE, 171–178.
[21] Jelena Mirkovic and Peter Reiher. 2004. A taxonomy of DDoS attack and DDoS defense mechanisms. ACM SIGCOMM Computer Communication Review 34, 2 (2004), 39–53.
[22] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
[23] Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1532–1543.
[24] Adler J. Perotte, Frank Wood, Noemie Elhadad, and Nicholas Bartlett. 2011. Hierarchically supervised latent Dirichlet allocation. In Advances in Neural Information Processing Systems. 2609–2617.
[25] Saša Petrović, Miles Osborne, and Victor Lavrenko. 2010. Streaming first story detection with application to Twitter. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 181–189.
[26] Swit Phuvipadawat and Tsuyoshi Murata. 2010. Breaking news detection and tracking in Twitter. In 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. IEEE, 120–123.
[27] Radim Řehůřek and Petr Sojka. 2010. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. ELRA, Valletta, Malta, 45–50.
[28] Jagan Sankaranarayanan, Hanan Samet, Benjamin E. Teitler, Michael D. Lieberman, and Jon Sperling. 2009. TwitterStand: News in tweets. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 42–51.
[29] Yee W. Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. 2005. Sharing clusters among related groups: Hierarchical Dirichlet processes. In Advances in Neural Information Processing Systems. 1385–1392.
[30] Twitter. 2016. Twitter Usage / Company Facts. https://about.twitter.com/company
[31] Jianshu Weng and Bu-Sung Lee. 2011. Event detection in Twitter. ICWSM 11 (2011), 401–408.
[32] Saman Taghavi Zargar, James Joshi, and David Tipper. 2013. A survey of defense mechanisms against distributed denial of service (DDoS) flooding attacks. IEEE Communications Surveys & Tutorials 15, 4 (2013), 2046–2069.

