Date posted: 28-Nov-2014 | Category: Technology | Uploaded by: vicentediazkl
Vicente Díaz, Senior Security Analyst, Global Research and Analysis Team
Birds, bots and machines: Detecting fraud in Twitter using Machine Learning
Expectations vs reality
Why Twitter?
Spam - email
[Chart: email spam over time, Q1 2011 to January 2013; y-axis 0.00 to 90.00]
Using hacked accounts
Anything else interesting?
#PalabrasNeciasMovistarSorda
Getting profiles
A random campaign
Lifespan of bots
Detour – A few words on privacy
Tracking
Advanced tracking
Identify the user:
• Passive data: headers, plugins, browser, OS
• JS: screen resolution, custom resource detection via the Plugins API (e.g. printers via PDF, fonts via Flash)
Track ID:
• Cookies, Flash cookies (allow cross-domain references), HTML5 storage, Silverlight
• Java: own download cache; applets can read embedded resource streams
Future? Apps and games in social networks.
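The identification step can be illustrated by combining passively collected attributes into one stable identifier. This is a minimal sketch, assuming the attributes have already been gathered into a dictionary; the attribute names, values and the `fingerprint` helper are hypothetical, and real trackers combine many more signals.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Return a hex SHA-256 digest identifying a browser from its attributes.

    Keys are sorted so the same attributes always yield the same digest,
    regardless of the order in which they were collected.
    """
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attributes gathered from headers and JavaScript.
visitor_a = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "accept_language": "es-ES,es;q=0.9",
    "screen": "1920x1080",
    "plugins": ["PDF Viewer", "Flash"],
    "fonts": ["Arial", "DejaVu Sans"],
}
# The same attributes, collected in a different order.
visitor_b = dict(reversed(list(visitor_a.items())))

print(fingerprint(visitor_a) == fingerprint(visitor_b))  # True
```

No cookie is involved: the identifier is recomputed from the browser's own characteristics on every visit, which is why clearing storage alone does not defeat this kind of tracking.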
Let's play
Experiment
• 3 months of tracking
• 36 malicious campaigns
• 13,490 profiles
• 195,801 tweets
• 6,519,247 relationships
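The experiment totals imply some rough per-profile averages. This back-of-the-envelope sketch assumes that the tweets and relationships were counted over all 13,490 profiles; the deck does not give a per-campaign breakdown.

```python
# Totals reported for the experiment.
profiles = 13_490
tweets = 195_801
relationships = 6_519_247

tweets_per_profile = tweets / profiles
relationships_per_profile = relationships / profiles

print(f"{tweets_per_profile:.1f} tweets per profile")  # 14.5 tweets per profile
print(f"{relationships_per_profile:.0f} relationships per profile")  # 483 relationships per profile
```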
Machine Learning in 60 seconds
• Supervised learning
• Training – adaptive models
• Classification
• Key: choose the right attributes
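The deck does not say which classifier was used. To illustrate the supervised workflow (train on labelled profiles, then classify new ones), here is a minimal Gaussian Naive Bayes sketch in pure Python. The choice of Naive Bayes, the two features and all training values are assumptions made for the example.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: each feature is modelled as an
    independent normal distribution per class."""

    def fit(self, X, y):
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        self.priors = {}
        self.stats = {}
        for label, rows in grouped.items():
            self.priors[label] = len(rows) / len(X)
            columns = list(zip(*rows))
            means = [sum(c) / len(c) for c in columns]
            # A small epsilon keeps variances positive for constant features.
            variances = [
                sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                for c, m in zip(columns, means)
            ]
            self.stats[label] = list(zip(means, variances))
        return self

    def _log_posterior(self, row, label):
        # log prior + sum of per-feature Gaussian log densities
        total = math.log(self.priors[label])
        for x, (mean, var) in zip(row, self.stats[label]):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return total

    def predict(self, row):
        return max(self.stats, key=lambda label: self._log_posterior(row, label))


# Hypothetical features: [percProfileTweetsWithLink, friendFollowerRatio]
X = [[0.9, 20.0], [0.85, 15.0], [0.95, 30.0], [0.1, 0.8], [0.2, 1.2], [0.05, 1.0]]
y = ["bot", "bot", "bot", "human", "human", "human"]

model = GaussianNaiveBayes().fit(X, y)
print(model.predict([0.8, 25.0]))  # bot
print(model.predict([0.15, 0.9]))  # human
```

Whatever classifier is used, the result depends mostly on which attributes are fed to it, which is the point of the "Key" bullet.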
Feature selection
• Curse of dimensionality
• No new knowledge is generated: choose the right features!
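One simple family of feature-selection methods scores each attribute on its own and keeps the best ones. The deck does not name the algorithms it used; this sketch uses the two-class Fisher discriminant ratio, (mean difference)² / (sum of class variances), as one illustrative filter method. The sample values are made up.

```python
from statistics import mean, pvariance


def fisher_score(values, labels, positive):
    """Two-class Fisher discriminant ratio of a single feature:
    (mean_pos - mean_neg) ** 2 / (var_pos + var_neg)."""
    pos = [v for v, lab in zip(values, labels) if lab == positive]
    neg = [v for v, lab in zip(values, labels) if lab != positive]
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        # Constant within each class: perfectly separating or useless.
        return float("inf") if mean(pos) != mean(neg) else 0.0
    return (mean(pos) - mean(neg)) ** 2 / spread


def rank_features(columns, labels, positive):
    """Return feature names sorted from most to least discriminative."""
    scores = {name: fisher_score(vals, labels, positive) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


labels = ["bot", "bot", "bot", "human", "human", "human"]
columns = {  # hypothetical values for six profiles
    "percProfileTweetsWithLink": [0.9, 0.85, 0.95, 0.1, 0.2, 0.05],
    "tweetsCount": [120, 300, 80, 150, 90, 310],
    "friendFollowerRatio": [20.0, 15.0, 30.0, 0.8, 1.2, 1.0],
}
print(rank_features(columns, labels, "bot"))
# ['percProfileTweetsWithLink', 'friendFollowerRatio', 'tweetsCount']
```

A filter like this is cheap but scores features independently; wrapper methods that evaluate feature subsets with the classifier itself can catch interactions at a higher computational cost.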
Twitter
username, profileImg, followingCount, followersCount, tweetsCount, fullName, following, followers, numberOfProfileTweets, protected, text, possiblySensitive, source, location, coordinates, description, lang, url, createdAt, timeZone, verified
Derived
meanTimeBetweenTweets, friendFollowerRatio, tweetsKnownRecv, tweetsUnknownRecv, percFollowingFollowers, percProfileTweetsWithLink, percProfileTweetsToSomeone, percProfileTweetsRT, numberOfViasUsed
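Several of the derived attributes can be computed directly from a profile's tweets. This sketch assumes plausible definitions that the deck does not spell out: mean gap in seconds between consecutive tweets, following/followers ratio, tweets containing "http", tweets starting with "@", tweets starting with "RT @", and the number of distinct posting clients ("vias"). The `Tweet` class and `derived_features` helper are hypothetical; tweetsKnownRecv, tweetsUnknownRecv and percFollowingFollowers are omitted because they need the relationship graph.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    created_at: datetime
    text: str
    source: str  # client used to post, e.g. "web" or an API application


def derived_features(tweets, following_count, followers_count):
    """Compute a few derived profile features from raw tweets."""
    n = max(len(tweets), 1)  # avoid division by zero for empty profiles
    times = sorted(t.created_at for t in tweets)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {
        "meanTimeBetweenTweets": sum(gaps) / len(gaps) if gaps else 0.0,
        # max(..., 1) avoids division by zero for profiles without followers.
        "friendFollowerRatio": following_count / max(followers_count, 1),
        "percProfileTweetsWithLink": sum("http" in t.text for t in tweets) / n,
        "percProfileTweetsToSomeone": sum(t.text.startswith("@") for t in tweets) / n,
        "percProfileTweetsRT": sum(t.text.startswith("RT @") for t in tweets) / n,
        "numberOfViasUsed": len({t.source for t in tweets}),
    }


sample = [
    Tweet(datetime(2013, 5, 1, 10, 0), "Check this http://example.com", "web"),
    Tweet(datetime(2013, 5, 1, 10, 5), "@someone look http://example.com", "bot-app"),
    Tweet(datetime(2013, 5, 1, 10, 15), "RT @other: great deal", "bot-app"),
    Tweet(datetime(2013, 5, 1, 10, 30), "hello world", "web"),
]
features = derived_features(sample, following_count=2000, followers_count=40)
print(features["meanTimeBetweenTweets"])  # 600.0
```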
Mean time between tweets
Tweets to someone
After some testing and feature-selection algorithms:
numberOfVias, tweetsToSomeone, tweetsWithLink, followingFollowers, friendFollowerRatio, tweetsKnownReceiver, tweetsUnknownReceiver
Avoiding detection
You are doing it wrong!
Avoiding semantic analysis
• if its do you me your my do it my be find is but on are its rt that was
• I a me at get out your they on rt if I get rt can a
• u you rt find in I that that your my my find one you so is is my you this but get all a one its it
• they with its your get me of I
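These filler tweets are made almost entirely of very common words. A simple heuristic flags such text by the share of its tokens drawn from a small list of common words. The word list and the `function_word_ratio` helper are hypothetical, and any decision threshold would need tuning on real data.

```python
import re

# Hypothetical list of very common English filler words; a real system
# would use a larger, curated list.
COMMON_WORDS = {
    "a", "all", "are", "at", "be", "but", "can", "do", "find", "get",
    "i", "if", "in", "is", "it", "its", "me", "my", "of", "on", "one",
    "out", "rt", "so", "that", "they", "this", "u", "was", "with",
    "you", "your",
}


def function_word_ratio(text: str) -> float:
    """Share of tokens in `text` that belong to COMMON_WORDS."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(tok in COMMON_WORDS for tok in tokens) / len(tokens)


filler = "if its do you me your my do it my be find is but on are its rt that was"
print(function_word_ratio(filler))  # 1.0
print(function_word_ratio("Mmmm hot chocolate with cream"))  # 0.2
```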
Avoiding relationship checks
Or just overflow with fake profiles …
DIY
Finding malicious profiles
• Not so hard …
AdrianaDickson7
MyrtleTerry11
PatricaFitzpat6
RobertP97792514
RochelleBeasle8
ShannonMunoz13
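The usernames above share a visible shape: a capitalised first name, a capitalised and often truncated surname or initial, then trailing digits, within Twitter's 15-character username limit. The regular expression below is a hypothetical heuristic for that shape; many legitimate users also match it, so it is at best one weak feature among others.

```python
import re

# Hypothetical pattern: capitalised first name, capitalised (possibly
# truncated) surname or initial, trailing digits; at most 15 characters.
GENERATED_NAME = re.compile(r"^(?=.{1,15}$)[A-Z][a-z]+[A-Z][a-z]*\d+$")


def looks_generated(username: str) -> bool:
    return GENERATED_NAME.match(username) is not None


campaign = ["AdrianaDickson7", "MyrtleTerry11", "PatricaFitzpat6",
            "RobertP97792514", "RochelleBeasle8", "ShannonMunoz13"]
print(all(looks_generated(name) for name in campaign))  # True
print(looks_generated("trompi"))  # False
```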
1 week later…
5,200 profiles in this campaign
Around 250 new profiles created every day
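A creation rate like this can be measured from each profile's createdAt field by counting accounts per calendar day. This sketch uses hypothetical timestamps and assumes they are already in a single time zone; the `accounts_per_day` helper is illustrative.

```python
from collections import Counter
from datetime import datetime


def accounts_per_day(created_at):
    """Count how many accounts were created on each calendar day."""
    return Counter(ts.date() for ts in created_at)


# Hypothetical createdAt values for profiles in one campaign.
timestamps = [
    datetime(2013, 6, 1, 9, 0),
    datetime(2013, 6, 1, 13, 30),
    datetime(2013, 6, 1, 22, 15),
    datetime(2013, 6, 2, 8, 0),
    datetime(2013, 6, 2, 19, 45),
]
daily = accounts_per_day(timestamps)
mean_per_day = sum(daily.values()) / len(daily)
print(mean_per_day)  # 2.5
```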
[Chart: Following, x-axis 0 to 300, y-axis 0 to 180]
[Chart: Followers, x-axis 0 to 300, y-axis 0 to 180]
Top tweets sent
• Mmmm hot chocolate with cream
• Beyonce looks so hot in her new ad
• So Hot
• Spain !! Too hot
• hot summer
• a hot bubble bath is much needed
• Tea water supposed to hot ya now
• Air conditioner-laying on the bed-naked-relax-heaven! So hot tonight!
• playing piano and guitar r the only things i can do right in life does this make me hot enough for a boyfriend yet
• Austin mahone is just like another justin beiber..he is hot tho!
1,800 different tweets
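The sample suggests the campaign varies its wording while reusing a single keyword. Counting how many tweets each word appears in (document frequency) surfaces that shared term. The `document_frequency` helper is hypothetical, and its tokenisation is deliberately crude.

```python
import re
from collections import Counter

# Sample of the campaign's top tweets, as listed on the slide.
TWEETS = [
    "Mmmm hot chocolate with cream",
    "Beyonce looks so hot in her new ad",
    "So Hot",
    "Spain !! Too hot",
    "hot summer",
    "a hot bubble bath is much needed",
    "Tea water supposed to hot ya now",
    "Air conditioner-laying on the bed-naked-relax-heaven! So hot tonight!",
    "playing piano and guitar r the only things i can do right in life "
    "does this make me hot enough for a boyfriend yet",
    "Austin mahone is just like another justin beiber..he is hot tho!",
]


def document_frequency(texts):
    """Count in how many texts each lowercase word appears."""
    counts = Counter()
    for text in texts:
        counts.update(set(re.findall(r"[a-z]+", text.lower())))
    return counts


df = document_frequency(TWEETS)
print(df.most_common(1))  # [('hot', 10)]
```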
Not only limited to Twitter
Conclusions
• It is relatively easy to find anomalies
• Bots are there for different reasons, mostly fraud-related
• Machine learning: lots of resources!
Thank you
Questions?
Vicente Díaz @trompi
Senior Security Analyst, Global Research and Analysis Team