Beyond Matching: Applying Data Science Techniques to IOC-based Detection

Post on 07-Feb-2017


Beyond Matching: Applying data science techniques to IOC-based detection

(#BeyondMatching)

Alex Pinto - Chief Data Scientist – Niddel @alexcpsec @NiddelCorp

Who am I?

• Security Data Scientist
• Capybara Enthusiast
• Co-Founder and Chief Data Scientist at Niddel (@NiddelCorp)
• Lead of MLSec Project (@MLSecProject)

• What is a Niddel?
• Niddel provides a SaaS-based Autonomous Threat Hunting System
• Research from this talk was performed using anonymized Niddel data and uses concepts implemented in its products.
• Not a vendor-centric talk; the focus is on learning and on y’all being able to reproduce this.

Agenda

• The Promise of IOCs
• 7 Habits of Highly Effective Analysts (ok, only 3)
• Nation-State APT Detection Deluxe Recipe
• Data Science to Assist on Pivoting
• Maliciousness Ratio
• Maliciousness Rating
• Revisiting TIQ-TEST – Telemetry Test

The Promise of IOCs

If you haven’t implemented Threat Intelligence feeds in your organization

I will reveal the ending of your upcoming grueling journey

Apologies in advance

Promise – Some Definitions First

• IOCs: Indicators of Compromise
• CTI: Cyber Threat Intelligence
• Will be using them interchangeably during this presentation
• IOCs -> technical data that allows for "tactical" discovery of a potential compromise on a system
• We will be focusing on network IOCs in this talk

Little Bobby Comics by @RobertMLee and Jeff Haas

Promise – Sounds Great! Sign me up!

• Not so fast, my friend
• Main challenges with IOC consumption:
  • Quality and Curation
    • Vetting and quality control
    • Open feeds vs Paid feeds
    • Manual vs Automated
  • Velocity and Volume
    • How to operationalize?
    • Add to SIEM?
    • Block in Firewall / Web Proxy?

Promise – Quality and Velocity at Odds

• AIS – Threat Intel sharing initiative from US Department of Homeland Security
• I fully support sharing (see previous intel sharing decks from 2015)
• But if we are resigned to this level of quality, "it is what it is", how can CTI / IOCs be shaped into a useful tool at scale?

Promise – Current Implementation Strategies

1. Alerting based on matching with IOC data:
  • By being careful and only matching on more "precise" indicators (URLs >> IPs), you can reduce the number of False Positives, but it is still challenging
2. Using IOC data to build context for existing alerts:
  • Safer bet, but you are not adding any detection power to existing controls

SPOILER ALERT: Everyone starts with (1) because "the FPs can’t be that bad", and then begrudgingly moves to (2) because there is not enough time in the world to go through all the noise that (1) generates.
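The difference between the two strategies can be sketched in a few lines of Python. Everything here is hypothetical: the indicator sets, the event layout (`dest_url`, `dest_ip`) and the function names are made up for illustration and do not come from the talk or any product.

```python
# Hypothetical IOC sets; real feeds would be loaded from files or an API.
IOC_URLS = {"http://bad.example/gate.php"}
IOC_IPS = {"203.0.113.7"}


def alert_on_match(event, match_ips=False):
    """Strategy 1: raise an alert when an event matches an IOC.

    Only URLs (the more "precise" indicator) are matched by default;
    IP matching is opt-in because it tends to be noisier.
    """
    if event.get("dest_url") in IOC_URLS:
        return True
    return match_ips and event.get("dest_ip") in IOC_IPS


def add_context(alert):
    """Strategy 2: annotate an alert that another control already raised."""
    enriched = dict(alert)
    enriched["ioc_url_hit"] = alert.get("dest_url") in IOC_URLS
    enriched["ioc_ip_hit"] = alert.get("dest_ip") in IOC_IPS
    return enriched


# A benign-looking URL served from an IP that also appears in a feed.
event = {"dest_ip": "203.0.113.7", "dest_url": "http://shared-host.example/"}
print(alert_on_match(event))                  # URL-only matching
print(alert_on_match(event, match_ips=True))  # IP matching enabled
print(add_context(event))
```

The toy event shows the trade-off: URL-only matching stays quiet, IP matching fires on shared hosting, and the context approach never fires on its own.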

Sad Intermission

DISCLAIMER: Could not find a picture of a sad capybara. Not sure there is one.

What makes analysts effective?

• They learn from the examples!!
• They don’t look at IOCs as a "finished product", but as a way to learn from the attacker infrastructure.
• After understanding and researching samples of data, they can extrapolate the TTPs (Tactics, Techniques and Procedures) of the attackers to build defenses.

Pyramid of Pain from @DavidJBianco

Internet Infrastructure 101

Actually, "everything" is connected

Nation-State APT Detection Deluxe Recipe

When your "favorite IR company" blames FROSTY PENGUIN for an attack:

1. Find a piece of malware on the compromised organization
2. Extract "non-benign" places they connect to (real work here, BTW)
3. Pivot on Internet Infrastructure to find related IPs / Domains / URLs
4. Search for these on org, find more malware (Hunting, FTW!)
5. Repeat Steps 1-4 until no more new malware
6. Remediate organization (hopefully!)
7. Publish report or blog post to great fanfare
8. PROFIT (or at least media attention and sales leads)
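Steps 1 to 5 of the recipe amount to expanding a seed set of indicators until nothing new turns up. A minimal Python sketch of that loop follows. The `pivot` and `found_in_org` callables are hypothetical stand-ins for the infrastructure lookups (step 3) and the hunting search (step 4), and the toy data is made up.

```python
def hunt(seed_indicators, pivot, found_in_org):
    """Expand a seed set of indicators until no new ones turn up.

    pivot(indicator) -> iterable of related indicators (step 3)
    found_in_org(indicator) -> True if seen in the organization (step 4)
    Both callables are placeholders for real lookups.
    """
    confirmed = set(seed_indicators)
    frontier = list(seed_indicators)
    while frontier:  # step 5: repeat until nothing new is found
        current = frontier.pop()
        for related in pivot(current):
            if related not in confirmed and found_in_org(related):
                confirmed.add(related)
                frontier.append(related)
    return confirmed


# Toy data: a tiny, made-up relationship graph.
RELATED = {
    "evil.example": ["198.51.100.5"],
    "198.51.100.5": ["evil2.example", "cdn.example"],
    "evil2.example": ["198.51.100.5"],
}
SEEN_IN_ORG = {"198.51.100.5", "evil2.example"}

result = hunt(
    ["evil.example"],
    pivot=lambda indicator: RELATED.get(indicator, []),
    found_in_org=lambda indicator: indicator in SEEN_IN_ORG,
)
print(sorted(result))
```

Tracking the confirmed set is what lets the loop terminate even when the relationships are circular, as they are between the IP and the second domain in the toy graph.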

Data Science to Assist on Pivoting

• Doing it ourselves: begin with data collection
1. Get IOCs from your favorite / available providers – there are a few options that are fairly good. Please do select according to collection criteria.
2. "Enrich" the data to gather the "pivot points" and find the connections.

Combine (https://github.com/mlsecproject/combine) can help with IOC gathering and enrichment for ASN data and pDNS (if you have a Farsight pDNS key)

• IP Addresses:
  • AS number
  • BGP prefix
  • Country
  • pDNS relationship to domains
• Domain names:
  • pDNS relationship to IPs
  • WHOIS Registrations
  • SOA
  • NS Servers
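One way to hold the enriched data is a small record per indicator that can list its own pivot points. The sketch below is an assumption about how such records might look. The class names, field names and sample values are hypothetical, and it does not use Combine's output format or any pDNS API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

PivotPoint = Tuple[str, str]  # (kind, value), e.g. ("asn", "AS64500")


@dataclass
class EnrichedIP:
    address: str
    asn: str
    bgp_prefix: str
    country: str
    pdns_domains: List[str] = field(default_factory=list)

    def pivot_points(self) -> Iterator[PivotPoint]:
        """Yield every (kind, value) pair this IP can be pivoted on."""
        yield ("asn", self.asn)
        yield ("bgp_prefix", self.bgp_prefix)
        yield ("country", self.country)
        for name in self.pdns_domains:
            yield ("pdns_domain", name)


@dataclass
class EnrichedDomain:
    name: str
    pdns_ips: List[str] = field(default_factory=list)
    whois_email: str = ""
    soa: str = ""
    name_servers: List[str] = field(default_factory=list)

    def pivot_points(self) -> Iterator[PivotPoint]:
        """Yield every (kind, value) pair this domain can be pivoted on."""
        for address in self.pdns_ips:
            yield ("pdns_ip", address)
        if self.whois_email:
            yield ("whois_email", self.whois_email)
        if self.soa:
            yield ("soa", self.soa)
        for server in self.name_servers:
            yield ("name_server", server)


# Made-up values from documentation ranges, not real indicators.
ip = EnrichedIP("192.0.2.10", "AS64500", "192.0.2.0/24", "ZZ", ["shop.example"])
domain = EnrichedDomain(
    "shop.example",
    pdns_ips=["192.0.2.10"],
    whois_email="reg@mail.example",
    name_servers=["ns1.dns.example"],
)
print(list(ip.pivot_points()))
print(list(domain.pivot_points()))
```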

Data Collection – Example With RIG EK

Charts on these slides (RIG EK sample, Oct 2016):
• WHOIS registrant e-mail on a small sample of RIG EK domains
• This one is NOT Domain Shadowing – active actor registering e-mails
• Autonomous System / Country where the IPs are located
• Autonomous System where the IPs are located

Data Aggregation – RIG EK Example

In summary: let’s create different graphs for each one of the pivoting points and measure the cardinality of the node connectedness

AS48096 – ITGRAD

AS16276– OVHSASL

AS14576 – Hosting Solution Ltd (actually king-servers.com)
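Measuring "the cardinality of the node connectedness" can be read as counting how many distinct entities hang off each pivoting point. The sketch below assumes the input is a list of `(entity, kind, value)` triples. That layout, the function names and the sample values are all hypothetical.

```python
from collections import defaultdict


def aggregate(records):
    """Map each (kind, value) pivot point to the set of entities linked to it."""
    graph = defaultdict(set)
    for entity, kind, value in records:
        graph[(kind, value)].add(entity)
    return graph


def cardinality(graph):
    """Number of distinct entities per pivot point, largest first."""
    counts = {point: len(entities) for point, entities in graph.items()}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Made-up records using documentation address ranges and reserved ASNs.
records = [
    ("192.0.2.1", "asn", "AS64500"),
    ("192.0.2.2", "asn", "AS64500"),
    ("192.0.2.2", "asn", "AS64500"),   # duplicate sighting, counted once
    ("192.0.2.3", "asn", "AS64500"),
    ("198.51.100.9", "asn", "AS64501"),
    ("192.0.2.1", "country", "ZZ"),
    ("198.51.100.9", "country", "ZZ"),
]

ranking = cardinality(aggregate(records))
for point, count in ranking:
    print(point, count)
```

Using sets rather than counters keeps repeated sightings of the same entity from inflating a pivot point's count.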

Data Aggregation – Context Matters

• What if my favorite websites are actually hosted at those pivoting points?
• I mean, there are a few "ok" things on .com and .org

Maliciousness Ratio

Let’s build similar aggregation metrics for the "good places" your organization goes to.

We propose a ratio that compares the cardinality of the node connectedness:
• B_pp – count of "bad entities" connected to a specific pivoting point
• G_pp – count of "good entities" connected to a specific pivoting point

$$MR_{pp} = \frac{B_{pp}}{G_{pp} + B_{pp}}$$
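As a quick illustration, the ratio (bad count divided by good count plus bad count) is a one-liner in Python. The function name, the handling of the empty case and the sample counts are assumptions made for this sketch.

```python
def maliciousness_ratio(bad_count, good_count):
    """Bad entities divided by all entities seen at a pivoting point.

    Returns None when nothing was observed, because the ratio is undefined.
    """
    total = bad_count + good_count
    if total == 0:
        return None
    return bad_count / total


# Made-up counts: 3 bad and 9 good entities share one pivoting point.
print(maliciousness_ratio(3, 9))   # 0.25
print(maliciousness_ratio(0, 0))   # None
```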

Hold on!! Good Places on the Internet?

• Creating and maintaining whitelists is MUCH HARDER than blacklists
• Some tips:
  • Use your own telemetry - given the base rate fallacy, places that "everyone" goes to are more likely to be benign
  • Rarity does not mean bad (shut up, UEBA people), but high visitation almost always means good
  • Harvest data from your own security tools, like web filters (if you trust them)
  • Very shallow scoops of Alexa Top Sites. Very. Shallow.
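The "use your own telemetry" tip could be sketched as keeping the destinations that a large share of distinct clients visit. The `(client_id, destination)` input layout and the `min_share` threshold are assumptions for illustration, not values from the talk.

```python
from collections import defaultdict


def popular_destinations(telemetry, min_share=0.5):
    """Return destinations visited by at least `min_share` of distinct clients.

    telemetry: iterable of (client_id, destination) pairs.
    min_share is a made-up threshold that would need tuning per organization.
    """
    clients_by_dest = defaultdict(set)
    all_clients = set()
    for client, dest in telemetry:
        clients_by_dest[dest].add(client)
        all_clients.add(client)
    if not all_clients:
        return set()
    return {
        dest
        for dest, clients in clients_by_dest.items()
        if len(clients) / len(all_clients) >= min_share
    }


# Made-up telemetry for four clients.
telemetry = [
    ("a", "intranet.example"),
    ("b", "intranet.example"),
    ("c", "intranet.example"),
    ("c", "intranet.example"),  # repeat visits by one client count once
    ("d", "rare.example"),
]
good_places = popular_destinations(telemetry)
print(good_places)
```

Counting distinct clients instead of raw hits keeps a single chatty host from promoting a destination into the "good" set. Rare destinations are simply left out, not marked bad.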

Maliciousness Ratio – Examples

• Telemetry from a pool of Niddel customers:
  • AS48096 – ITGRAD: 87.5%
  • Country RU: 5.2%
  • .org TLD: 2.9%
• Looking at the base rate:
  • ASN Base Rate: 0.6%
  • Country Base Rate: 0.58%
  • TLD Base Rate: 1.9%
• Severe outliers below base rate may indicate that the IOC is invalid

Maliciousness Rating

• A ratio from 0 to 1 can be cool for math people, but how risky are those things anyway?
• We need to compare it to the base rate to have a good measure
• We propose a maliciousness rating which expresses how much more likely to be bad a connection with a specific pivoting point is than a connection with an average pivoting point of that kind on the Internet.

$$MRT_{pp} = \frac{MR_{pp}}{\frac{1}{n}\sum_{i=1}^{n} MR_{pp}(i)}$$
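The rating divides a pivoting point's ratio by the mean ratio over all n pivoting points of the same kind. The sketch below applies that to the percentages from the examples slide, on the assumption that the "base rate" figures listed there play the role of that mean. The function names are made up.

```python
def mean_ratio(ratios):
    """Average maliciousness ratio over pivoting points of one kind."""
    return sum(ratios) / len(ratios)


def maliciousness_rating(ratio, base_rate):
    """How many times above (or below) the base rate a pivoting point sits."""
    if base_rate <= 0:
        raise ValueError("base rate must be positive")
    return ratio / base_rate


# (maliciousness ratio, base rate) pairs taken from the examples slide.
examples = {
    "AS48096 (ASN)": (0.875, 0.006),
    "RU (country)": (0.052, 0.0058),
    ".org (TLD)": (0.029, 0.019),
}
for name, (ratio, base_rate) in examples.items():
    rating = maliciousness_rating(ratio, base_rate)
    print(f"{name}: {rating:.1f}x the base rate")
```

Under that assumption the ASN comes out roughly 146 times above its base rate, the country about 9 times, and the TLD about 1.5 times. That puts the raw percentages in perspective.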

Maliciousness Rating – Sample Distributions

Challenges with the Approach

• How can we best define the cutoff scores on all those potential maliciousness ratings?
• How to combine and weight the multivariate composition of these pivoting points?
• Solution is probably unique per company, including understanding telemetry patterns, risk appetite for FPs / FNs and decision points on when to block and when to alert on something.
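To make the two questions concrete, here is one arbitrary way the pieces could fit together: a weighted sum of log-scaled ratings compared against two cutoffs. This is not the approach from the talk. The weights, cutoffs, floor value and function names are all invented, and each organization would need to pick its own.

```python
import math

# Hypothetical weights and cutoffs; each organization would tune its own.
WEIGHTS = {"asn": 0.5, "country": 0.2, "tld": 0.3}
ALERT_CUTOFF = 0.5
BLOCK_CUTOFF = 1.0
RATING_FLOOR = 0.01  # avoids log10(0) for pivot points with no bad entities


def combined_score(ratings):
    """Weighted sum of log10 ratings; 0 means 'same as the base rate'."""
    return sum(
        WEIGHTS[kind] * math.log10(max(rating, RATING_FLOOR))
        for kind, rating in ratings.items()
    )


def decide(ratings):
    """Turn per-kind maliciousness ratings into allow / alert / block."""
    score = combined_score(ratings)
    if score >= BLOCK_CUTOFF:
        return "block"
    if score >= ALERT_CUTOFF:
        return "alert"
    return "allow"


print(decide({"asn": 146.0, "country": 9.0, "tld": 1.5}))
print(decide({"asn": 20.0, "country": 1.0, "tld": 1.0}))
print(decide({"asn": 1.0, "country": 1.0, "tld": 1.0}))
```

The log scale is a design choice so that one extreme rating does not drown out the others. Moving either cutoff trades false positives against false negatives.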

What if the challenges had been solved?

A More Involved Example (1)

A More Involved Example (2)

Build the campaign based on the relationships - they all share the same support infrastructure on the IP Address and Name Servers.

Shia LaBeouf Approves

One more thing…

Going back to TIQ-Test

• Biggest criticism of TIQ-Test (mostly self-inflicted) is that it was always relative, not absolute.
• How can one define what is a "good" feed?
  • Does that even make sense?
  • It is easy to tell if a feed is bad (lots of FPs, low curation)
• My thought process:
  • Maybe with telemetry, you can identify an "applicable" feed
  • Or "actionable" if you like your Cybersecurity with extra camo

Actual alert IOC accounting

Percentage of the matches of a specific feed that were actual alerts or incidents at an organization

Actual alert UNIQUE IOC accounting

Percentage of UNIQUE (only contributed by the feed) matches of a specific feed that were actual alerts or incidents at an organization
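Both accounting metrics reduce to set arithmetic once the matches and the confirmed alerts are available as sets of indicators. The sketch below assumes exactly that. The function name, argument layout and sample indicators are hypothetical, and it returns fractions (multiply by 100 for percentages).

```python
def alert_accounting(feed_matches, alert_iocs, other_feeds=()):
    """Return (share, unique_share) of a feed's matches that were real alerts.

    feed_matches: set of IOCs from this feed that matched telemetry
    alert_iocs:   set of IOCs tied to confirmed alerts or incidents
    other_feeds:  iterable of sets of IOCs matched by the other feeds
    """
    def share(matches):
        return len(matches & alert_iocs) / len(matches) if matches else 0.0

    seen_elsewhere = set().union(*other_feeds)
    unique_matches = feed_matches - seen_elsewhere
    return share(feed_matches), share(unique_matches)


# Made-up indicators: feed A matched five, two of which were real alerts.
feed_a = {"a.example", "b.example", "c.example", "d.example", "e.example"}
feed_b = {"a.example", "c.example"}
alerts = {"a.example", "b.example", "x.example"}

overall, unique = alert_accounting(feed_a, alerts, other_feeds=[feed_b])
print(f"actual alert share: {overall:.0%}")
print(f"actual alert share, unique matches only: {unique:.0%}")
```

Removing everything another feed also matched answers the "how much value does this feed add" question separately from raw hit quality.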

Challenges with the Approach (2)

• How does one define a valid alert or incident?
• Not many ways but to improve understanding and growth of IR practice:
  • Your own incident history (for the 1%-ers)
  • Your own CTI / IOC creation processes (for the 0.01%-ers)
• The "Telemetry Test" has been INVALUABLE for Niddel on partnership and feed selection
• "My Threat Intelligence Can Beat Up Your Threat Intelligence" (h/t Rick Holland)
• How much value does a feed add anyway? Look for unique contributions.

No magic this time – Improve your IR processes

Takeaways

• Lots of ideas to implement, go go go!!
• IOCs (and CTI in general for that matter) are not a complete waste of time. They are just raw data that needs to be refined in order to be used properly
• Bringing automation (and simplicity of use) to threat intelligence and threat hunting is paramount to bring its usability from the 1% of orgs to a broader audience at scale

Thanks!

• Share, like, subscribe, EDM outro
• Q&A and Feedback please!

Alex Pinto – alexcp@niddel.com @alexcpsec @NiddelCorp

Little Bobby Comics by @RobertMLee and Jeff Haas