Date post: 09-Mar-2018
Using the Splunk Machine Learning Toolkit to Create Your Own Custom Models
Dr. Adam Oliner, Director of Engineering, Data Science, Splunk
Manish Sainani, Principal Product Manager, Splunk
Copyright © 2016 Splunk Inc.

Disclaimer

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Who are we?

Dr. Adam Oliner
– Director of Engineering, Data Science & Machine Learning
– Splunker for 2 years
– Embarrassingly overeducated

Manish Sainani
– Principal Product Manager, Machine Learning
– Splunker for 2 years
– First ML hire at Splunk!

What are we doing here?

Overview of Machine Learning
The Assistants: Guided Machine Learning
– Prepare
– Fit
– Validate
– Deploy

Examples
– DIY Anomaly Detector
– Customer Applications

Overview of ML at Splunk

Core Platform Search, Packaged Premium Solutions, Custom ML

Platform for Operational Intelligence

Splunk Machine Learning Toolkit

Assistants: Guide model building, testing, & deploying for common objectives
Showcases: Interactive examples for typical IT, security, business, IoT use cases
Algorithms: 25+ standard algorithms available prepackaged with the toolkit
SPL ML Commands: New commands to fit, test and operationalize models
Python for Scientific Computing Library: 300+ open source algorithms available for use

Build custom analytics for any use case

Extends Splunk platform functions and provides a guided modeling environment

What’s New since our 0.9 Beta Release (last year’s .conf)?

• New name and abbreviation ;-)
• No event limits (removal of 50K limit on fitting models)
• Configurable resource caps via mlspl.conf
• Search head clustering support
• Distributed/streaming apply
• Scheduled fit
• New algorithms (next slide)
  – Feature engineering and selection
  – Stochastic gradient descent (e.g.)
  – ARIMA
• Multi-algorithm support across Assistants
• Scatterplot matrix viz
• Alerting
• Tooltips
• In-app tours
• Cluster Numeric Events assistant
• Videos videos videos for each assistant across IT, Security, IoT and Business Analytics
• ML-SPL Cheat Sheet

Algorithms supported (v2.0, .conf2016)

The Assistants: Guided Machine Learning

Machine Learning

A process for generalizing from examples

Examples
– A, B, … → # (regression)
– A, B, … → a (classification)
– Xpast → Xfuture (forecasting)
– like with like (clustering)
– |Xpredicted - Xactual| >> 0 (anomaly detection)

Machine Learning Process

Collect Data
Explore/Visualize
Model
Evaluate
Clean/Transform
Publish/Deploy

Machine Learning Process with Splunk

Collect Data
Explore/Visualize
Model
Evaluate
Clean/Transform
Publish/Deploy

props.conf, transforms.conf, Data models, Add-ons from Splunkbase, etc.
Pivot, Table UI, SPL
ML Toolkit
Alerts, Dashboards, Reports

Custom Machine Learning – Success Formula

Domain Expertise (IT, Security, …)
Data Science Expertise
Splunk Expertise

Identify use cases
Drive decisions
Set business/ops priorities
SPL
Data prep
Statistics/math background
Algorithm selection
Model building

Splunk ML Toolkit facilitates and simplifies via examples & guidance

Operational success

Guided ML with the Assistants

Guides you through various analytics
– Prepare, fit, validate, and deploy

Automatically generates all the relevant SPL

Assistants: Fit

Assistants: Validate

Assistants: Deploy

The Assistants

1. Predict Numeric Fields
2. Predict Categorical Fields
3. Detect Numeric Outliers
4. Detect Categorical Outliers
5. Forecast Time Series
6. Cluster Numeric Events

Predict Numeric Fields

Algorithms
– LinearRegression
  – … including Lasso, Ridge, and ElasticNet
– KernelRidge
– DecisionTreeRegressor
– RandomForestRegressor
– SGDRegressor

Validation
– Four visualizations of prediction error
– R² and RMSE
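The two validation metrics named above can be computed by hand. The following is a minimal Python sketch using the standard definitions (RMSE as the square root of the mean squared error, R² as 1 minus the ratio of residual to total sum of squares). The function names and sample values are made up for illustration; this is not the toolkit's own code.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, not taken from the presentation.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.5, 9.5]
print("RMSE:", rmse(actual, predicted))
print("R^2:", r_squared(actual, predicted))
```

A lower RMSE means smaller typical errors in the units of the predicted field, while R² closer to 1 means the model explains more of the variance in that field.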

Predict Categorical Fields

Algorithms
– LogisticRegression
– DecisionTreeClassifier
– RandomForestClassifier
– SGDClassifier
– SVM
– Naïve Bayes
  – BernoulliNB and GaussianNB

Validation
– Precision, recall, accuracy, F1
– Confusion matrix
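All four validation metrics above derive from the confusion matrix counts. Here is a small Python sketch for the two-class case, using the textbook definitions. The function name, the labels, and the sample data are hypothetical, and this is not how the Assistant itself computes them.

```python
def classification_metrics(actual, predicted, positive):
    """Compute confusion-matrix counts and derived metrics for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical labels, not taken from the presentation.
actual = ["fail", "ok", "fail", "ok", "ok", "fail"]
predicted = ["fail", "ok", "ok", "ok", "fail", "fail"]
print(classification_metrics(actual, predicted, positive="fail"))
```

Precision answers "of the events flagged, how many were right?", and recall answers "of the events that should have been flagged, how many were caught?". F1 is their harmonic mean.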

Detect Numeric Outliers

Methods
– Standard deviation
– Median absolute deviation
– Interquartile range
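The three methods above share one idea: measure the center and spread of a field, then flag values that fall too far from the center. Here is a rough Python sketch of each, using only the standard library. The function names, the multipliers `k`, and the sample data are illustrative assumptions, not the toolkit's defaults.

```python
import statistics


def outliers_stdev(values, k=2.0):
    """Flag values more than k standard deviations from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [v for v in values if abs(v - mu) > k * sigma]


def outliers_mad(values, k=3.0):
    """Flag values more than k median absolute deviations from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [v for v in values if abs(v - med) > k * mad]


def outliers_iqr(values, k=1.5):
    """Flag values more than k interquartile ranges outside the quartiles."""
    # statistics.quantiles requires Python 3.8 or later.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


# Hypothetical measurements with one obvious spike.
values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(outliers_stdev(values), outliers_mad(values), outliers_iqr(values))
```

The mean and standard deviation are themselves pulled toward extreme values, so a large spike can partly hide itself. The median-based and quartile-based methods are more robust in that situation.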

Validation:

Detect Categorical Outliers

Statistical methods

Validation:

Forecast Time Series

Algorithms
– State-space method using Kalman filter
– ARIMA

Validation

Cluster Numeric Events

Algorithms
– KMeans
– DBSCAN
– Birch
– SpectralClustering

Validation
– Scatterplot Matrix viz

Prepare

Data Gathering and Prep

Source: CrowdFlower

Splunk!

Leading platform for collecting, cleaning, and transforming data
Interactive Field Extractor
Data models
Hundreds of add-ons from Splunkbase
transforms.conf
props.conf
etc.

Feature Engineering

TFIDF (term-frequency × inverse document-frequency)
– Transform free-form text into numeric attributes

StandardScaler (i.e. normalization)
FieldSelector (i.e. choose k best features for regression/classification)
PCA and KernelPCA
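To make the TFIDF idea concrete, here is a short Python sketch that turns free-form text into numeric weights. It assumes the plain textbook formula tf × log(N / df) with whitespace tokenization. Library implementations typically add smoothing and normalization, so their numbers will differ. The function name and the sample log messages are made up.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical log messages.
docs = [
    "disk error on host01",
    "disk full on host02",
    "login failed on host01",
]
for vector in tfidf(docs):
    print(vector)
```

A term that appears in every document ("on" here) gets weight zero, while a term unique to one document ("error") gets the highest weight in it. That is what makes the resulting numeric attributes useful for models.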

Preprocessing in the Assistants

Fit

Fit: What’s New

No event limits
Configurable resource caps (mlspl.conf)
Search head clustering support
Scheduled fit
New algorithms

Fit: What’s New

Validate

Validate/Apply: What’s New

Configurable resource caps
Search head clustering support
Distributed/streaming apply
Scatterplot matrix viz

Scatterplot Matrix Viz

Deploy

Deploy anywhere in Splunk!

Scheduled training
Alerting
Reports and dashboards
Augmented search results
etc.

Deploy: What’s New

Distributed Apply
– Apply models to indexed data
– Streaming

Scheduled training
Alerting

What’s New: Scheduled Fit

What’s New: Alerting

Example: DIY Anomaly Detector

Let’s Build an Anomaly Detector!

We’ll use two Assistants
– Predict Numeric Fields
– Detect Numeric Outliers

Show automatically-generated intermediate SPL

Fit a Predictive Model

Set Up Scheduled Training

Open Residuals in Search

Open Detect Numeric Outliers Assistant

Detect Outliers (Large Prediction Errors)

Schedule an Alert

Schedule an Alert

Schedule an Alert

Manage Your New Anomaly Detector

The Assistant Generated the SPL for You

The Assistant Generated the SPL for You

You Built an Anomaly Detector!

You built a predictive model of AC Power
When the prediction error from this model is an outlier compared to past errors, you generate an alert
This predictive model automatically retrains itself on a schedule you control
You didn’t have to type any SPL

#winning
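The logic of the detector built in this walkthrough (fit a predictive model, then alert when a new prediction error is an outlier compared to past errors) can be sketched in a few lines of Python. This is an illustration of the idea only, not the SPL the Assistants generate. The single predictor (outside temperature), the training values, the median absolute deviation test, and all function names are assumptions made for the example.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def is_anomaly(error, past_errors, k=3.0):
    """True if error lies more than k MADs from the median of past errors."""
    med = statistics.median(past_errors)
    mad = statistics.median([abs(e - med) for e in past_errors])
    return abs(error - med) > k * mad


# Hypothetical training data: outside temperature -> AC power draw.
temps = [20, 22, 24, 26, 28, 30]
power = [40.5, 43.8, 48.4, 51.7, 56.3, 59.6]

# Step 1: fit the predictive model and record its past prediction errors.
slope, intercept = fit_line(temps, power)
past_errors = [y - (slope * x + intercept) for x, y in zip(temps, power)]


def check(temp, observed_power):
    """Step 2: compare a new prediction error against the past errors."""
    error = observed_power - (slope * temp + intercept)
    return is_anomaly(error, past_errors)


print(check(25, 50.0))  # reading close to the model's prediction
print(check(25, 70.0))  # reading far above the model's prediction
```

Re-running the fitting step on fresh data corresponds to the scheduled training described above, and wiring `check` to a notification corresponds to the scheduled alert.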

Machine Learning Customer Success

Network Optimization
Detect & Prevent Equipment Failure
Security/Fraud Prevention
Prioritize Website Issues and Predict Root Cause
Predict Gaming Outages
Fraud Prevention
Machine Learning Consulting Services
Analytics App built on ML Toolkit
Optimizing operations and business results
Prevent Cell Tower Failure
Optimize Repair Operations
Entertainment Company

Machine Learning Toolkit Customer Use Cases

Speeding website problem resolution by automatically ranking actions for support engineers

Reducing customer service disruption with early identification of difficult-to-detect network incidents

Minimizing cell tower degradation and downtime with improved issue detection sensitivity

Improving uptime and lowering costs by predicting/preventing cell tower failures and optimizing repair truck rolls

Predicting and averting potential gaming outage conditions with finer-grained detection

Ensuring mobile device security by detecting anomalies in ID authentication

Preventing fraud by identifying malicious accounts and suspicious activities

Entertainment Company

Detect Network Outliers
Reduced downtime + increased service availability = better customer satisfaction

ML Use Case
Monitor noise rise for 20,000+ cell towers to increase service and device availability, reduce MTTR

Technical overview
• A customized solution deployed in production based on outlier detection.
• Leverage previous month data and voting algorithms

“The ability to model complex systems and alert on deviations is where IT and security operations are headed… Splunk Machine Learning has given us a head start...”

Reliable website updates
Proactive website monitoring leads to reduced downtime

ML Use Case
• Very frequent code and config updates (1000+ daily) can cause site issues
• Find errors in server pools, then prioritize actions and predict root cause

Technical overview
• Custom outlier detection built using ML Toolkit Outlier assistant
• Built by Splunk Architect with no Data Science background

“Splunk ML helps us rapidly improve end-user experience by ranking issue severity which helps us determine root causes faster thus reducing MTTR and improving SLA”

What Now?

Get the Machine Learning Toolkit from Splunkbase
Go watch Machine Learning Videos on Splunk Youtube Channel: http://tiny.cc/splunkmlvideos
Go to Machine Learning talks:
– Advanced Machine Learning in SPL with the Machine Learning Toolkit by Jacob Leverich
– Extending SPL with Custom Search Commands and the Splunk SDK for Python by Jacob Leverich

Several Customers and Partner Talks
– Cisco, Scianta Analytics, Asian Telco, etc.

Early Adopter And Customer Advisory Program: [email protected]:[email protected]:[email protected]

http://tiny.cc/splunkmlapp

THANK YOU

