Date post: 17-Nov-2018
Bots, Outliers and Outages… Do you know what's lurking in your data?

Matteo Rebeschini, Solutions Architect @ Elastic
2018 Phoenix Data Conference

Abstract

With massive amounts of data being ingested daily, it is nearly impossible to understand what is hidden in your data by traditional means. How do you separate the ordinary from the out-of-the-ordinary in a timely fashion?

Unsupervised machine learning on time series data enables real-time discovery of those interesting and possibly costly data anomalies. Matteo will describe, build and run several types of machine learning jobs in Elasticsearch that can detect and alert on these anomalies and outliers in real time.

What’s Elastic?

search

100,000+ Meetup Members

250M+ Product Downloads

5000+ Subscription Customers

Statistics since 2012, founding of Elastic

Users across all segments

The Elastic Stack (AKA The ELK Stack)

How Elastic Stack Components Work Together

[Diagram] Components: Beats, Logstash, Elasticsearch, Kibana. Data sources: logs, metrics, packets, ...; datastores; web APIs.

All product names, logos, and brands are property of their respective owners and are used only for identification purposes. This is not an endorsement.

ES-Hadoop

Deployment in the Enterprise

[Diagram] Components shown:

Data sources: Data store, Web APIs, Social, Sensors
Beats: Heartbeat, Filebeat, Metricbeat, Packetbeat, Winlogbeat, Auditbeat
Messaging Queue: Kafka, Redis
Logstash: Workers (2+)
Elasticsearch: Master (3), Ingest (X), Data – Hot (X), Data – Warm (X), Machine Learning (2+), Coordinating (X), Alerting (X)
Authentication: LDAP, AD, SSO, SAML
Notification
Consumers: Kibana, Custom UI, Elasticsearch Clients

[Diagram] Layers shown:

Solutions: Metrics, Logging, APM, Site Search, Application Search, Business Analytics, Enterprise Search, Security Analytics, Future Solutions
Elastic Stack: Ingest; Store, Search, & Analyze; Visualize & Manage
Deployment: SaaS (Elastic Cloud); Self Managed (Elastic Cloud Enterprise, Standalone)

Machine Learning with the Elastic Stack

What’s Machine Learning?

● Algorithms that
○ Learn from Data
○ Using Statistical Techniques
○ Without Explicit Programming

Elastic Machine Learning Scope

• Anomaly Detection
• Autonomous cars
• Voice Recognition
• Fraud detection
• Speech Recognition
• Language Translation
• Entity Resolution
• Predictive Medicine
• Learn to Rank
• Recommendations
• Image Classification

Elastic Machine Learning Scope

Method: Unsupervised, Supervised
Data: Time Series, Panel, Cross Section
Problem: Classification, Regression, Anomaly Detection

Elastic ML: unsupervised anomaly detection on time series data

Elastic Machine Learning Flow

Challenges that Anomaly Detection Solves

● IT Operations
○ How do I know my systems are behaving normally?
○ Where to set thresholds for good alerting?
○ How to find the root cause of problems when I don’t know what to look for?

● IT Security
○ Do I have systems that are compromised with malware?
○ Which users could be an insider threat?

● IoT / SCADA / Other
○ Is my factory working normally?
○ What do I do with thousands of time-series data points?
○ Which traffic incidents are causing the most delay?

Detecting (noteworthy) anomalies is hard!

• Data is complex, high dimensional, fast moving
• Human inspection is not practical
• Easy to miss things

Visual inspection is not practical

Where’s the anomaly?

Detecting (noteworthy) anomalies is hard!

• Defining “normal” via static thresholds is hard
• Rules don’t evolve with data / infrastructure
• Rules can be bypassed

Rule-based alerts are insufficient

What’s the right threshold?

Elastic Machine Learning

• Uses unsupervised machine learning techniques to
▪ Learn what’s “normal” by modeling historic behavior
▪ Detect anomalies when data falls outside expected bounds
▪ Use models to predict future behavior (prediction)
▪ Use predictions to make decisions
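To make "learn what's normal, then flag data outside expected bounds" concrete, here is a minimal Python sketch. It is not Elastic's algorithm: it assumes a simple rolling window and bounds of mean plus or minus k standard deviations, and the function name, `window`, and `k` are illustrative choices only.

```python
from collections import deque
from statistics import mean, stdev


def learned_bounds_anomalies(values, window=20, k=3.0):
    """Return indices whose value falls outside mean +/- k*stdev
    of the preceding `window` observations (a toy baseline model)."""
    history = deque(maxlen=window)
    anomalies = []
    for i, v in enumerate(values):
        # Only judge a point once a full window of history is available.
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            lower, upper = mu - k * sigma, mu + k * sigma
            if v < lower or v > upper:
                anomalies.append(i)
        # The bounds keep adapting because every point joins the history.
        history.append(v)
    return anomalies


# Synthetic series (assumption): a steady repeating pattern with one spike.
series = [10.0 + (i % 5) * 0.5 for i in range(60)]
series[45] = 25.0
print(learned_bounds_anomalies(series))
```

Unlike a static threshold, the bounds here are recomputed from recent data, so they move with the series instead of being set by hand.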

Elastic Machine Learning

• Unsupervised techniques - no manual training / input needed
• Evolves with the data - “online” model learns continuously
• Influencer detection - accelerates root cause identification

Detect anomalies of different types

• Time series - single / multiple metric(s)

• Outliers in population (using entity profiling)

• Rare / unusual events
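The three anomaly types above map roughly onto three shapes of anomaly detection job. The Python sketch below only builds the JSON bodies; it does not call a cluster. The keys (`analysis_config`, `bucket_span`, `detectors`, `function`, `field_name`, `over_field_name`, `by_field_name`, `influencers`, `data_description.time_field`) follow the Elasticsearch job configuration as I understand it, so check them against the documentation for your version. The field names `responsetime`, `clientip`, and `process_name` are hypothetical.

```python
import json


def single_metric_job(metric_field, bucket_span="15m"):
    """Time series: model the mean of one metric per time bucket."""
    return {
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [{"function": "mean", "field_name": metric_field}],
        },
        "data_description": {"time_field": "@timestamp"},
    }


def population_job(entity_field, bucket_span="15m"):
    """Population outliers: compare each entity's event count to its peers."""
    return {
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [{"function": "count", "over_field_name": entity_field}],
            "influencers": [entity_field],
        },
        "data_description": {"time_field": "@timestamp"},
    }


def rare_job(by_field, bucket_span="1h"):
    """Rare events: flag values of `by_field` that are seldom seen."""
    return {
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [{"function": "rare", "by_field_name": by_field}],
        },
        "data_description": {"time_field": "@timestamp"},
    }


# Hypothetical field names, for illustration only.
jobs = {
    "response-time": single_metric_job("responsetime"),
    "unusual-clients": population_job("clientip"),
    "rare-processes": rare_job("process_name"),
}
print(json.dumps(jobs["unusual-clients"], indent=2))
```

Each body would be sent when creating a job; the bucket spans shown are placeholders and should be tuned to the data.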

Demo

Technology Behind Elastic ML

Elastic Machine Learning (ML) is a purpose-built combination of several machine learning methods and techniques. It brings sophisticated, real-time, automated anomaly detection for time series data to users who may not be able to apply data science on their own.

Using techniques such as clustering, various types of time-series decomposition, Bayesian distribution modeling, and correlation analysis, Elastic Machine Learning takes a 100% unsupervised machine learning approach to statistically model data’s time-based characteristics merely by observing its historical behavior.

Behind the scenes, a dynamic, ever-learning statistical model is built and stored for each unique time series. Incoming real-time data both contributes to the model’s maturity and is assessed against the model to judge how unusual it is. If the data’s behavior falls within the low-probability range, an anomaly record is created, persisted, and scored according to how improbable the behavior is. This score is normalized onto a user-friendly dynamic scale between 0 and 100, where 100 is the most unusual thing ever detected for the data set.
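The loop described above (update the model, assess each point against it, record low-probability points, normalize the score) can be sketched in Python as follows. This is a toy under stated assumptions, not Elastic's implementation: it assumes a single Gaussian model updated with Welford's algorithm, a fixed probability cutoff, and a score that rescales the negative log probability so the most unusual record in the data set gets 100. All names and parameters are illustrative.

```python
import math


class OnlineGaussian:
    """Running mean and variance via Welford's algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def tail_probability(self, x):
        """Two-sided probability of a value at least this far from the mean."""
        if self.n < 2:
            return 1.0
        var = self.m2 / (self.n - 1)
        if var == 0.0:
            return 1.0 if x == self.mean else 0.0
        z = abs(x - self.mean) / math.sqrt(var)
        return math.erfc(z / math.sqrt(2))


def score_stream(values, min_samples=10, p_threshold=0.001):
    """Return anomaly records with a 0-100 score (100 = most unusual seen)."""
    model = OnlineGaussian()
    records = []
    for i, x in enumerate(values):
        if model.n >= min_samples:
            p = model.tail_probability(x)
            if p < p_threshold:
                # Lower probability means higher surprise.
                surprise = -math.log10(max(p, 1e-300))
                records.append({"index": i, "probability": p, "surprise": surprise})
        model.update(x)  # every point also matures the model
    max_surprise = max((r["surprise"] for r in records), default=0.0)
    for r in records:
        r["score"] = 100.0 * (r["surprise"] / max_surprise)
    return records


# Synthetic data (assumption): a steady pattern, one mild and one extreme outlier.
values = [10.0 + (i % 5) * 0.5 for i in range(50)]
values[30] = 14.0
values[40] = 30.0
for record in score_stream(values):
    print(record["index"], round(record["score"], 1))
```

Because the scale is relative to the most unusual record seen so far, scores in this sketch would shift if a more extreme point arrived later, which is one way to read the "dynamic scale" wording above.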

