REAL-TIME NETWORK ANALYTICS WITH STORM Mauricio Vacas Fausto Inestroza Sonali Parthasarathy.

Post on 26-Mar-2015

REAL-TIME NETWORK ANALYTICS WITH STORM

Mauricio Vacas

Fausto Inestroza

Sonali Parthasarathy

The Team

Mauricio Vacas, Big Data Architect

Sonali Parthasarathy, Real-Time Processing

Fausto Inestroza, Big Data Architect

Anita Mehrotra, Data Scientist

Susie Lu, Visualization

Krista Schnell, Visualization

Rick Drushal, Engineering Lead

John Akred, Product Lead

WHY REAL-TIME?

Distributed Analytics

Real-Time Data Ingestion

Model Prototyping

Exploratory Analytics

Real-Time Rule Execution

PROCESS

UNDERSTAND

REACT

Accenture Cloud Platform

Recommender as a Service

……

Network Analytics Services

Big Data Platform

Drivers

consumer devices

video usage

Issues

Operational Costs

Understanding service quality degradation

Inefficient capacity planning

INGEST

PROCESS

VISUALIZE

ANALYZE

STORE

WHY STORM?

What do we need?

Scalability

Reliability

Data types, size, velocity

Mission critical data

Processing, computation, etc.

Time series / pattern analysis

Fault-tolerance

Multiple use cases

How do we get this from Storm?

Processing guarantees

Low-level Primitives

Parallelization

Robust fail-over strategies

Scalability

Reliability

Fault-tolerance

Processing, computation, etc.

PRIMITIVES

Stream: request info (IP, user-agent, etc.), carried as tuples

Spout: pull messages from distributed queue

Bolt: sessionization, speed calculation

Topology: suboptimal network speed, geospatial analysis
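
As a rough illustration of how these primitives fit together, the sketch below models a spout, a bolt, and a topology with plain Python generators. It is not the Storm API: the function names, the comma-separated message format, and the `is_mobile` field are all assumptions made for the example.

```python
from typing import Callable, Iterable, Iterator, List


def queue_spout(messages: Iterable[str]) -> Iterator[dict]:
    """Spout: pull raw messages and emit them as a stream of tuples."""
    for line in messages:
        # Hypothetical message format: "<ip>,<user-agent>"
        ip, user_agent = line.split(",", 1)
        yield {"ip": ip, "user_agent": user_agent}


def mobile_flag_bolt(stream: Iterable[dict]) -> Iterator[dict]:
    """Bolt: consume tuples and emit new tuples with one extra field."""
    for tup in stream:
        yield {**tup, "is_mobile": "Mobile" in tup["user_agent"]}


def run_topology(messages: Iterable[str],
                 bolts: List[Callable[[Iterable[dict]], Iterator[dict]]]) -> List[dict]:
    """Topology: wire the spout to a chain of bolts and drain the stream."""
    stream: Iterable[dict] = queue_spout(messages)
    for bolt in bolts:
        stream = bolt(stream)
    return list(stream)


RESULT = run_topology(
    ["10.0.0.1,Mozilla/5.0 Mobile", "10.0.0.2,curl/7.30"],
    [mobile_flag_bolt],
)
```

In Storm itself the stream is unbounded and the spout and bolts run as separate tasks; the generator chain only shows the data flow.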

PARALLELISM

[Diagram: Nimbus and Zookeeper coordinate two Supervisor nodes. Each Supervisor runs worker processes (W), and each worker process holds tasks (T). Legend: Topology, Worker Process, Executor, Task.]

FAULT TOLERANCE

[Diagram: Nimbus and three Supervisor nodes, each with worker processes (W) and tasks (T).]

RELIABILITY

[Diagram: tuples labeled IP1, IP2 and IP3 and a node labeled A, shown in two states.]
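
Storm's documented approach to tracking whether a spout tuple has been fully processed is to XOR together the ids of every tuple emitted and acked in its tuple tree; the tree is complete when the running value returns to zero. The sketch below imitates that bookkeeping in plain Python. The class name, method names, and the small fixed ids are assumptions for illustration (Storm uses random 64-bit ids, which makes an accidental zero very unlikely).

```python
class AckTracker:
    """Keep one XOR 'ack value' per root (spout) tuple."""

    def __init__(self) -> None:
        self._ack_val = {}  # root id -> running XOR of tuple ids

    def _xor(self, root_id: str, tuple_id: int) -> int:
        value = self._ack_val.get(root_id, 0) ^ tuple_id
        self._ack_val[root_id] = value
        return value

    def emitted(self, root_id: str, tuple_id: int) -> None:
        """Record a tuple that was emitted into the root's tuple tree."""
        self._xor(root_id, tuple_id)

    def acked(self, root_id: str, tuple_id: int) -> bool:
        """Record an ack; return True once the whole tree is processed."""
        return self._xor(root_id, tuple_id) == 0

    def is_complete(self, root_id: str) -> bool:
        return self._ack_val.get(root_id) == 0


tracker = AckTracker()
# The spout emits tuple 0x1A; a bolt emits 0x2B and 0x3C anchored to it.
for tuple_id in (0x1A, 0x2B, 0x3C):
    tracker.emitted("msg-1", tuple_id)
# Acks arrive one by one; only the last one completes the tree.
DEMO_STATES = [tracker.acked("msg-1", tuple_id) for tuple_id in (0x1A, 0x2B, 0x3C)]
```

Each id is XORed in twice (once on emit, once on ack), so the order in which acks arrive does not matter.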

SUBOPTIMAL NETWORK SPEED TOPOLOGY

AN EXAMPLE

KafkaSpout

Pre-process

Sessionize

Calculate N/W Speed per Session

Update Speed per IP

Identify Suboptimal Speed

Store in Cassandra

Cassandra
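
The slides do not give the logic inside each bolt, so the following is only a sketch of what the Sessionize, Calculate N/W Speed per Session, and Identify Suboptimal Speed stages might compute. The request fields (`ip`, `ts`, `bytes`, `duration_s`), the 30-second session gap, the 500 kbps threshold, and the definition of speed as bits transferred over transfer time are all assumptions.

```python
from collections import defaultdict

SESSION_GAP_S = 30.0      # assumed inactivity gap that closes a session
SUBOPTIMAL_KBPS = 500.0   # assumed threshold for "suboptimal" speed


def sessionize(requests, gap_s=SESSION_GAP_S):
    """Group requests into per-IP sessions split on inactivity gaps."""
    sessions = defaultdict(list)
    for req in sorted(requests, key=lambda r: r["ts"]):
        ip_sessions = sessions[req["ip"]]
        if ip_sessions and req["ts"] - ip_sessions[-1][-1]["ts"] <= gap_s:
            ip_sessions[-1].append(req)   # continue the open session
        else:
            ip_sessions.append([req])     # start a new session
    return dict(sessions)


def session_speed_kbps(session):
    """Speed of one session: total bits over total transfer time."""
    total_bits = sum(r["bytes"] for r in session) * 8
    total_s = sum(r["duration_s"] for r in session)
    return total_bits / total_s / 1000.0 if total_s > 0 else 0.0


def suboptimal_ips(requests, threshold_kbps=SUBOPTIMAL_KBPS):
    """Average the session speeds per IP and keep IPs below the threshold."""
    flagged = {}
    for ip, ip_sessions in sessionize(requests).items():
        speeds = [session_speed_kbps(s) for s in ip_sessions]
        average = sum(speeds) / len(speeds)
        if average < threshold_kbps:
            flagged[ip] = average
    return flagged


REQUESTS = [
    {"ip": "10.0.0.1", "ts": 0.0, "bytes": 1_000_000, "duration_s": 1.0},
    {"ip": "10.0.0.1", "ts": 10.0, "bytes": 1_000_000, "duration_s": 1.0},
    {"ip": "10.0.0.2", "ts": 0.0, "bytes": 50_000, "duration_s": 2.0},
    {"ip": "10.0.0.2", "ts": 100.0, "bytes": 25_000, "duration_s": 1.0},
]
FLAGGED = suboptimal_ips(REQUESTS)
```

In a streaming topology each stage would keep this state incrementally per tuple rather than over a finished list; the batch form is used here only to keep the example short.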

Parallelism

[Diagram: the same topology (KafkaSpout through Store in Cassandra, writing to Cassandra), with tuples for ip 1 and tuples for ip 2 flowing through it in parallel.]
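
One way to keep per-IP state correct while running several tasks in parallel is to route every tuple with the same IP to the same task, which is what Storm's fields grouping provides. The slides do not state which grouping was used, so treat this as an assumption. The sketch below shows the idea with a stable hash in plain Python; `task_for` and `route` are hypothetical helper names, not Storm calls.

```python
import zlib
from collections import defaultdict


def task_for(key: str, num_tasks: int) -> int:
    """Map a key to a task index with a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_tasks


def route(tuples, num_tasks: int):
    """Partition tuples across tasks so that each IP lands on one task."""
    tasks = defaultdict(list)
    for tup in tuples:
        tasks[task_for(tup["ip"], num_tasks)].append(tup)
    return dict(tasks)


TUPLES = [{"ip": "ip 1", "seq": i} for i in range(5)] + \
         [{"ip": "ip 2", "seq": i} for i in range(6)]
ROUTED = route(TUPLES, num_tasks=4)
```

Because the task index depends only on the key, a bolt that updates speed per IP never sees the same IP on two different tasks.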

Branching and Joins

[Diagram: topology with two KafkaSpouts and two streams (Stream 1, Stream 2). Stages shown: Pre-process, Sessionize, Calculate N/W Speed per Session, Update Speed per IP, Speed by Location, Join, Compare Speed, Store in Cassandra, writing to Cassandra. Example tuples: (ip 1), (ip 1/NY), (NY).]
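
The join step is not spelled out in the slides. A minimal sketch, assuming one stream carries (ip, location, speed) tuples and the other carries a speed per location, is to join on location and compare each IP against its location's figure. The function name, the field layout, and the 0.5 tolerance factor are assumptions.

```python
def join_and_compare(ip_speeds, location_speeds, tolerance=0.5):
    """Join per-IP speeds with per-location speeds and flag slow IPs.

    ip_speeds: iterable of (ip, location, kbps) tuples
    location_speeds: dict mapping location -> baseline kbps
    """
    joined = []
    for ip, location, kbps in ip_speeds:
        baseline = location_speeds.get(location)
        if baseline is None:
            continue  # no matching tuple on the location stream
        joined.append({
            "ip": ip,
            "location": location,
            "kbps": kbps,
            "baseline_kbps": baseline,
            # assumed rule: slow means below a fraction of the baseline
            "suboptimal": kbps < tolerance * baseline,
        })
    return joined


JOINED = join_and_compare(
    [("ip 1", "NY", 300.0), ("ip 2", "NY", 900.0), ("ip 3", "SF", 100.0)],
    {"NY": 1000.0},
)
```

Here "ip 3" is dropped because no speed is known for its location; a real topology would have to decide whether to buffer, drop, or default such tuples.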

RULE EXECUTION

METHOD 1: Storm

METHOD 2: Storm + Drools

Storm + Drools

[Diagram: the suboptimal network speed topology (KafkaSpout, Pre-process, Sessionize, Calculate N/W Speed per Session, Update Speed per IP, Identify Suboptimal Speed, Store in Cassandra, writing to Cassandra) with Drools added.]
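
The difference between the two methods can be pictured as hard-coding a condition inside a bolt versus evaluating rules that live outside the bolt's code. The sketch below shows the second idea in plain Python with rules held as data. It is not Drools and does not use the Drools rule language; the rule format, the `kbps` field, and the 500.0 threshold are assumptions.

```python
import operator

# Rules kept as data so they can change without touching the bolt code.
RULES = [
    {"name": "suboptimal_speed", "field": "kbps", "op": "lt", "value": 500.0},
]

OPS = {"lt": operator.lt, "le": operator.le, "gt": operator.gt, "ge": operator.ge}


def fired_rules(fact: dict, rules=RULES):
    """Return the names of all rules whose condition holds for this fact."""
    return [
        rule["name"]
        for rule in rules
        if rule["field"] in fact
        and OPS[rule["op"]](fact[rule["field"]], rule["value"])
    ]


SLOW = fired_rules({"ip": "10.0.0.2", "kbps": 200.0})
FAST = fired_rules({"ip": "10.0.0.1", "kbps": 8000.0})
```

A rule engine adds much more than this (pattern matching across facts, rule priorities, a dedicated language), but the separation of rule definitions from processing code is the part the sketch illustrates.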

Copyright © 2012 Accenture All rights reserved. 28

Integration with Cassandra

Cassandra

• Optimal for time series data

• Near-linear scalability

• Low read/write latency

Custom Bolt

• Uses Hector API to access Cassandra

• Creates dynamic columns per request

• Stores relevant network data
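
The "dynamic columns per request" pattern is commonly used for time series: one row per entity, one new column per event, with columns kept in sorted order so that a time range can be read as a slice. The sketch below imitates that data model with an in-memory class. It does not call Cassandra or Hector, and the choice of IP as row key and millisecond timestamp as column name is an assumption.

```python
from collections import defaultdict


class WideRowStore:
    """In-memory stand-in for a column family with dynamic columns."""

    def __init__(self) -> None:
        self._rows = defaultdict(dict)  # row key -> {column name: value}

    def insert(self, row_key: str, column_name: int, value: float) -> None:
        """Add one column to a row; the column need not exist beforehand."""
        self._rows[row_key][column_name] = value

    def slice(self, row_key: str, start: int, end: int):
        """Return (column, value) pairs with start <= column <= end, sorted."""
        columns = self._rows.get(row_key, {})
        return [(name, columns[name]) for name in sorted(columns)
                if start <= name <= end]


store = WideRowStore()
# Assumed layout: row key = IP, column name = timestamp in ms, value = kbps
store.insert("10.0.0.2", 1_000, 210.0)
store.insert("10.0.0.2", 3_000, 190.0)
store.insert("10.0.0.2", 2_000, 200.0)
WINDOW = store.slice("10.0.0.2", 1_500, 3_000)
```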

Lessons Learned

• Rebalance Topology

• Tweak Parallelism in bolt

• Isolation of Topologies

• Use TimeUUIDUtils

• Log4j level set to INFO by default
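
TimeUUIDUtils is Hector's helper for time-based (version 1) UUIDs. A likely reason for the advice, though the slides do not say so, is that plain timestamps used as column names collide when two events share a timestamp, whereas time-based UUIDs stay unique while still encoding the time. The sketch below shows the same idea with Python's standard `uuid` module rather than Hector.

```python
import uuid

# Two version 1 (time-based) UUIDs generated back to back.
first = uuid.uuid1()
second = uuid.uuid1()

# Both embed a timestamp (100 ns ticks since 1582-10-15) and still differ,
# so they can serve as distinct column names for near-simultaneous events.
TICKS = (first.time, second.time)
DISTINCT = first != second
```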

DEMO

Next Steps

• Trident

• Externalizing Rules

• Predictive Models

• Real-Time Notifications