Post on 13-May-2015
A trusted partner
Business Powered By Data
What it takes to build Real-Time Operational Intelligence and Big Data solutions
Pre work: Solution design, rules for data manipulation, rules for monitoring conditions and KPIs, rules for detecting events...

Acquire Data:
• Streaming data consumption (APIs, Enterprise Service Bus)
• Static data (Connectors)

Data types: Structured, Semi-Structured, Unstructured

Process Data:
• Event Stream Processing – Developer focused (Tibco, Microsoft, IBM)
• Data Transformation (Ascential Software, Cognos, Microsoft Integration Services)
• Generate Insights (Correlation, KPIs, Data Denormalization) – manual custom development

Store & Manage Data (Database Design and Development):
• Relational Databases. Example: Microsoft SQL Server, IBM DB2, Oracle, Sybase
• Massively Parallel Processing Systems. Example: Vertica, Greenplum, Netezza, ParStream
• NoSQL Databases. Example: MongoDB, Amazon DynamoDB, Cassandra
• NewSQL. Example: NuoDB
• Hadoop. Example: Hortonworks, Cloudera
• OLAP. Example: Microsoft SSAS, Cognos PowerPlay

Data Access & Visualization:
• Discovery & Analysis: Tableau, QlikView, Cognos, SiSense
• Reporting (Many)
• Data Mining: R, SAS, SPSS
• Custom Applications
• API Data Export

Data Access & Security:
• Row-level data security – manual development
• Real-time and historical data publishing – manual development
Innovations in Big Data technologies over the last 5 years
Pre work: Solution design, rules for data manipulation, rules for monitoring conditions and KPIs, rules for detecting events...

Acquire Data:
• Streaming data consumption (APIs, Enterprise Service Bus)
• Static data (Connectors)

Data types: Structured, Semi-Structured, Unstructured

Process Data:
• Event Stream Processing (Tibco, Microsoft, IBM)
• Data Transformation (Ascential Software, Cognos, Microsoft Integration Services)
• Generate Insights (Correlation, KPIs, Data Denormalization) – manual custom development

Store & Manage Data (Database Design and Development):
• Relational Databases. Example: Microsoft SQL Server, IBM DB2, Oracle, Sybase
• Massively Parallel Processing Systems. Example: Vertica, Greenplum, Netezza, ParStream
• NoSQL Databases. Example: MongoDB, Amazon DynamoDB, Cassandra
• NewSQL. Example: NuoDB
• Hadoop. Example: Hortonworks, Cloudera
• OLAP. Example: Microsoft SSAS, Cognos PowerPlay

Data Access & Visualization:
• Discovery & Analysis: Tableau, QlikView, Cognos, SiSense
• Reporting (Many)
• Data Mining: R, SAS, SPSS
• Custom Applications
• API Data Export

Data Access & Security:
• Row-level data security – manual development
• Real-time and historical data publishing – manual development
Challenging bits not addressed in this innovation cycle. This causes:
• Lots of systems integration of point solutions
• Custom code
• Specialist skills
• Hard to change and evolve
Rapidly industrialize the use of data by designing, building and running real-time business intelligence and big data solutions with StreamCentral.
Workbench – Easy to Design:
• Solution Designer (data consumption, data transformations, conditions, events, correlation)
• Security Designer
• Systems Management
• API Designer
• Meta Data Manager

Information Warehouse Manager – Auto Build:
• Denormalized schema generation for data marts
• Security schema generation
• Normalized schema generation for facts and dimensions
• Auto-generates database design, auto-generates database and application code, infers relationships in data
BI Server – Run with scale:
• Data Processing
• Analytic Applications: BI / Reporting, Data Exploration / Visualization, Functional Application, Event Driven Predictive Analytics, Industry Application, Association Analysis
• Data Collection
• Business Event Detection
• Data Publishing: SQL Server, Vertica, MongoDB
• Data Export
• Caching
Putting it together – High-impact real-time solutions in a fraction of the time
StreamCentral auto-builds security infrastructure
Acquire Data:
• Built-in StreamToMe API (stream any data from any application or device to StreamCentral)
• Static data (Connectors)

Process Data:
• Event Stream Processing (No coding)
• Data Transformation (No coding)
• Generate Insights (Correlation, KPIs, Data Denormalization) (No coding)

Store & Manage Data (Structured, Semi-Structured, Unstructured):
• Relational Databases: Microsoft SQL Server
• Massively Parallel Processing Systems: Vertica
• NoSQL Databases: MongoDB
• Hadoop
• Database Development – StreamCentral auto-generates database design and database code

Data Access & Visualization:
• Discovery & Analysis: Tableau, QlikView, Cognos, SiSense
• Reporting (Many)
• Data Mining: R, SAS, SPSS
• Custom Applications
• API Data Export

Data Access & Security: StreamCentral built-in API builder

Pre work: StreamCentral Workbench – no coding required (solution design, rules for data manipulation, rules for monitoring conditions and KPIs, rules for detecting events...) – for a broad set of people with varying technical skills
StreamCentral + Big Data
• Massively Parallel Processing architecture
• Distributed processing
• Scale out and distribute any component of StreamCentral independently on commodity hardware
• Integrates with best-of-breed database technologies

Services: Collector Service, Processing Service, Business Event Service, Data Publishing Service, Cache Service
StreamCentral BI Server Scalability
Data available via StreamCentral:
• Processed Source Data: data validation; association to entities; evaluation for conditions; time and location standardization; custom dimension standardization
• Single Event Stream: correlated data across multiple data sources; event detection based on condition evaluation
• Event Analysis Data Marts: data mart built on highly correlated data; updated in real time; analyze multiple events and conditions; bring together relevant data
• 360° Analysis Data Marts: data mart built on loosely correlated data; updated periodically; analyze any data
Access methods for each output: API access (real-time push and historical pull) or database access (historical pull).
Example Big Data Solutions: Telco
Telco’s Core IMS Network: Data, Voice & Video Performance Data
Data from Telco Towers: Data, Voice & Video Performance Data
Weather Data, Traffic Incidents, Population Data
Data Stream sources: weatherunderground, MapQuest, USA Today, Census data
Sources of real-time streaming data from networks, devices, services and other internal applications. External sources of data that add understanding of what’s happening when events are detected.

Business Solutions:
• Network Test
• New Service – Investment Planning
• Adaptive Bit Rate – Video Streaming QoE
• 360° Customer QoE for first-level customer service
• Video QoE for IPTV
• New revenue sources from marketing operations
• Service Disruption
Making changes to definitions
• StreamCentral allows updates to data sources, entities, dimensions, rules for conditions, event detection rules and data mart definitions
• When changes are made, the Workbench updates the schema change information in the StreamCentral metadata database and makes the corresponding changes to the underlying database schema
• Configuration data for all services running within StreamCentral also lives in the distributed cache. The next step is to update that cache, which then notifies the various services of the updates in schema definition
• The correlation and publishing engines evaluate the schema changes and make the appropriate changes to their in-memory data before sending the data to the database
• Rollback is built in to account for errors
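The update flow above (metadata write, then cache fan-out to running services) can be sketched as follows. All class and method names are hypothetical illustrations, not StreamCentral's actual API:

```python
# Sketch of the definition-update flow: persist the change to the
# metadata database, then push it through the distributed cache so
# subscribed services adjust their in-memory state.

class MetadataDB:
    """Stands in for the StreamCentral metadata database."""
    def __init__(self):
        self.schemas = {}

    def save_schema(self, name, definition):
        self.schemas[name] = definition


class DistributedCache:
    """Stands in for the distributed configuration cache. Services
    subscribe and are notified when a definition changes."""
    def __init__(self):
        self.config = {}
        self.subscribers = []

    def subscribe(self, service):
        self.subscribers.append(service)

    def update(self, name, definition):
        self.config[name] = definition
        for service in self.subscribers:   # push change notifications
            service.on_schema_change(name, definition)


class CorrelationService:
    """Adjusts its in-memory schema before the next database write."""
    def __init__(self):
        self.active_schema = {}

    def on_schema_change(self, name, definition):
        self.active_schema[name] = definition


def apply_workbench_change(metadata_db, cache, name, definition):
    """Workbench edit: metadata DB first, then cache fan-out."""
    metadata_db.save_schema(name, definition)
    cache.update(name, definition)


metadata = MetadataDB()
cache = DistributedCache()
correlator = CorrelationService()
cache.subscribe(correlator)

apply_workbench_change(metadata, cache, "voice_quality_mart",
                       {"columns": ["entity_id", "mos_score", "event_time"]})
```

The ordering matters: the metadata database is the system of record, so it is written before the cache notifies the services.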
• Many point solutions from multiple vendors
• High learning curve
• Maximum time spent integrating
• Manual design and coding
• Many steps to solution
• Older technology
• Years to Value
High Risk, High Cost
Agility in meeting changing customer needs in real-time
Data: Real-time or Historical | Streaming or batch | Structured or unstructured
Business Analysis
Detailed Solution Design
Manual Database Design
Database Development
CEP Development Platform
Enterprise Service Bus
Traditional ETL tools
Application Development
Workbench – Business Solutions Designer: consume data, design transformations, conditions, events, analytics, security, and APIs to export and share data
Information Warehouse Manager: auto-generate design, auto-generate code, infer relationships, reduce manual design
BI Server: built-in event processing, high-speed data processing, scalable, secure, runs on modern database platforms
Traditional
Pre-work Data Acquisition, Transformation and Enrichment
Data Correlation & Event Mgmt
Analytics & insight specific data marts
Data Level Security
Export Enriched Data & Real Time Analytics
• High automation
• No coding required
• Contains multiple components that work together (ETL, CEP, data mart builder, location intelligence and more)
• Fewer steps to solution
• Modern technology
• Weeks to Value
Low Risk, Reduced Cost
StreamCentral advantage: Agility to change how you use data in real-time
Risk
Value
Current technology and approach
StreamCentral
RiskValue
Time
Time
StreamCentral Concepts
Definitions of key concepts in StreamCentral:
• Entity: An entity represents a group of people or things that incoming data is directly connected to. Examples include departments, customers, sites, products, etc. By defining entities you tell StreamCentral how distributed data is connected to the things core to your business
• Data Source: StreamCentral can pull data from a variety of sources using standard web interfaces, and data can also be streamed directly to the StreamCentral API for processing by devices, sensors, applications and services
• Dimension: Common attributes across a variety of data sources that can be used to categorize and analyze data
Definitions of key concepts in StreamCentral (continued):
• Conditions: A condition is a rule-based measurement applied to incoming data. A condition has three parts: the Condition Name (for example, Voice Quality), the Condition Range (a range of quality from Hard to Hear, Poor, Average, Toll Quality to Excellent) and the Condition KPI (for example, a RED KPI when the range is Hard to Hear or Poor). Individual conditions can be grouped into a condition set, which can then be used to detect events in aggregate
• Events: An event happens when patterns of multiple conditions with specific ranges from different data streams and environmental data sources are detected as the data streams in. While StreamCentral allows sophisticated rule-based event detection, it goes further: it auto-builds a data mart around the event consisting of a variety of context, such as entities, environmental data, dimensions and detailed data from data sources
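The Condition and condition-set concepts above can be illustrated with a small sketch. The Voice Quality range labels come from the example in the text; the numeric thresholds and class names are assumptions for illustration:

```python
# Illustrative sketch of Condition / Condition Set / Event detection.

class Condition:
    """A rule-based measurement: a name, ordered ranges, and the
    ranges that count as a RED KPI."""
    def __init__(self, name, ranges, red_ranges):
        self.name = name
        self.ranges = ranges          # list of (upper_bound, label)
        self.red_ranges = set(red_ranges)

    def evaluate(self, value):
        for upper_bound, label in self.ranges:
            if value <= upper_bound:
                return label
        return self.ranges[-1][1]     # above all bounds: top label

    def is_red(self, value):
        return self.evaluate(value) in self.red_ranges


def detect_event(condition_set, reading):
    """An event fires when every condition in the set lands in a
    RED range for the incoming reading."""
    return all(c.is_red(reading[c.name]) for c in condition_set)


# Voice quality scored 1-5; the thresholds here are invented, only the
# range labels come from the text.
voice = Condition(
    "Voice Quality",
    ranges=[(1, "Hard to hear"), (2, "Poor"), (3, "Average"),
            (4, "Toll quality"), (5, "Excellent")],
    red_ranges=["Hard to hear", "Poor"],
)

reading = {"Voice Quality": 1.7}
print(voice.evaluate(reading["Voice Quality"]))  # Poor
print(detect_event([voice], reading))            # True
```

A real condition set would combine several conditions from different data streams; the aggregate check is the same `all(...)` pattern.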
Insight context:
• Who (entities like customer, patient)
• When (time)
• Where (location)
• What (streaming & static data correlation)
Generating insights from data requires context to be added to the data. This context is a continuous thread that connects all types of data throughout the BI solution lifecycle. Four typical examples of context:
• StreamCentral automatically builds and maintains time and location dimensions
• Entities like customer, department, site can be created and defined in StreamCentral. Entity data can be imported for initial load and continuously kept in sync
• All incoming data in StreamCentral is continuously and automatically connected to time, location and defined entities
• Resultant real-time events and analytical data marts automatically inherit this context without need for any programming or development work
Converting data to insights by continuously adding context
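As a rough illustration of the context thread described above, the sketch below attaches time, location and entity keys to an incoming record. The dimension contents and field names are invented for the example:

```python
# Sketch: every incoming record is automatically connected to time,
# location and a defined entity, as the bullets above describe.

from datetime import datetime

time_dim = {}                                     # ISO hour -> surrogate key
location_dim = {"Boston": 1, "Cambridge": 2}      # pre-built location dimension
customer_entity = {"cust-42": {"name": "Acme", "segment": "Enterprise"}}

def enrich(record):
    """Return the record with time, location and entity context attached."""
    hour = datetime.fromisoformat(record["ts"]).strftime("%Y-%m-%d %H:00")
    time_key = time_dim.setdefault(hour, len(time_dim) + 1)
    return {
        **record,
        "time_key": time_key,
        "location_key": location_dim.get(record["city"]),
        "entity": customer_entity.get(record["customer_id"]),
    }

row = enrich({"ts": "2015-05-13T09:42:00", "city": "Boston",
              "customer_id": "cust-42", "mos_score": 3.8})
print(row["time_key"], row["location_key"], row["entity"]["segment"])
```

Because the context keys ride along with the record, any downstream event or data mart inherits them with no extra programming, which is the point the bullets above make.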
Types of data sources: Regular
• Data sources used to measure performance
• Examples include data that will be measured for conditions, ranges and events
• This data can be connected to entities directly. For example, data from a device can be connected to a customer, or sales data can be connected to a product and a customer
• Can be used in correlation, event detection and data marts
Types of data sources: Environmental
• This source of data is used to add context to measured performance; these are also called environmental data sources
• Examples typically include external data that adds context about external factors in play
• Does not have to be connected to the entities directly. StreamCentral uses implicit relations with the time and location dimensions to tie environmental data to other enterprise data. For example, consider an environmental data source called Weather, which has location information associated with it. There are two entities, "Customer" and "Tower", which also have location information associated with them. StreamCentral standardizes all three to the location dimension, but it also implicitly connects Customer to Weather and Tower to Weather because Weather was created as an environmental data source. When analyzing data, StreamCentral can then provide real-time or historical context on what the weather is where the customer is and what the weather is where the tower is
• Great to use in data marts for analyzing associations with other data
• Can be used in event detection as part of a condition set and to evaluate events
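The weather example above can be sketched as an implicit join through the shared location dimension; all values and field names here are illustrative:

```python
# Sketch: weather is an environmental source, so it is tied to Customer
# and Tower through the standardized location key rather than through
# an explicit relationship defined by the designer.

location_dim = {"Boston": 1, "Springfield": 2}

# Environmental data, keyed by the standardized location key.
weather = {1: {"condition": "Rain", "temp_f": 48},
           2: {"condition": "Clear", "temp_f": 61}}

customers = [{"id": "cust-42", "location_key": 1}]
towers = [{"id": "tower-7", "location_key": 2}]

def with_weather(rows):
    """Implicitly join any entity to the environmental source via
    its location key."""
    return [{**row, "weather": weather.get(row["location_key"])}
            for row in rows]

print(with_weather(customers)[0]["weather"]["condition"])  # Rain
print(with_weather(towers)[0]["weather"]["condition"])     # Clear
```

The same join works for any entity that carries a location key, which is why the environmental source never needs an explicit connection to Customer or Tower.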
A note on time and location data
• StreamCentral auto-creates time and location dimensions
• Extended data types allow very specific association of a variety of time- and location-based attributes
• Data types can be assigned to attributes in entities, regular data sources and environmental data sources
• For every incoming attribute associated with one of the special time or location data types, StreamCentral checks whether a record for that value already exists in the dimension. If not, it creates a new record for that value; if it does, the key value of that record is substituted into the data source
• Time and location data is stored in the database and in the distributed cache, though real-time lookups are done against the data in the cache
• StreamCentral can dynamically feed time or location data from these dimensions to REST- or SOAP-based web services
• StreamCentral supports standardizing location data at any geographic level, and can standardize for a specific radius
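The lookup-or-create behavior described above is essentially a cached surrogate-key lookup. A minimal sketch, with plain dicts standing in for the distributed cache and the dimension table:

```python
# Sketch: for each incoming time/location value, return the existing
# dimension key or create a new record, consulting the cache first.

class DimensionManager:
    def __init__(self):
        self.cache = {}     # value -> surrogate key (fast path)
        self.database = {}  # authoritative dimension store
        self.next_key = 1

    def key_for(self, value):
        """Return the surrogate key for a value, creating a new
        dimension record if this value has not been seen before."""
        if value in self.cache:
            return self.cache[value]          # real-time lookup path
        key = self.database.get(value)
        if key is None:                       # first sighting: create record
            key = self.next_key
            self.next_key += 1
            self.database[value] = key
        self.cache[value] = key               # warm cache for later lookups
        return key

locations = DimensionManager()
print(locations.key_for("Boston"))   # 1 (created)
print(locations.key_for("Austin"))   # 2 (created)
print(locations.key_for("Boston"))   # 1 (cache hit)
```

The returned key is what gets substituted into the data source record, per the bullet above.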
Types of data outputs available from StreamCentral
• Processed Source Data: Once real-time streaming data, or static data via scheduled pull, is received by StreamCentral, it is validated, evaluated for conditions, and associated with entities and dimensions such as time and location; the data is then available to be published
• Event data: Processed data is evaluated for events. If an event is detected, the event data along with its associated context is available as a real-time stream. In addition, StreamCentral builds a data mart just for this event; access to historical data for an event is also available
• Events data mart analysis: Custom data marts that evaluate multiple events, and the conditions recorded when the events were detected, are available via events data marts. Historical access is available
• Aggregate 360-degree data mart analysis: Bring disparate data together, standardized to common themes, and StreamCentral automatically builds a scalable data mart structure for this data
Type of data available | Real-time access method | Historical access method

Processed Source Data
• Real-time: ActiveMQ messages; JMS-based HornetQ; OracleQ; Microsoft MSMQ; WCF-based pub/sub model; format options: XML/JSON
• Historical: REST API, format options XML/JSON; method name: getFactualData; input parameters: source name, filter parameters (location, time), numOfRecords

Event Data with context
• Real-time: ActiveMQ messages; JMS-based HornetQ; OracleQ; Microsoft MSMQ; WCF-based pub/sub model; format options: XML/JSON
• Historical: REST API, format options XML/JSON; method name: getEventData; input parameters: event name or id, filter parameters (location, time), entity id array, numOfRecords

Events Data Mart
• Real-time: ActiveMQ messages; JMS-based HornetQ; OracleQ; Microsoft MSMQ; WCF-based pub/sub model; format options: XML/JSON
• Historical: REST API, format options XML/JSON; method name: getAnalysisData; input parameters: analysis collection name or id, filter parameters (location, time), entity id array, numOfRecords
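As a hedged illustration of the historical REST access above: the method name getEventData and its input parameters come from the table, while the host, path layout and exact parameter encoding are assumptions for illustration only.

```python
# Sketch: building a getEventData request URL. Only the method and
# parameter names are from the documentation table; everything else
# (host, path, encoding) is a hypothetical example.

from urllib.parse import urlencode

BASE_URL = "https://streamcentral.example.com/api"  # hypothetical host

def build_event_data_url(event_name, location=None, time_range=None,
                         entity_ids=(), num_of_records=100):
    params = {"eventName": event_name, "numOfRecords": num_of_records}
    if location:
        params["location"] = location
    if time_range:
        params["time"] = time_range
    if entity_ids:
        params["entityIds"] = ",".join(entity_ids)
    return f"{BASE_URL}/getEventData?{urlencode(params)}"

url = build_event_data_url("PoorVoiceQuality", location="Boston",
                           entity_ids=["cust-42"], num_of_records=50)
print(url)
```

The response format would be XML or JSON per the table; parsing it is omitted here.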
Choosing the right technology for visualization
• Don’t select a delivery technology for these reasons (best to use StreamCentral, and use many tools to deliver the insight):
• Centralizing business logic in one place
• Definition of KPIs
• Rules for events
• Aliases for data attributes
• Connectivity and transformation requirements of source data
• Adding context to data
• Select one or more delivery technologies for these reasons:
• Performance (in-memory aggregation)
• Cross-browser support, and support for various tablet and mobile device platforms
• Broad portfolio of charts and visualizations
• Highly interactive
• Ability to be integrated in portals for internal (employees) or external (partners or customers) consumption
• Standards-based, like HTML5 and CSS3
• Can be hosted in a SaaS model
Data Security
StreamCentral Database
Workbench administrator defines roles and specifies data access rules. Assign users to roles. StreamCentral builds and manages meta data for row level access
• Centralize data security with StreamCentral
• Custom applications and analytical/reporting tools pass only a user id as part of their query to the StreamCentral database
• Two types of row-level security:
1. Underlying fact data, based on dimensions (like time, location) and entities (like customer, department, site)
2. Denormalized aggregated data, based on and/or rules
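A minimal sketch of the first type of row-level security, using SQLite and invented table and column names (not StreamCentral's actual schema): the client supplies only a user id, and the security join decides which fact rows come back.

```python
# Sketch: fact rows filtered through role-based security tables.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_calls (call_id INTEGER, location_key INTEGER, mos REAL);
    CREATE TABLE user_roles (user_id TEXT, role_id INTEGER);
    CREATE TABLE role_locations (role_id INTEGER, location_key INTEGER);

    INSERT INTO fact_calls VALUES (1, 10, 4.1), (2, 20, 2.3), (3, 10, 3.7);
    INSERT INTO user_roles VALUES ('alice', 1);
    INSERT INTO role_locations VALUES (1, 10);  -- role 1 sees location 10 only
""")

def secure_query(user_id):
    """The client passes only a user id; the security layer joins
    through roles to return just the rows that user may see."""
    return conn.execute("""
        SELECT f.call_id, f.mos
        FROM fact_calls f
        JOIN role_locations rl ON rl.location_key = f.location_key
        JOIN user_roles ur ON ur.role_id = rl.role_id
        WHERE ur.user_id = ?
        ORDER BY f.call_id
    """, (user_id,)).fetchall()

print(secure_query("alice"))  # [(1, 4.1), (3, 3.7)]
```

A user with no role mapping simply gets no rows back, which keeps the access logic centralized in the security tables rather than in each reporting tool.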
StreamCentral row-level security layer (managing row-level data security): a query carrying a user id returns only the data appropriate to that user’s access rights, drawn from the factual tables and security tables of the StreamCentral database.
Diagram elements: ScrtyRoleID; role processing tables in StreamCentral; StreamCentral Database (MS SQL / HP Vertica); StreamCentral Metadata DB; StreamCentral Workbench.
The Workbench administrator manages data security by creating data access rules for roles and assigning users to roles. For data accessed from the StreamCentral Database via reporting/analytical tools or the API, StreamCentral determines the data access permissions for that user.
Distributed Caching
• Storing time and location dimension data for fast lookups and data standardization
• Maintaining configuration information about the system, which aids in managing updates to definitions
• Storing entity data required for adding context to incoming data
• Managing correlation of real-time data
• Managing event detection
• Processed data formatted to data mart specification
• Managing batch data inserts into the database
Availability
OUTSIDE NETWORK
Cache Cluster: Microsoft AppFabric Cache is a distributed caching technology that makes the cache highly available by configuring more than one server to participate in storing cache data, commonly called a cache cluster.
Software Network Load Balancing (NLB): Microsoft IIS web servers configured with the software NLB provided by Microsoft Windows Server allow all websites to be highly available.
Microsoft Message Queue persists unread messages in the queue in the event of a sudden server shutdown. The physical hardware is available for clustering to ensure failover in case of hardware failure.
Web Applications: StreamCentral Public API; Workbench Application; Reports/analytics
Messaging: Inbound Message Queue; Publish Message Queue
Services: Processing Service; Correlation Service; Publish Service
Databases: Workbench Database (StreamCentral metadata); StreamCentral Database (fact and aggregate data – Vertica/MS SQL Server)
The Processing Engine, Correlation Engine and Publish Engine can run on multiple physical servers to keep these services highly available.
StreamCentral High Availability
Thank you for your time
Raheel Retiwalla
CTO - Virtus IT Ltd
E: raheel.retiwalla@virtus-it.com
M: +1 617 901 8370