Post on 13-May-2015
A trusted partner
Business Powered By Data
What it takes to build Real-Time Operational Intelligence and Big Data solutions
Pre work: Solution design, rules for data manipulation, rules for monitoring conditions and KPIs, rules for detecting events...

Acquire Data:
• Streaming data consumption (APIs, Enterprise Service Bus)
• Static data (Connectors)

Data types: Structured, Semi-Structured, Unstructured

Process Data:
• Event Stream Processing – Developer focused (Tibco, Microsoft, IBM)
• Data Transformation (Ascential Software, Cognos, Microsoft Integration Services)
• Generate Insights (Correlation, KPIs, Data Denormalization) – manual custom development

Store & Manage Data (Database Design and Development):
• Relational Databases. Example: Microsoft SQL Server, IBM DB2, Oracle, Sybase
• Massively Parallel Processing Systems. Example: Vertica, Greenplum, Netezza, ParStream
• NoSQL Databases. Example: MongoDB, Amazon DynamoDB, Cassandra
• NewSQL. Example: NuoDB
• Hadoop. Example: Hortonworks, Cloudera
• OLAP. Example: Microsoft SSAS, Cognos PowerPlay

Data Access & Visualization:
• Discovery & Analysis: Tableau, QlikView, Cognos, SiSense
• Reporting (Many)
• Data Mining: R, SAS, SPSS
• Custom Applications
• API Data Export

Data Access & Security:
• Row-level data security – manual development
• Real-time and historical data publishing – manual development
Innovations in Big Data technologies over the last 5 years
Pre work: Solution design, rules for data manipulation, rules for monitoring conditions and KPIs, rules for detecting events...

Acquire Data:
• Streaming data consumption (APIs, Enterprise Service Bus)
• Static data (Connectors)

Data types: Structured, Semi-Structured, Unstructured

Process Data:
• Event Stream Processing (Tibco, Microsoft, IBM)
• Data Transformation (Ascential Software, Cognos, Microsoft Integration Services)
• Generate Insights (Correlation, KPIs, Data Denormalization) – manual custom development

Store & Manage Data (Database Design and Development):
• Relational Databases. Example: Microsoft SQL Server, IBM DB2, Oracle, Sybase
• Massively Parallel Processing Systems. Example: Vertica, Greenplum, Netezza, ParStream
• NoSQL Databases. Example: MongoDB, Amazon DynamoDB, Cassandra
• NewSQL. Example: NuoDB
• Hadoop. Example: Hortonworks, Cloudera
• OLAP. Example: Microsoft SSAS, Cognos PowerPlay

Data Access & Visualization:
• Discovery & Analysis: Tableau, QlikView, Cognos, SiSense
• Reporting (Many)
• Data Mining: R, SAS, SPSS
• Custom Applications
• API Data Export

Data Access & Security:
• Row-level data security – manual development
• Real-time and historical data publishing – manual development
Challenging bits not addressed in this innovation cycle. This causes:
• Lots of systems integration of point solutions
• Custom code
• Specialist skills
• Hard to change and evolve
Rapidly industrialize the use of data by designing, building and running real-time business intelligence and big data solutions with StreamCentral.
Workbench – Easy to Design:
• Solution Designer (data consumption, data transformations, conditions, events, correlation)
• Security Designer
• Systems Management
• API Designer
• Meta Data Manager

Information Warehouse Manager – Auto Build:
• Denormalized schema generation for data marts
• Security schema generation
• Normalized schema generation for facts and dimensions
• Auto-generates database design, auto-generates database and application code, infers relationships in data
BI Server – Run with scale:
• Data Processing
• Analytic Applications: BI / Reporting, Data Exploration / Visualization, Functional Application, Event Driven Predictive Analytics, Industry Application, Association Analysis
• Data Collection
• Business Event Detection
• Data Publishing: SQL Server, Vertica, MongoDB
• Data Export
• Caching
Putting it together – High-impact real-time solutions in a fraction of the time
StreamCentral auto-builds security infrastructure
Acquire Data:
• Built-in StreamToMe API (stream any data from any application or device to StreamCentral)
• Static data (Connectors)

Process Data:
• Event Stream Processing (No coding)
• Data Transformation (No coding)
• Generate Insights (Correlation, KPIs, Data Denormalization) (No coding)

Store & Manage Data (Structured, Semi-Structured, Unstructured):
• Relational Databases: Microsoft SQL Server
• Massively Parallel Processing Systems: Vertica
• NoSQL Databases: MongoDB
• Hadoop
• Database Development – StreamCentral auto-generates database design and database code

Data Access & Visualization:
• Discovery & Analysis: Tableau, QlikView, Cognos, SiSense
• Reporting (Many)
• Data Mining: R, SAS, SPSS
• Custom Applications
• API Data Export

Data Access & Security: StreamCentral built-in API builder

Pre work: StreamCentral Workbench – no coding required (solution design, rules for data manipulation, rules for monitoring conditions and KPIs, rules for detecting events...) – for a broad set of people with varying technical skills
StreamCentral + Big Data
• Massively Parallel Processing architecture
• Distributed processing
• Scale out and distribute any component of StreamCentral independently on commodity hardware
• Integrates with best-of-breed database technologies

Services: Collector Service, Processing Service, Business Event Service, Data Publishing Service, Cache Service
StreamCentral BI Server Scalability
Data available via StreamCentral:
• Processed Source Data: data validation; association to entities; evaluation for conditions; time and location standardization; custom dimension standardization
• Single Event Stream: correlated data across multiple data sources; event detection based on condition evaluation
• Event Analysis Data Marts: data mart built on highly correlated data; updated in real time; analyze multiple events and conditions; bring together relevant data
• 360° Analysis Data Marts: data mart built on loosely correlated data; updated periodically; analyze any data
Access methods for each output: API access (real-time push and historical pull) or database access (historical pull).
Example Big Data Solutions: Telco
Telco’s Core IMS Network: Data, Voice & Video Performance Data
Data from Telco Towers: Data, Voice & Video Performance Data
Weather Data, Traffic Incidents, Population Data
Data Stream sources: weatherunderground, MapQuest, USA Today, Census data
Sources of real-time streaming data from networks, devices, services and other internal applications. External sources of data that add understanding of what’s happening when events are detected.

Business Solutions:
• Network Test
• New Service – Investment Planning
• Adaptive Bit Rate – Video Streaming QoE
• 360° Customer QoE for first-level customer service
• Video QoE for IPTV
• New revenue sources from marketing operations
• Service Disruption
Making changes to definitions
• StreamCentral allows updates to data sources, entities, dimensions, rules for conditions, event detection rules and data mart definitions
• When changes are made, the Workbench updates the schema change information in the StreamCentral metadata database and makes the corresponding changes to the underlying database schema
• Configuration data for all services running within StreamCentral also lives in the distributed cache. The next step is to update that cache, which then notifies the various services of the updates in schema definition
• The correlation and publishing engines evaluate the schema changes and make the appropriate changes to their in-memory data before sending the data to the database
• Rollback is built in to account for errors
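The update flow above (metadata write, then cache fan-out to running services) can be sketched as follows. All class and method names are hypothetical illustrations, not StreamCentral's actual API:

```python
# Sketch of the definition-update flow: persist the change to the
# metadata database, then push it through the distributed cache so
# subscribed services adjust their in-memory state.

class MetadataDB:
    """Stands in for the StreamCentral metadata database."""
    def __init__(self):
        self.schemas = {}

    def save_schema(self, name, definition):
        self.schemas[name] = definition


class DistributedCache:
    """Stands in for the distributed configuration cache. Services
    subscribe and are notified when a definition changes."""
    def __init__(self):
        self.config = {}
        self.subscribers = []

    def subscribe(self, service):
        self.subscribers.append(service)

    def update(self, name, definition):
        self.config[name] = definition
        for service in self.subscribers:   # push change notifications
            service.on_schema_change(name, definition)


class CorrelationService:
    """Adjusts its in-memory schema before the next database write."""
    def __init__(self):
        self.active_schema = {}

    def on_schema_change(self, name, definition):
        self.active_schema[name] = definition


def apply_workbench_change(metadata_db, cache, name, definition):
    """Workbench edit: metadata DB first, then cache fan-out."""
    metadata_db.save_schema(name, definition)
    cache.update(name, definition)


metadata = MetadataDB()
cache = DistributedCache()
correlator = CorrelationService()
cache.subscribe(correlator)

apply_workbench_change(metadata, cache, "voice_quality_mart",
                       {"columns": ["entity_id", "mos_score", "event_time"]})
```

The ordering matters: the metadata database is the system of record, so it is written before the cache notifies the services.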
• Many point solutions from multiple vendors
• High learning curve
• Maximum time spent integrating
• Manual design and coding
• Many steps to solution
• Older technology
• Years to Value
High Risk, High Cost
Agility in meeting changing customer needs in real-time
Data: Real-time or Historical | Streaming or batch | Structured or unstructured
Business Analysis
Detailed Solution Design
Manual Database Design
Database Development
CEP Development Platform
Enterprise Service Bus
Traditional ETL tools
Application Development
Workbench – Business Solutions Designer: consume data, design transformations, conditions, events, analytics, security, and APIs to export and share data
Information Warehouse Manager: auto-generate design, auto-generate code, infer relationships, reduce manual design
BI Server: built-in event processing, high-speed data processing, scalable, secure, runs on modern database platforms
Traditional
Pre-work Data Acquisition, Transformation and Enrichment
Data Correlation & Event Mgmt
Analytics & insight specific data marts
Data Level Security
Export Enriched Data & Real Time Analytics
• High automation
• No coding required
• Contains multiple components that work together (ETL, CEP, data mart builder, location intelligence and more)
• Fewer steps to solution
• Modern technology
• Weeks to Value
Low Risk, Reduced Cost
StreamCentral advantage: Agility to change how you use data in real-time
Risk
Value
Current technology and approach
StreamCentral
RiskValue
Time
Time
StreamCentral Concepts
Definitions of key concepts in StreamCentral:
• Entity: An entity represents a group of people or things that incoming data is directly connected to. Examples include departments, customers, sites, products, etc. By defining entities you tell StreamCentral how distributed data is connected to the things core to your business
• Data Source: StreamCentral can pull data from a variety of sources using standard web interfaces, and data can also be streamed directly to the StreamCentral API for processing by devices, sensors, applications and services
• Dimension: Common attributes across a variety of data sources that can be used to categorize and analyze data
Definitions of key concepts in StreamCentral (continued):
• Conditions: A condition is a rule-based measurement applied to incoming data. A condition has three parts: the Condition Name (for example, Voice Quality), the Condition Range (a range of quality from Hard to Hear, Poor, Average, Toll Quality to Excellent) and the Condition KPI (for example, a RED KPI when the range is Hard to Hear or Poor). Individual conditions can be grouped into a condition set, which can then be used to detect events in aggregate
• Events: An event happens when patterns of multiple conditions with specific ranges from different data streams and environmental data sources are detected as the data streams in. While StreamCentral allows sophisticated rule-based event detection, it goes further: it auto-builds a data mart around the event consisting of a variety of context, such as entities, environmental data, dimensions and detailed data from data sources
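The Condition and condition-set concepts above can be illustrated with a small sketch. The Voice Quality range labels come from the example in the text; the numeric thresholds and class names are assumptions for illustration:

```python
# Illustrative sketch of Condition / Condition Set / Event detection.

class Condition:
    """A rule-based measurement: a name, ordered ranges, and the
    ranges that count as a RED KPI."""
    def __init__(self, name, ranges, red_ranges):
        self.name = name
        self.ranges = ranges          # list of (upper_bound, label)
        self.red_ranges = set(red_ranges)

    def evaluate(self, value):
        for upper_bound, label in self.ranges:
            if value <= upper_bound:
                return label
        return self.ranges[-1][1]     # above all bounds: top label

    def is_red(self, value):
        return self.evaluate(value) in self.red_ranges


def detect_event(condition_set, reading):
    """An event fires when every condition in the set lands in a
    RED range for the incoming reading."""
    return all(c.is_red(reading[c.name]) for c in condition_set)


# Voice quality scored 1-5; the thresholds here are invented, only the
# range labels come from the text.
voice = Condition(
    "Voice Quality",
    ranges=[(1, "Hard to hear"), (2, "Poor"), (3, "Average"),
            (4, "Toll quality"), (5, "Excellent")],
    red_ranges=["Hard to hear", "Poor"],
)

reading = {"Voice Quality": 1.7}
print(voice.evaluate(reading["Voice Quality"]))  # Poor
print(detect_event([voice], reading))            # True
```

A real condition set would combine several conditions from different data streams; the aggregate check is the same `all(...)` pattern.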
Insight context:
• Who (entities like customer, patient)
• When (time)
• Where (location)
• What (streaming & static data correlation)
Generating insights from data requires context to be added to the data. This context is a continuous thread that connects all types of data throughout the BI solution lifecycle. Four typical examples of context:
• StreamCentral automatically builds and maintains time and location dimensions
• Entities like customer, department, site can be created and defined in StreamCentral. Entity data can be imported for initial load and continuously kept in sync
• All incoming data in StreamCentral is continuously and automatically connected to time, location and defined entities
• Resultant real-time events and analytical data marts automatically inherit this context without need for any programming or development work
Converting data to insights by continuously adding context
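As a rough illustration of the context thread described above, the sketch below attaches time, location and entity keys to an incoming record. The dimension contents and field names are invented for the example:

```python
# Sketch: every incoming record is automatically connected to time,
# location and a defined entity, as the bullets above describe.

from datetime import datetime

time_dim = {}                                     # ISO hour -> surrogate key
location_dim = {"Boston": 1, "Cambridge": 2}      # pre-built location dimension
customer_entity = {"cust-42": {"name": "Acme", "segment": "Enterprise"}}

def enrich(record):
    """Return the record with time, location and entity context attached."""
    hour = datetime.fromisoformat(record["ts"]).strftime("%Y-%m-%d %H:00")
    time_key = time_dim.setdefault(hour, len(time_dim) + 1)
    return {
        **record,
        "time_key": time_key,
        "location_key": location_dim.get(record["city"]),
        "entity": customer_entity.get(record["customer_id"]),
    }

row = enrich({"ts": "2015-05-13T09:42:00", "city": "Boston",
              "customer_id": "cust-42", "mos_score": 3.8})
print(row["time_key"], row["location_key"], row["entity"]["segment"])
```

Because the context keys ride along with the record, any downstream event or data mart inherits them with no extra programming, which is the point the bullets above make.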
Types of data sources: Regular
• Data sources used to measure performance
• Examples include data that will be measured for conditions, ranges and events
• This data can be connected to entities directly. For example, data from a device can be connected to a customer, or sales data can be connected to a product and a customer
• Can be used in correlation, event detection and data marts
Types of data sources: Environmental
• This source of data is used to add context to measured performance; these are also called environmental data sources
• Examples typically include external data that adds context about external factors in play
• Does not have to be connected to the entities directly. StreamCentral uses implicit relations with the time and location dimensions to tie environmental data to other enterprise data. For example, consider an environmental data source called Weather, which has location information associated with it. There are two entities, "Customer" and "Tower", which also have location information associated with them. StreamCentral standardizes all three to the location dimension, but it also implicitly connects Customer to Weather and Tower to Weather because Weather was created as an environmental data source. When analyzing data, StreamCentral can then provide real-time or historical context on what the weather is where the customer is and what the weather is where the tower is
• Great to use in data marts for analyzing associations with other data
• Can be used in event detection as part of a condition set and to evaluate events
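The weather example above can be sketched as an implicit join through the shared location dimension; all values and field names here are illustrative:

```python
# Sketch: weather is an environmental source, so it is tied to Customer
# and Tower through the standardized location key rather than through
# an explicit relationship defined by the designer.

location_dim = {"Boston": 1, "Springfield": 2}

# Environmental data, keyed by the standardized location key.
weather = {1: {"condition": "Rain", "temp_f": 48},
           2: {"condition": "Clear", "temp_f": 61}}

customers = [{"id": "cust-42", "location_key": 1}]
towers = [{"id": "tower-7", "location_key": 2}]

def with_weather(rows):
    """Implicitly join any entity to the environmental source via
    its location key."""
    return [{**row, "weather": weather.get(row["location_key"])}
            for row in rows]

print(with_weather(customers)[0]["weather"]["condition"])  # Rain
print(with_weather(towers)[0]["weather"]["condition"])     # Clear
```

The same join works for any entity that carries a location key, which is why the environmental source never needs an explicit connection to Customer or Tower.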
A note on time and location data
• StreamCentral auto-creates time and location dimensions
• Extended data types allow very specific association of a variety of time- and location-based attributes
• Data types can be assigned to attributes in entities, regular data sources and environmental data sources
• For every incoming attribute associated with one of the special time or location data types, StreamCentral checks whether a record for that value already exists in the dimension. If not, it creates a new record for that value; if it does, the key value of that record is substituted into the data source
• Time and location data is stored in the database and in the distributed cache, though real-time lookups are done against the data in the cache
• StreamCentral can dynamically feed time or location data from these dimensions to REST- or SOAP-based web services
• StreamCentral supports standardizing location data at any geographic level, and can standardize for a specific radius
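The lookup-or-create behavior described above is essentially a cached surrogate-key lookup. A minimal sketch, with plain dicts standing in for the distributed cache and the dimension table:

```python
# Sketch: for each incoming time/location value, return the existing
# dimension key or create a new record, consulting the cache first.

class DimensionManager:
    def __init__(self):
        self.cache = {}     # value -> surrogate key (fast path)
        self.database = {}  # authoritative dimension store
        self.next_key = 1

    def key_for(self, value):
        """Return the surrogate key for a value, creating a new
        dimension record if this value has not been seen before."""
        if value in self.cache:
            return self.cache[value]          # real-time lookup path
        key = self.database.get(value)
        if key is None:                       # first sighting: create record
            key = self.next_key
            self.next_key += 1
            self.database[value] = key
        self.cache[value] = key               # warm cache for later lookups
        return key

locations = DimensionManager()
print(locations.key_for("Boston"))   # 1 (created)
print(locations.key_for("Austin"))   # 2 (created)
print(locations.key_for("Boston"))   # 1 (cache hit)
```

The returned key is what gets substituted into the data source record, per the bullet above.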
Types of data outputs available from StreamCentral
• Processed Source Data: Once real-time streaming data, or static data via scheduled pull, is received by StreamCentral, it is validated, evaluated for conditions, and associated with entities and dimensions such as time and location; the data is then available to be published
• Event data: Processed data is evaluated for events. If an event is detected, the event data along with its associated context is available as a real-time stream. In addition, StreamCentral builds a data mart just for this event; access to historical data for an event is also available
• Events data mart analysis: Custom data marts that evaluate multiple events, and the conditions recorded when the events were detected, are available via events data marts. Historical access is available
• Aggregate 360-degree data mart analysis: Bring disparate data together, standardized to common themes, and StreamCentral automatically builds a scalable data mart structure for this data
Type of data available | Real-time access method | Historical access method

Processed Source Data
• Real-time: ActiveMQ messages; JMS-based HornetQ; OracleQ; Microsoft MSMQ; WCF-based pub/sub model; format options: XML/JSON
• Historical: REST API, format options XML/JSON; method name: getFactualData; input parameters: source name, filter parameters (location, time), numOfRecords

Event Data with context
• Real-time: ActiveMQ messages; JMS-based HornetQ; OracleQ; Microsoft MSMQ; WCF-based pub/sub model; format options: XML/JSON
• Historical: REST API, format options XML/JSON; method name: getEventData; input parameters: event name or id, filter parameters (location, time), entity id array, numOfRecords

Events Data Mart
• Real-time: ActiveMQ messages; JMS-based HornetQ; OracleQ; Microsoft MSMQ; WCF-based pub/sub model; format options: XML/JSON
• Historical: REST API, format options XML/JSON; method name: getAnalysisData; input parameters: analysis collection name or id, filter parameters (location, time), entity id array, numOfRecords
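As a hedged illustration of the historical REST access above: the method name getEventData and its input parameters come from the table, while the host, path layout and exact parameter encoding are assumptions for illustration only.

```python
# Sketch: building a getEventData request URL. Only the method and
# parameter names are from the documentation table; everything else
# (host, path, encoding) is a hypothetical example.

from urllib.parse import urlencode

BASE_URL = "https://streamcentral.example.com/api"  # hypothetical host

def build_event_data_url(event_name, location=None, time_range=None,
                         entity_ids=(), num_of_records=100):
    params = {"eventName": event_name, "numOfRecords": num_of_records}
    if location:
        params["location"] = location
    if time_range:
        params["time"] = time_range
    if entity_ids:
        params["entityIds"] = ",".join(entity_ids)
    return f"{BASE_URL}/getEventData?{urlencode(params)}"

url = build_event_data_url("PoorVoiceQuality", location="Boston",
                           entity_ids=["cust-42"], num_of_records=50)
print(url)
```

The response format would be XML or JSON per the table; parsing it is omitted here.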
Choosing the right technology for visualization
• Don’t select a delivery technology for these reasons (best to use StreamCentral, and use many tools to deliver the insight):
• Centralizing business logic in one place
• Definition of KPIs
• Rules for events
• Aliases for data attributes
• Connectivity and transformation requirements of source data
• Adding context to data
• Select one or more delivery technologies for these reasons:
• Performance (in-memory aggregation)
• Cross-browser support, and support for various tablet and mobile device platforms
• Broad portfolio of charts and visualizations
• Highly interactive
• Ability to be integrated in portals for internal (employees) or external (partners or customers) consumption
• Standards-based, like HTML5 and CSS3
• Can be hosted in a SaaS model
Data Security
StreamCentral Database
Workbench administrator defines roles and specifies data access rules. Assign users to roles. StreamCentral builds and manages meta data for row level access
• Centralize data security with StreamCentral
• Custom applications and analytical/reporting tools pass only a user id as part of their query to the StreamCentral database
• Two types of row-level security:
1. Underlying fact data, based on dimensions (like time, location) and entities (like customer, department, site)
2. Denormalized aggregated data, based on and/or rules
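A minimal sketch of the first type of row-level security, using SQLite and invented table and column names (not StreamCentral's actual schema): the client supplies only a user id, and the security join decides which fact rows come back.

```python
# Sketch: fact rows filtered through role-based security tables.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_calls (call_id INTEGER, location_key INTEGER, mos REAL);
    CREATE TABLE user_roles (user_id TEXT, role_id INTEGER);
    CREATE TABLE role_locations (role_id INTEGER, location_key INTEGER);

    INSERT INTO fact_calls VALUES (1, 10, 4.1), (2, 20, 2.3), (3, 10, 3.7);
    INSERT INTO user_roles VALUES ('alice', 1);
    INSERT INTO role_locations VALUES (1, 10);  -- role 1 sees location 10 only
""")

def secure_query(user_id):
    """The client passes only a user id; the security layer joins
    through roles to return just the rows that user may see."""
    return conn.execute("""
        SELECT f.call_id, f.mos
        FROM fact_calls f
        JOIN role_locations rl ON rl.location_key = f.location_key
        JOIN user_roles ur ON ur.role_id = rl.role_id
        WHERE ur.user_id = ?
        ORDER BY f.call_id
    """, (user_id,)).fetchall()

print(secure_query("alice"))  # [(1, 4.1), (3, 3.7)]
```

A user with no role mapping simply gets no rows back, which keeps the access logic centralized in the security tables rather than in each reporting tool.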
StreamCentral row-level security layer (managing row-level data security): a query carrying a user id returns only the data appropriate to that user’s access rights, drawn from the factual tables and security tables of the StreamCentral database.
Diagram elements: ScrtyRoleID; role processing tables in StreamCentral; StreamCentral Database (MS SQL / HP Vertica); StreamCentral Metadata DB; StreamCentral Workbench.
The Workbench administrator manages data security by creating data access rules for roles and assigning users to roles. For data accessed from the StreamCentral Database via reporting/analytical tools or the API, StreamCentral determines the data access permissions for that user.
Distributed Caching
• Storing time and location dimension data for fast lookups and data standardization
• Maintaining configuration information about the system, which aids in managing updates to definitions
• Storing entity data required for adding context to incoming data
• Managing correlation of real-time data
• Managing event detection
• Processed data formatted to data mart specification
• Managing batch data inserts into the database
Availability
OUTSIDE NETWORK
Cache Cluster: Microsoft AppFabric Cache is a distributed caching technology that makes the cache highly available by configuring more than one server to participate in storing cache data, commonly called a cache cluster.
Software Network Load Balancing (NLB): Microsoft IIS web servers configured with the software NLB provided by Microsoft Windows Server allow all websites to be highly available.
Microsoft Message Queue persists unread messages in the queue in the event of a sudden server shutdown. The physical hardware is available for clustering to ensure failover in case of hardware failure.
Web Applications: StreamCentral Public API; Workbench Application; Reports/analytics
Messaging: Inbound Message Queue; Publish Message Queue
Services: Processing Service; Correlation Service; Publish Service
Databases: Workbench Database (StreamCentral metadata); StreamCentral Database (fact and aggregate data – Vertica/MS SQL Server)
The Processing Engine, Correlation Engine and Publish Engine can run on multiple physical servers to keep these services highly available.
StreamCentral High Availability
Thank you for your time
Raheel Retiwalla
CTO - Virtus IT Ltd
E: raheel.retiwalla@virtus-it.com
M: +1 617 901 8370