Date posted: 16-Jul-2015 | Category: Technology | Uploaded by: inside-analysis
Twitter Tag: #briefr The Briefing Room
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today's innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics
February: DATA IN MOTION
March: BI/ANALYTICS
April: BIG DATA
Parmenides and the Truth of Now
"Parmenides". Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Parmenides.jpg#mediaviewer/File:Parmenides.jpg
There is no tomorrow
There is no yesterday
There is only today
There is only now
Analyst: Mark Madsen
Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, data integration and data management. Mark is an award-winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years, Mark has received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institution. He is an international speaker, a contributor to Forbes Online, and a member of the O'Reilly Strata program committee. For more information or to contact Mark, follow @markmadsen on Twitter or visit http://ThirdNature.net
WebAction
WebAction offers real-time data-driven apps and the underlying enterprise platform
The platform captures structured and unstructured data from a wide variety of data sources and allows users to correlate and enrich data streams
WebAction leverages in-memory data processing and is architected to scale up and scale out
Guest: Sami Akbay
Sami Akbay is a founder of WebAction. Prior to WebAction, he served as the CEO of Altibase, Inc., an in-memory RDBMS company with customers in financial services, utilities, and telecommunications. Sami was Vice President of Marketing and Product Management for GoldenGate Software from 2004 through its acquisition by Oracle. Prior to GoldenGate, he served in senior product marketing and business development roles at Embarcadero and AltoWeb. He spent his earlier career in technical and consulting roles working at Rabobank Nederlands, Hearst New Media, American Stock Exchange, MediaMetrix, OneMain.com (Earthlink), and ALK Associates. He is a graduate of Rutgers University.
PROPRIETARY & CONFIDENTIAL
Because actionable insights come from combining analyzed history with what is happening right now.
• Insights come from analyzing historical data:
– What are the average hourly sales for our Boston store on a typical weekday in February?
– Who are my top 1% of passengers by revenue for 2014?
– How many dropped calls does my average subscriber experience before cancelling service if they have a 2-year contract and a $250 cancellation penalty?
• Events without context are not very meaningful
– In the last 30 minutes, we had a revenue of $8,000 in our Boston store.
– Mark Madsen will miss his connection from ORD to EWR because his flight departed late from SFO
– Sami Akbay had 3 dropped calls in the last 30 minutes
• Actionable insights combine analyzed history with real-time event streams:
– We typically sell $3,000 per hour on a weekday in February at our Boston store. In the last 30 minutes we sold $8,000. Alert the store manager and require an ID check at checkout.
– Mark Madsen is a top 1% passenger by revenue. Have an agent meet him at the gate and deliver his boarding pass for the next flight.
– A typical subscriber drops 8 calls before becoming a churn risk. Sami has dropped only 3, so don't offer him a service discount as an incentive if he calls 611.
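The Boston example above can be sketched as a comparison of a live window against a pro-rated historical baseline. This is a minimal illustration, not WebAction's implementation: the `HOURLY_BASELINE` lookup, the `check_store` function, and the alert factor of 2.0 are all hypothetical; only the $3,000/hour and 30-minute figures come from the slide.

```python
from dataclasses import dataclass

# Hypothetical historical baseline (analyzed history): typical revenue per hour,
# keyed by (store, day type, month). The $3,000 figure is the slide's example.
HOURLY_BASELINE = {("Boston", "weekday", "February"): 3000.0}


@dataclass
class Sale:
    store: str
    amount: float
    minutes_ago: float  # how long before "now" the sale happened


def check_store(sales, store, day_type, month, window_minutes=30, factor=2.0):
    """Return an alert message if revenue in the recent window exceeds
    `factor` times the pro-rated baseline; otherwise return None.
    The factor of 2.0 is an assumed threshold, not from the source."""
    expected = HOURLY_BASELINE[(store, day_type, month)] * window_minutes / 60.0
    observed = sum(s.amount for s in sales
                   if s.store == store and s.minutes_ago <= window_minutes)
    if observed > factor * expected:
        return (f"ALERT {store}: ${observed:,.0f} in last {window_minutes} min "
                f"vs ~${expected:,.0f} expected; require ID check at checkout")
    return None


sales = [Sale("Boston", 4000.0, 5), Sale("Boston", 4000.0, 20), Sale("Boston", 500.0, 45)]
print(check_store(sales, "Boston", "weekday", "February"))
```

Neither input alone triggers the alert: $8,000 means little without the history, and the history says nothing until the live window arrives.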
[Diagram: Sources (Device Data, Industry Data, Social Feeds, Transaction Data, System/IT Data) feed two paths. Batch / high-latency: (existing) ETL into the EDW and Hadoop (Pig, Hive, Map/Reduce applications), serving legacy applications. Real-time / low-latency: WebAction, serving real-time applications. Both paths serve users.]
WebAction® delivers the most comprehensive Realtime Stream Analytics Platform, enabling tailored, enterprise-scale Big Data applications for the agile enterprise
[Diagram: The traditional path (Acquire → Store → Process) loads structured data, machine data, location, and click stream into an RDBMS / EDW for BI / analytics: batch and reactive. Across the "real-time barrier", WebAction (Acquire → Process in Memory → Deliver) handles the same sources for data-driven apps: proactive and real-time, feeding visualizations, alerts, storage, and integration.]
Acquire: Structured and unstructured data
Process: Distributed, in-memory, as data is created
Deliver: Correlated, enriched, and filtered real-time big data records
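The three stages can be pictured as a chain of streaming generators, where each record flows through acquire, process, and deliver as soon as it arrives rather than being stored first. This is a sketch under assumptions: the function names, the JSON-lines input, and the `call_dropped` event type are hypothetical and do not represent WebAction's API.

```python
import json
from typing import Iterable, Iterator


def acquire(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Acquire: turn raw JSON lines into records as they arrive."""
    for line in raw_lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def process(records: Iterable[dict], reference: dict) -> Iterator[dict]:
    """Process: filter and enrich each record in memory, one at a time."""
    for rec in records:
        if rec.get("type") != "call_dropped":
            continue  # filter: keep only the events of interest
        enriched = dict(rec)
        # enrich: attach context from a (hypothetical) reference table
        enriched["plan"] = reference.get(rec["subscriber"], "unknown")
        yield enriched


def deliver(records: Iterable[dict], sink: list) -> None:
    """Deliver: push each result to a downstream target (here, a list)."""
    for rec in records:
        sink.append(rec)


raw = ['{"type": "call_dropped", "subscriber": "s1"}', "",
       '{"type": "call_ok", "subscriber": "s1"}']
out = []
deliver(process(acquire(raw), {"s1": "2-year"}), out)
print(out)
```

Because generators are lazy, no stage waits for the previous one to finish a batch; each record is handled end to end before the next is read.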
Acquire: Structured and unstructured data
§ Data from transactional sources is acquired via redo or transaction logs
§ Structured and unstructured data
§ No Production Impact
§ No Application changes
TYPE | EXAMPLE | COMPLEXITY
Common File Format | CSV, JSON, XML | SIMPLE
Social Feeds | Facebook, Twitter | VERY HIGH
System/IT Data | Syslogs, weblogs, Netflow | SIMPLE TO MEDIUM
Device Data | SmartMeter, Medical Device, RFID | MEDIUM
Industry Data | SWIFT, HL7, FIX | MEDIUM
Real-Time Transaction Data | Oracle, DB2, SQLServer, MySQL, HP NonStop | HIGH
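Reading changes from a transaction log, rather than querying source tables, is what allows acquisition with no production impact and no application changes. Real redo and transaction logs are binary and database-specific, so this sketch assumes a hypothetical JSON-lines change log with `op`, `table`, `before`, and `after` fields; it illustrates the idea of log-based change capture with a restartable checkpoint, not any vendor's reader.

```python
import json
from typing import Iterator, List, Tuple


def read_changes(log_lines: List[str], start_pos: int = 0) -> Iterator[Tuple[int, dict]]:
    """Yield (next_checkpoint, change_event) pairs from an append-only change
    log, starting at a previously saved position. Only the log is read; the
    source tables are never queried."""
    for pos in range(start_pos, len(log_lines)):
        entry = json.loads(log_lines[pos])
        if entry["op"] not in ("INSERT", "UPDATE", "DELETE"):
            continue  # skip non-DML entries such as commit markers
        yield pos + 1, {
            "table": entry["table"],
            "op": entry["op"],
            # DELETEs carry only a before-image; inserts/updates an after-image
            "row": entry.get("after") or entry.get("before"),
        }


log = [
    '{"op": "INSERT", "table": "orders", "after": {"id": 1, "amount": 40}}',
    '{"op": "COMMIT"}',
    '{"op": "UPDATE", "table": "orders", "before": {"id": 1, "amount": 40}, '
    '"after": {"id": 1, "amount": 45}}',
]
checkpoint = 0
for checkpoint, event in read_changes(log, checkpoint):
    print(checkpoint, event)
```

After a restart, passing the saved `checkpoint` back in resumes from the last delivered change instead of rereading the whole log.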
Process: Distributed, in-memory, as data is created
§ Enrich live Big Data with historical data sources
§ Process Big Data faster using partitioned streams, caches, and additional nodes
§ Execute SQL-like queries of in-memory Big Data
§ Alert in real-time based on predictive analytic model results
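Two of the Process bullets, partitioned streams and enrichment from cached history, can be sketched together: route each event to a partition by key, then join it with context held in memory. The partition count, field names, and cache contents here are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical number of processing nodes


def partition_for(key: str, n: int = NUM_PARTITIONS) -> int:
    # CRC32 is stable across processes, unlike Python's built-in str hash,
    # so a given key always lands on the same partition/node.
    return zlib.crc32(key.encode("utf-8")) % n


def route(events, key_field):
    """Split one stream into partitioned sub-streams by key."""
    parts = defaultdict(list)
    for ev in events:
        parts[partition_for(ev[key_field])].append(ev)
    return parts


def enrich(events, cache, key_field):
    """Join each live event with historical context from an in-memory cache."""
    return [{**ev, **cache.get(ev[key_field], {})} for ev in events]


events = [{"pax": "madsen", "gate": "B7"}, {"pax": "akbay", "gate": "C2"},
          {"pax": "madsen", "gate": "B9"}]
cache = {"madsen": {"tier": "top 1%"}}  # hypothetical: precomputed from history
for p, evs in sorted(route(events, "pax").items()):
    print(p, enrich(evs, cache, "pax"))
```

Keying partitions this way keeps all events for one passenger on one node, so that node's local slice of the cache is enough to enrich them; adding nodes means raising `NUM_PARTITIONS` and redistributing keys.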
Deliver: Correlated, enriched, and filtered real-time big data records
§ Continuous Big Data Records
§ Real-Time Dashboards
§ Predictive Alerts
§ Business Trends
§ Data Patterns
§ Outliers
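As one concrete instance of the "Outliers" output, a rolling z-score over the last few observations flags values far from recent behavior. This is a deliberately simple stand-in chosen for illustration; the window size and 3-sigma threshold are assumptions, and the slides do not say which detection method the platform uses.

```python
import statistics
from collections import deque


class OutlierDetector:
    """Flag a value if it lies more than `threshold` standard deviations
    from the mean of the last `size` observations (rolling z-score)."""

    def __init__(self, size: int = 20, threshold: float = 3.0):
        self.window = deque(maxlen=size)  # oldest values fall off automatically
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        is_outlier = False
        if len(self.window) >= 2:  # stdev needs at least two points
            mean = statistics.fmean(self.window)
            stdev = statistics.stdev(self.window)
            if stdev > 0 and abs(value - mean) / stdev > self.threshold:
                is_outlier = True
        self.window.append(value)
        return is_outlier


detector = OutlierDetector(size=10, threshold=3.0)
for v in [10, 11, 9, 10, 11, 9, 10, 50]:
    if detector.observe(v):
        print("outlier:", v)
```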
[Diagram: WebAction platform architecture. Sources (Device Data, Industry Data, Social Feeds, Transaction Data, System/IT Data) enter through High Speed Data Acquisition into the Distributed DIM Processor, Distributed WAction Cache, WActionStore, and Tungsten Visualization, all sharing common Metadata. Outputs go to Big Data Infrastructure, Enterprise Applications, the Enterprise Data Warehouse, RDBMS, and Data Driven Apps.]
• How is it different from:
– CEP?
– ETL?
– Messaging?
– an in-memory database?
Copyright Third Nature, Inc.
We are in a transitional phase in IT architecture
Aspect | Then | State of Practice | Now, forward
Architecture | Timeshare | Client/server | Cloud
Data | Core TXs | All TXs, some events, docs | All data
Rate of change | Slow | Rapid | Continuous
Uses | Few | Many | Everything
Latency | Daily+++ | < daily to minutes | Immediate
Data platform | Uniprocessor | SMP, cluster | Shared nothing
Majority use of computing over time
1930s-1950s: Calculate
1960s-1980s: Automate
1990s-2010s: Informate
2010s+: Analyze and Actuate
Computing technology has become a tool of observation and actuation, not just a recipient of human-entered data
Rising organizational complexity
The data warehouse vs. business agility
All the data
Ready-to-use common, typed, tabular data
The bottleneck is you
Polling is not streaming; minutes are not real time
[Figure: timeline from 0 to 7 minutes with an alert threshold; the problem gets worse until action is taken. In both models something breaks, the first bad event occurs, and the reaction takes 3 minutes.
Streaming model: the problem is visible 4 seconds after the first bad event; action is taken 3 minutes later, at 3.5 minutes; the problem is completely resolved at 4 minutes.
Polling model: events are recorded, processed, stored in the DB, and ready after 2.5 minutes, so the problem is visible after 2.5 minutes at the earliest; action is taken 3 minutes later, at 6 minutes; the problem is completely resolved at 6.5 minutes.]
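The timeline's arithmetic can be made explicit with a small model of detection latency. This sketch assumes that streaming detection costs only a processing delay, while polling detection is the later of the load completing and the next scheduled poll; the one-minute poll interval and the event time of 0.5 minutes are hypothetical, while the 4-second, 2.5-minute, and 3-minute figures come from the slide.

```python
import math


def streaming_detection(event_time: float, processing_latency: float) -> float:
    """Streaming: each event is evaluated as it arrives."""
    return event_time + processing_latency


def polling_detection(event_time: float, load_latency: float,
                      poll_interval: float, first_poll: float = 0.0) -> float:
    """Polling: the event is queryable only after it has been loaded, and is
    then seen at the next scheduled poll at or after that moment."""
    ready = event_time + load_latency
    polls_needed = max(math.ceil((ready - first_poll) / poll_interval), 0)
    return first_poll + polls_needed * poll_interval


REACTION = 3.0   # minutes, per the slide
t_event = 0.5    # minutes; hypothetical time of the first bad event

stream_action = streaming_detection(t_event, 4 / 60) + REACTION
poll_action = polling_detection(t_event, 2.5, poll_interval=1.0) + REACTION
print(f"streaming action at {stream_action:.2f} min, polling action at {poll_action:.2f} min")
```

This prints `streaming action at 3.57 min, polling action at 6.00 min`. The reaction time is identical in both models, so the whole gap comes from load latency plus polling granularity; shortening the poll interval cannot remove the load latency.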
The data warehouse is not designed for real time
A polling architecture does not work well for event data
▪ Introduces latency
▪ Polling creates performance and scaling problems
The DW can't handle real-time ingest
▪ One of the original DW design assumptions: solve for conflicting workloads by separating them in time
▪ Workload management has limits
▪ Scalability problem for event streams
▪ Spiky flow patterns and dynamic scaling
Static schema:
▪ What happens first, upstream change or data model change?
▪ What is your reaction time?
▪ The problem of dropped packets
The creation and flow of data is different for transactions and machine-generated events
[Figure: The process for most human-entered data, at human speed: Data entry → Extract → Cleanse → Load → Use. The process for machine-generated data, at machine speed: Data generation → Store → Use, alongside a Store → Cleanse / Program → Use path.]
Real-time monitoring is not polling.
Real-time monitoring often needs to access history.
The data in motion and the data at rest are the same data.
Therefore:
Real time (in motion) and persistence (at rest) must be supported by the same architecture.
Real time isn't either-or; it's part of the architecture
▪ Flowing (ESB): a sliding window of "now"
▪ Unloaded (cache/queue): persisted but not yet loaded into the DB
▪ Queryable history (database): stored in a database / datastore
A DB can get you to within minutes (at large scale), but it won't be easy or cheap.
Streaming SQL, stream engines, and CEP may be used for these.
Real-time monitoring doesn't use only real-time data: windows, restarts, and deviation detection all cross the boundaries above.
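The point that monitoring crosses the flowing / unloaded / history boundaries can be sketched as one store whose range query reads from all three tiers. This is a hypothetical illustration: the `TieredStore` class, the batch-load rule, and the assumption of unique timestamps (used to de-duplicate records present in more than one tier) are inventions for the example.

```python
from collections import deque


class TieredStore:
    """Hypothetical three-tier layout: an in-memory sliding window of "now",
    records persisted but not yet loaded, and queryable history. A range
    query reads whichever tiers hold matching records."""

    def __init__(self, window_size: int = 100, load_batch: int = 10):
        self.window = deque(maxlen=window_size)  # flowing
        self.unloaded = []                       # persisted, not yet in the DB
        self.history = []                        # stored in the database
        self.load_batch = load_batch

    def append(self, ts: float, value: float) -> None:
        rec = (ts, value)
        self.window.append(rec)
        self.unloaded.append(rec)
        if len(self.unloaded) >= self.load_batch:
            self.history.extend(self.unloaded)  # periodic batch load
            self.unloaded.clear()

    def query(self, start: float, end: float):
        """Return (ts, value) pairs with start <= ts < end from all tiers,
        de-duplicated by timestamp (assumes timestamps are unique)."""
        seen = {}
        for tier in (self.history, self.unloaded, self.window):
            for ts, value in tier:
                if start <= ts < end:
                    seen[ts] = value
        return sorted(seen.items())


store = TieredStore(window_size=3, load_batch=4)
for t in range(10):
    store.append(float(t), t * 10.0)
print(store.query(2.0, 5.0))
```

After ten appends, timestamps 0 to 7 are in history, 8 and 9 are still unloaded, and 7 to 9 sit in the window; a query never needs to know which tier holds a record.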
Decouple the data architecture layers
Ingest → Store → Manage → Refine → Deliver → Use / Analyze
This implies a new DW architecture and data modeling approach
If you want to do real time and still manage your data effectively, then you need this data architecture
[Diagram: Stream → Collect → Refine → Manage → Deliver, with three kinds of data (flowing, persisted, managed history) and "Metadata?" marked as an open question across the layers.]
Flowing, persisted, and managed data define different storage and retrieval requirements.
Questions
• Why an integrated product rather than alternatives like a real-time streaming engine or a streaming SQL database?
• What do you do at the metadata layer to expose data that is a message, a table, or both?
• What mechanisms does it use to scale?
• How does one deploy the user interface portion of an application?
• What happens if there's a reader/writer lag or failure?
• How do you handle recovery in the event of a stream failure (one stream, correlated streams)?
• Can you persist data that you calculate and display, and if so, how?
• What types of streaming functions do you support (e.g., windows: sliding/jumping time, count, time-series alignment)?
• How complex a calculation can you create?
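The window types named in the streaming-functions question behave differently on the same input. The sketch below shows generic definitions of jumping (tumbling) time windows, sliding time windows, and count windows over `(timestamp, value)` pairs; the function names are hypothetical, it assumes events arrive in timestamp order, and it does not describe how WebAction implements windows.

```python
from collections import deque
from typing import Iterable, List, Tuple

Event = Tuple[float, float]  # (timestamp_seconds, value)


def tumbling_sums(events: Iterable[Event], size: float) -> List[Tuple[float, float]]:
    """Jumping/tumbling time window: non-overlapping buckets [k*size, (k+1)*size)."""
    buckets = {}
    for ts, value in events:
        start = (ts // size) * size
        buckets[start] = buckets.get(start, 0.0) + value
    return sorted(buckets.items())


def sliding_time_sums(events: Iterable[Event], length: float) -> List[float]:
    """Sliding time window: after each event, the sum over (ts - length, ts].
    Assumes in-order timestamps and length > 0."""
    window, total, out = deque(), 0.0, []
    for ts, value in events:
        window.append((ts, value))
        total += value
        while window[0][0] <= ts - length:  # evict events that slid out
            total -= window.popleft()[1]
        out.append(total)
    return out


def count_window_sums(events: Iterable[Event], n: int) -> List[float]:
    """Count window: after each event, the sum of the last n values."""
    window, out = deque(maxlen=n), []
    for _, value in events:
        window.append(value)
        out.append(sum(window))
    return out


events = [(0.0, 1.0), (1.0, 2.0), (5.0, 3.0), (6.0, 4.0), (11.0, 5.0)]
print(tumbling_sums(events, 5.0))
print(sliding_time_sums(events, 5.0))
print(count_window_sums(events, 2))
```

A tumbling window emits one result per bucket, while sliding and count windows emit one per event; time-series alignment across streams would need additional logic not shown here.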
Upcoming Topics
www.insideanalysis.com
February: DATA IN MOTION
March: BI/ANALYTICS
April: BIG DATA