Date posted: 28-Nov-2014
Category: Technology
Uploaded by: sqlstream-inc
High-Velocity Big Data:
The Coming Clash of Big Data Technologies and Consumer Expectation
Damian Black, CEO, SQLstream
November 2013
damian@sqlstream.com
• We live in the world of Big Data.
  – The Internet of Everything: sensors, services, systems/devices.
• Everybody expects real-time Internet information.
  – We need a new paradigm for Big Data processing.
• What are the implications for the IT industry and for business?
  – Let’s imagine a world where we have continuous real-time visibility into streaming Big Data, and explore the real-time business possibilities.
Setting the scene
• What are the drivers?
  • Broadband everywhere.
  • Sensors everywhere.
  • Wireless everything.
  • Parallel commodity computation.
  • Elastic computation (Cloud).
  • Smartphone (Hi-Res display everywhere).
The emerging world of real-time data
Imagine…
  Everyone carrying a smartphone.
  Everyone belonging to a searchable social network.
  Smartphone apps providing access to all information.
Therefore…
  You can “follow” anybody, anywhere, at any time.
  You can access any information, anywhere, at any time.
  All the devices, applications and people are always connected.
Everyone and everything connected
We are talking Really Big Data…
  – Exponential, compounded growth
So technology has drawn on the old world processing model…
  – Store, cleanse, then process the data
  – Analytics means traversing history
  – Querying against stale snapshots
Exponential growth in data volumes is causing this model to break
  – Advent of Hadoop
  – Using parallel processing to combat volumes
  – Batch-based means very high latency
Contemplate the data management challenge
Rear-view mirror thinking…
History and emergence of Big Data
[Figure: timeline of data management technology, 1960 to 2000. Horizontal axis: evolution of data management technology over time. Vertical axis: distributed data processing. Architectures shown: Centralized, Client-Server, Clustered, Distributed. Technologies and their data models: Indexed Files (sequential model), First Databases (hierarchical model), Network Sockets (sequential model), Messaging Middleware (hierarchical model), Data Warehouses (relational model), Big Data (sequential model), Streaming Big Data (relational model). An additional label reads "subsumed by relational model".]
Evolution of Hadoop:
  – Google adopts Map-Reduce for indexing the web.
  – Yahoo emulates Google with Java-based Map-Reduce.
  – Hadoop is released as open-source technology.
  – Cloudera, Hortonworks and many others create releases.
  – Cloudera creates Impala and Search.
Limitations of Hadoop:
  – Storage and archiving architecture.
  – Massively parallel execution traded off against latency.
  – Not designed for interactive applications or real-time response.
Recent history of Big Data
Moving from high-latency to streaming
[Figure: pipeline stages Collect → Cleanse → Enrich → Analyze → Share, labeled LOW LATENCY]
• Traditional approach leads to high latency:
  – “Holding tank” buffers for the data.
• Streaming approach:
  – Replace the intervening “holding tank” buffers with data pipelines.
  – Stream the data continuously through the pipes.
  – Results stream out immediately.
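The difference between "holding tanks" and pipelines can be sketched in a few lines of Python. This is only an illustration of the idea, not SQLstream's implementation: the stage names follow the Collect, Cleanse, Enrich, Analyze, Share stages on the slide, while the record format (`sensor,value` text lines) and the alert threshold are assumptions made up for the example. Each stage is a generator, so a record flows through every stage as soon as it arrives and nothing is buffered between stages.

```python
from typing import Dict, Iterable, Iterator


def collect(raw: Iterable[str]) -> Iterator[str]:
    """Pull records from the source one at a time (no intermediate buffer)."""
    for line in raw:
        yield line


def cleanse(lines: Iterable[str]) -> Iterator[str]:
    """Drop blank records and strip surrounding whitespace."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def enrich(lines: Iterable[str]) -> Iterator[Dict]:
    """Parse the hypothetical 'sensor,value' format into a record."""
    for line in lines:
        sensor, value = line.split(",")
        yield {"sensor": sensor, "value": float(value)}


def analyze(records: Iterable[Dict], threshold: float = 30.0) -> Iterator[Dict]:
    """Attach an alert flag; the threshold is an arbitrary example value."""
    for record in records:
        yield {**record, "alert": record["value"] > threshold}


def share(records: Iterable[Dict]) -> Iterator[str]:
    """Format each result for delivery as soon as it is available."""
    for r in records:
        yield f'{r["sensor"]}: {r["value"]} alert={r["alert"]}'


def pipeline(raw: Iterable[str]) -> Iterator[str]:
    """Chain the stages; each record passes through all of them lazily."""
    return share(analyze(enrich(cleanse(collect(raw)))))


if __name__ == "__main__":
    for message in pipeline(["t1,21.5", "   ", "t2,35.0"]):
        print(message)
```

Because every stage is lazy, the first result is produced after reading only the first input record. A batch design would instead fill a buffer at each stage before the next stage starts, which is where the latency comes from.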
We have faster data, so surely we need a faster database…
  1. To ingest data faster,
  2. To get answers faster,
  3. For “Big Data Scale”?
Battling conventional wisdom…
“I need a bigger, faster hammer!” …Why not stream the nails?
Everything looks like a nail when all you have is a hammer…
The consumer expects everything to be available and updated in real time:
  – Integrated, aggregated view of services, transactions, accounts…
  – Able to search and get accurate real-time results
  – Based on powerful real-time analytics, continuously updated
But…
  Hadoop does not solve this.
  Neither does database technology.
  Neither do log file analytics companies.
Surely there’s got to be a better way?
The consumer’s expectation…
The Coming Reality Clash: “Real-time operations” meets “Batch analytics”.
Business Intelligence
Operations
Real-time Operational Intelligence
Continuous monitoring and analytics
Faster decision-making
Automated operations
Security
Cross-selling & Ads
Real-time Promotions
Quality & Compliance
Health and Capacity
Fraud & Theft
“As we move toward a real-time business environment, the capability to process data flows swiftly and flexibly will become increasingly important. SQLstream leads the industry in this kind of capability.”
Robin Bloor, Chief Analyst for Bloor Group
Enabling a whole new world of possibilities…
What is happening?
What might happen?
What just happened?
Make it happen!
• Twitter Storm – 100K downloads, used by Twitter and others.
• Amazon Kinesis – streaming as a service.
• IBM Streams – inventors of SQL now invent SPL2.
• SQLstream s-Server – SQL as the Lingua Franca of data management.
The market wakes up to streaming…
SQLstream, Inc. product suite wins Technology Innovation Award for IT Analytics and Performance
SQLstream, Inc. enters DBTA 100: the companies that matter most in data
Other winners Other Top100 vendors
The market recognizes SQLstream…
About SQLstream
Facts
o Launched 2009, now 4th generation technology
o Deployments spanning many industries
o World-leading benchmarks
Capabilities
o Manage and monetize dynamic data assets
o Both unstructured and structured data
o Both SQL and Java as first-class alternatives
Innovations
o Massively scalable streaming platform
o Only standard SQL streaming engine
o Six patents for stream processing
Streaming Big Data management platform:
• does for streaming data what databases do for stored data
A large, wide-open market opportunity
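[Figure: market positioning chart. Horizontal axis: Analytics Capability (Simple, Moderate, Advanced). Vertical axis: Records Per Second (25K, 1M, 10M, 20M+). Capability levels named on the chart: simple time-series with simple joins; string matching, regular expressions; partitioning, n-way joins, full time-series plus spatial. Market segments: Security Intelligence, Internet of Things, Telecomm. Regions labeled: SQLstream; “Store Data First” Products.]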
The Technology
Streaming Big Data Platform
[Figure: high-velocity data flows into the streaming platform, which delivers operational intelligence.]
High-velocity data sources: Logs, Sensors, GPS, Networks, Social media, RFIDs, Servers
Operational intelligence for: Telecom, Smart grid, Oil & Gas, Manufacturing, Logistics, M2M, Telematics, Retail, Internet, Banking, Data centers, Automotive
Historical queries for real-time data enrichment
Storing valuable derived streams for future access
• Continuous queries with parallel incremental evaluation
• Real-time processing of unstructured and structured data
• Predictive analytics driving automated actions
Use Case and Real-world Implications
• Mozilla Firefox 4 – Real-time Download Monitor
• Continuous processing of download requests
• Real-time integration with Hadoop and HBase
Did you see this? (GOOGLE: “Youtube Mozilla Glow”)
Intelligent real-time reactions to:
• Every movement of every customer within my supermarket, shopping mall, holiday complex, road trip…
• Every purchase of every customer in the context of past purchases and other demographic info.
• Every interaction of every customer showing signs of dissatisfaction.
• Every interaction of a prospective customer on my website.
• Every vehicle on my congested road network.
• Every patient or system exhibiting a state of distress.
• Every step taken by anyone trying to break into my property or systems.
• Every phone call, text or application activity on my cloud, hosted service or Telecomm network.
A vision of a Streaming Data driven world…
History repeats itself, especially in computing,
…but Stream Processing really is a new paradigm,
…and the time has to be right for a new technology to arise.
Stream Processing looks like it is...
The Right Technology at the Right Time.
Concluding Remarks
InfoArmor
• Spun out from JP Morgan Chase protecting 10M credit card holders
• Identity Theft Protection and Internet Surveillance
• Growing at triple digit rates
IDENTITY THEFT MONITORING
• “We evaluated building our own and explored other vendors, but chose SQLstream because they met our requirements entirely and they provided the only 100% ISO ANSI/SQL standards-based streaming platform. That enabled us massive scalability, a very fast deployment and a highly competitive TCO.”
Fortune 500 Customers Benefits
• Scalability with continuous integration
• Fast integration
• Lower Total Cost of Solution
Performance
Roads & Maritime Services
Opportunity
• Real-time, accurate and open platform for traffic network information from GPS data
• Responsible for roads and waterways in NSW, Australia
• Developed TT5 for advanced real-time traffic information in collaboration with SQLstream
REAL-TIME TRAFFIC CONTROL
• “The ability to deliver high quality, critical information to the travelling public in real time helps Roads and Maritime to improve the journey experience, reducing frustration and increasing productivity.”
Total Cost of Performance
[Figure: total cost of performance for Big Data. Axes: records per second and total cost. Domains: Social, E-comm, Security, Telematics, Telecom. Workloads shown: Patterns, Trends, Mining, Connections; Searches, Inventory, Reports, Statistics, Billing; Trading, Advertising, Alerts, Detection, Signal Intelligence.]
Think Different – Stream Processing
Two approaches compared: “The Bigger Hammer” (Distributed Database Clusters) and “The Streaming Approach” (Relational Stream Processing).

Faster Ingestion
  The Bigger Hammer:
  • Hold it in memory (RAM)
  • Use in-RAM indexing
  • Compress the data
  The Streaming Approach:
  • Stream the data into queries
  • Index incrementally on-the-fly
  • Recycle memory continuously

Faster Answers
  The Bigger Hammer:
  • Traverse large datasets in RAM
  • Use lots and lots of servers
  The Streaming Approach:
  • Stream out answers in real-time
  • Just do it… incrementally!

Big Data Scaling Issues
  The Bigger Hammer:
  • Synchronize across clusters
  • Running each query to completion
  • Forever polling the database
  The Streaming Approach:
  • Umm…what scaling issues?
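To make "just do it incrementally" concrete, here is a minimal Python sketch contrasting the two styles for one query: the average of a value over the last 60 seconds. It is an illustration under stated assumptions, not SQLstream's engine. The class and function names are hypothetical, records are assumed to arrive in timestamp order, and the window is taken to cover the interval (ts - window, ts]. The incremental version updates a running total as each record arrives and discards expired records, so memory is recycled continuously. The polling version stores everything and rescans the history on every query.

```python
from collections import deque
from typing import Deque, List, Tuple


class SlidingAverage:
    """Incrementally maintained average over a trailing time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.items: Deque[Tuple[float, float]] = deque()  # (timestamp, value)
        self.total = 0.0

    def add(self, ts: float, value: float) -> float:
        """Fold in one record and return the up-to-date answer immediately."""
        self.items.append((ts, value))
        self.total += value
        # Evict records that fell out of the window; their memory is reclaimed.
        while self.items[0][0] <= ts - self.window:
            _, expired = self.items.popleft()
            self.total -= expired
        return self.total / len(self.items)


def polled_average(history: List[Tuple[float, float]], now: float,
                   window_seconds: float) -> float:
    """Store-first style: rescan the whole stored history for each query."""
    values = [v for ts, v in history if now - window_seconds < ts <= now]
    return sum(values) / len(values)


if __name__ == "__main__":
    stream = [(0, 10.0), (30, 20.0), (70, 30.0)]
    incremental = SlidingAverage(window_seconds=60)
    history: List[Tuple[float, float]] = []
    for ts, value in stream:
        history.append((ts, value))
        print(ts, incremental.add(ts, value), polled_average(history, ts, 60))
```

Both versions return the same answers, but the incremental one does a small, bounded amount of work per record, while the cost of the polling one grows with the size of the stored history.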
SQLSTREAM DATAFLOW TECHNOLOGY PIPELINING AND SUPERSCALAR PARALLEL PROCESSING
Fine-grained parallelism: simple, massively scalable, super fast.
Query Processor =
CLEANING & FILTERING
STREAMING ANALYTICS
STREAMING AGGREGATION
CONTINUOUS INTEGRATION
Internet Security
Fraud Prevention
Network Monitoring
CyberAttack Monitoring
Compliance Monitoring
Our Streaming Data Management Platform
Log Files, Databases, Locations, Networks, Social Media, Servers, M2M Feeds
s-SERVER
-- Emit rows whose error count exceeds the per-URL one-minute rolling
-- average by more than two standard deviations.
SELECT STREAM ROWTIME, url, numErrorsLastMinute
FROM (
    SELECT STREAM ROWTIME, url, numErrorsLastMinute,
        AVG(numErrorsLastMinute) OVER lastMinute AS avgErrorsPerMinute,
        STDDEV(numErrorsLastMinute) OVER lastMinute AS stdDevErrorsPerMinute
    FROM ServiceRequestsPerMinute
    -- One sliding window per URL, covering the preceding minute.
    WINDOW lastMinute AS (PARTITION BY url RANGE INTERVAL '1' MINUTE PRECEDING)
) AS S
WHERE S.numErrorsLastMinute > S.avgErrorsPerMinute + 2 * S.stdDevErrorsPerMinute;
Streaming SQL for Cloud Monitoring
Business Need: Detect run-away applications before resource consumption becomes an issue.
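For readers without a streaming SQL engine at hand, the logic of the cloud-monitoring query can be approximated in plain Python. This is a rough sketch of the same rule (flag a URL when its latest error count exceeds the rolling mean by more than two standard deviations), not a translation with identical semantics. The assumptions are: rows arrive in timestamp order as `(timestamp_seconds, url, num_errors)` tuples, the window includes the current row and everything up to 60 seconds before it, and the population standard deviation is used (the deck does not say which variant `STDDEV` computes). The function name is hypothetical.

```python
from collections import defaultdict, deque
from statistics import fmean, pstdev
from typing import Deque, Dict, Iterable, Iterator, Tuple

Row = Tuple[float, str, float]  # (timestamp in seconds, url, error count)


def detect_error_spikes(rows: Iterable[Row],
                        window_seconds: float = 60.0,
                        num_stddevs: float = 2.0) -> Iterator[Row]:
    """Yield rows whose error count is unusually high for their URL."""
    # One sliding window per URL, mirroring PARTITION BY url.
    windows: Dict[str, Deque[Tuple[float, float]]] = defaultdict(deque)
    for ts, url, errors in rows:
        window = windows[url]
        window.append((ts, errors))
        # Drop rows older than the window (the current row always remains).
        while window[0][0] < ts - window_seconds:
            window.popleft()
        values = [e for _, e in window]
        # Assumption: population standard deviation over the window.
        if errors > fmean(values) + num_stddevs * pstdev(values):
            yield ts, url, errors


if __name__ == "__main__":
    # Steady error rate every 5 seconds, then a sudden spike on "/checkout".
    rows = [(t, "/checkout", 1) for t in range(0, 40, 5)]
    rows.append((40, "/checkout", 50))
    for alert in detect_error_spikes(rows):
        print("run-away candidate:", alert)
```

Note that the rule needs several rows in the window before a spike can stand out: with only a handful of samples, a single large value inflates the standard deviation enough to hide itself.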
Streaming Visualization
s-Visualizer