Teradata Listener &
A Streaming Analytics Solution with the Teradata UDA
DTG
The Case for Real Time Data
• The Internet of Things (IoT)
• Smart electric grids
• Twitter, Facebook, etc.
• Financial markets
• News feed providers
• Web navigation tracking tools
• Weather and earthquake sensors
• Telecommunication networks
• Set-top boxes
• Fleets of moving vehicles
• Mobile location
Business Use Cases: Ingest/Distribute Streaming Data
• IoT/sensors: real-time streaming data
• Web-page activity: web clicks in real time
• Security & surveillance: logins, data access
• Product recommendations: product offers feedback
• Email compliance: agent emails or spam
• Retailer Cyber Monday: sales activity
• Customer satisfaction index: positive & negative events
• Reservations brokers: events and resources
• Social media watch: RSS, tweets, PR, blogs
• Track and trace logistics: vehicles and containers
[Diagram: UDA architecture, organized as Acquisition, Analytics, and Access.
Sources: ERP, SCM, CRM, sensors, audio and video, machine logs, text, web and social, arriving through real-time ingest.
Data engines: conventional, emerging, multi-genre, data warehouse, data lake, in-memory, NoSQL, compute cluster, operational, virtual query.
Access: app framework, business intelligence, languages, integrated development environment.
Users: operational systems, customers, partners, engineers, data scientists, business analysts, knowledge workers, marketing, executives.
Platform services: development, data, operations. Cloud deployment: private, hybrid, public.]
[Diagram: the UDA architecture (Acquisition, Analytics, Access) with products overlaid: Listener at INGEST, QueryGrid at VIRTUAL QUERY, AppCenter, Aster Analytics, Teradata Database, Hadoop, and the analytics tools R, Spark, Giraph, SAS, SPSS, KXEN. Sources, users, platform services, and cloud deployment options are unchanged.]
© 2014 Teradata
Teradata Listener: A Brief Introduction
Listener Distributes Many Sources to Many Targets
[Diagram: Sources -> Listener -> Targets (Hadoop1, Hadoop2, Teradata1, Teradata2, Aster1)]
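The many-to-many distribution on this slide can be sketched as a routing table. This is a minimal illustration and not Listener's API: the `Router` class, its methods, and the source names are hypothetical; only the target names come from the slide.

```python
from collections import defaultdict


class Router:
    """Hypothetical many-to-many router: each source fans out to its subscribed targets."""

    def __init__(self):
        self.routes = defaultdict(list)      # source name -> list of target names
        self.delivered = defaultdict(list)   # target name -> records received

    def subscribe(self, source, target):
        """Route every record from `source` to `target` (duplicates ignored)."""
        if target not in self.routes[source]:
            self.routes[source].append(target)

    def publish(self, source, record):
        """Deliver one record to every target subscribed to `source`."""
        targets = self.routes.get(source, [])
        for target in targets:
            self.delivered[target].append(record)
        return len(targets)


router = Router()
router.subscribe("sensor", "Teradata1")
router.subscribe("sensor", "Hadoop1")
router.subscribe("website", "Hadoop1")
router.subscribe("website", "Aster1")

fanout = router.publish("sensor", {"temp": 71})   # goes to Teradata1 and Hadoop1
router.publish("website", {"page": "/home"})      # goes to Hadoop1 and Aster1
```

Here one source reaches two targets, and `Hadoop1` receives records from two sources, which are the two directions of "many to many".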
Predictive Parts Failure Use Case
Real Time Analytics Pipeline Build
• Discover & build analytics against sensor data with Aster
• Create an ingestion stream with Listener, streaming sensor data in from machines in the field
– Stream forks to Teradata and Spark
• Deploy the model onto Spark, where Listener streams the sensor data into the Operational Action Zone
• Alerts fire out of Spark based on the Aster model/rules
– Email alert and database logging as new events
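A minimal sketch of the alerting step, assuming the deployed model reduces to a failure score per sensor reading. The `score` function, the 0.8 threshold, and the field names are hypothetical stand-ins for the Aster model/rules; the email and database sinks are plain lists here.

```python
ALERT_THRESHOLD = 0.8  # hypothetical cut-off, not taken from the slides


def score(reading):
    """Stand-in for the deployed model: maps vibration to a 0..1 failure score."""
    return min(reading["vibration"] / 10.0, 1.0)


def process(readings, threshold=ALERT_THRESHOLD):
    """Score each reading; a high score raises an email alert and a database event."""
    emails, db_events = [], []
    for reading in readings:
        s = score(reading)
        if s >= threshold:
            emails.append(f"ALERT {reading['machine']}: failure score {s:.2f}")
            db_events.append({"machine": reading["machine"], "score": s})
    return emails, db_events


readings = [
    {"machine": "pump-1", "vibration": 2.0},
    {"machine": "pump-2", "vibration": 9.0},
]
emails, db_events = process(readings)
```

Only `pump-2` crosses the threshold, so it produces one email line and one logged event.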
AppCenter
Self-service platform to build, deploy, manage, and run data-centric apps for the enterprise at scale across the UDA
Single discovery platform for ALL users – Business, Analyst & Data Scientist
IDE
SELECT n.event_path, count(*)
FROM nPath(
    ON (
        -- join usage events to customer profiles
        SELECT *
        FROM telco_data td, profile p
        WHERE td.customer_id = p.customer_id
    )
    PARTITION BY customer_id
    ORDER BY timestamp
    MODE(OVERLAPPING)
    -- one or more events followed by an early service cancellation
    PATTERN('EVENT+.CANCEL_SERVICE_EARLY')
    SYMBOLS(
        action <> 'CANCELSERVICE' AS EVENT,
BI & Open Source Visualization Tools
AppCenter & Guided UI SQL Client
Business Analysts R User Data Scientists
Enables Discovery Development and Execution
R Client
Time to Value Acceleration
(actionable insights in hours, days or weeks)
The Guided UI: Unique Path & Pattern Analysis (nPath)
A user-friendly, form-based approach to Path & Pattern Analysis that requires no SQL knowledge
Leverages prebuilt and patented analytics functions
Results can be visualized, published and shared with others
Insights are truly now just a few clicks away
AppCenter: Repeatable Analytics, Shareable Results
AppCenter provides your organization with the ability to store your most valuable analytics workflows as apps, execute them with ease, visualize the results and share the insights with others.
Store your packaged analytics apps in a single location.
Execute your apps on demand or use the built-in scheduler.
Visualize your results using open source visualizations and/or BI tools.
Share your results and visualizations throughout your organization and beyond.
Predictive Parts Failure Discovery
• Discover & build analytics against sensor data with Aster
– AppCenter is currently for Aster only; UDA AppCenter is on the roadmap
• Create an ingestion stream with Listener, streaming sensor data in from machines in the field
– Stream forks to Teradata and Spark
• Deploy the model onto Spark, where Listener streams the sensor data into the Operational Action Zone
• Alerts fire out of Spark based on the Aster model/rules
– Email alert and database logging as new events into Teradata
Operationalization Overview with Aster 6.21
[Diagram: In the Aster framework, Analytics (Model Builder) builds an Aster model from training data, and the AML Generator exports it as an AML file (the Aster model file). In the user framework, the Aster Scorer loads the AML file and answers prediction requests (queries) with prediction responses, scoring in the customer's real-time environment.]
• Rapid iteration on massive data at rest
• Easy deployment
• No model re-create
Predictive Parts Failure Discovery
• Discover & build analytics against sensor data with Aster
• Create an ingestion stream with Listener, streaming sensor data in from machines in the field
– Stream forks to Teradata and Spark
• Deploy the model onto Spark, where Listener streams the sensor data into the Operational Action Zone
• Alerts fire out of Spark based on the Aster model/rules
– Email alert and database logging as new events into Teradata
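Operationalization Overview: Multiple Scoring Operations on a UDA Listener Stream
[Diagram: The Text Tagger and Single Decision Tree models are built in the Aster framework and exported as AML files. In the user framework, text from the Listener stream is sent to the TT Scorer (Text Tagger AML file) as prediction requests 1 and comes back tagged; the tagged text is sent to the SDT Scorer (Single Decision Tree AML file) as prediction requests 2, and the resulting scores drive an action.]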
Packaging and Installation
• Package of Scorer JAR files
– Implemented in Java
– Compatible with JDK 1.6 onwards
– Packaged into a JAR file along with all scoring libraries and documentation
– Lightweight, thread-safe
– Platform independent
• AML Generator in the Analytics Base package
– The Analytics Base package will contain the AML generator function
– The AML generator creates XML files describing data transformations and models built on the Aster cluster
– See the Aster Analytics User Guide for documentation
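To illustrate the idea of scoring from an XML model description, here is a minimal sketch. The slides do not show the AML schema, so the element and attribute names below are invented for illustration and are not the real AML format; the actual scorer is a Java library, while this sketch uses Python only to show the shape of the approach.

```python
import xml.etree.ElementTree as ET

# Hypothetical model file: element and attribute names are invented for
# illustration and are NOT the real AML schema.
MODEL_XML = """
<model type="decision_tree">
  <node feature="temperature" threshold="90">
    <leaf side="left" label="ok"/>
    <node side="right" feature="vibration" threshold="5">
      <leaf side="left" label="ok"/>
      <leaf side="right" label="fail"/>
    </node>
  </node>
</model>
"""


def load_model(xml_text):
    """Parse the XML once; the returned tree is only read during scoring."""
    return ET.fromstring(xml_text).find("node")


def score(node, record):
    """Walk the tree: go left when the feature value is <= threshold, else right."""
    while node.tag == "node":
        value = record[node.get("feature")]
        side = "left" if value <= float(node.get("threshold")) else "right"
        node = next(child for child in node if child.get("side") == side)
    return node.get("label")


model = load_model(MODEL_XML)
label = score(model, {"temperature": 95, "vibration": 7})
```

Because the model lives entirely in the file, the scoring code does not change when the model does, which is the point of the "no model re-create" deployment style.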
Operationalization Overview
Customers separate deployment activities from data science. Roles can be performed independently by experts with appropriate skillsets.
Data Scientist
1. Transform the data on Aster
2. Train a model on Aster
3. Generate an AML file for each
4. Export the AML files and transfer them to the real-time environment
5. Refresh the model with a new AML file generated from the model
DevOps Engineer
1. Deploy the scorer JAR file
2. Deploy the AML files into the real-time environment
3. To refresh the model, get the latest AML file
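The refresh step can be sketched as swapping the active model while scoring continues. This is an assumption about how a deployment might handle it, not a description of the Aster scorer: `ModelHolder`, the version numbers, and the callable models are all hypothetical.

```python
import threading


class ModelHolder:
    """Holds the active model and lets it be replaced while scoring continues."""

    def __init__(self, model, version):
        self._lock = threading.Lock()
        self._model = model
        self._version = version

    def refresh(self, model, version):
        """Swap in a newer model; ignore stale or repeated versions."""
        with self._lock:
            if version <= self._version:
                return False
            self._model, self._version = model, version
            return True

    def score(self, record):
        """Score with whichever model is active at call time."""
        with self._lock:
            model = self._model
        return model(record)


# Models are plain callables here; in practice they would be loaded from model files.
holder = ModelHolder(lambda r: r["temp"] > 90, version=1)
before = holder.score({"temp": 85})
refreshed = holder.refresh(lambda r: r["temp"] > 80, version=2)
after = holder.score({"temp": 85})
stale = holder.refresh(lambda r: True, version=1)
```

The lock is held only long enough to read or replace the reference, so a refresh never blocks behind a slow scoring call.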
Operationalization Overview
[Diagram: the UDA architecture with products overlaid (Listener at INGEST, QueryGrid at VIRTUAL QUERY, AppCenter, Aster Analytics, Teradata Database, Hadoop, and the analytics tools R, Spark, Giraph, SAS, SPSS, KXEN), now with a DEPLOY step added.]
Radically Simplify Big Data Streaming
LISTENER
Listener Makes It Simple
Teradata Listener is an intelligent, self-service software solution for ingesting and distributing fast moving data streams
Teradata Listener Features
Enterprise-wide solution for ingesting high-volume, real-time streams of data
Automatic calculation of volume, latency & monitoring metrics
Pre-built integration with Teradata UDA for persisting data in real time or in batches
Fully supported & enterprise grade
Self-service & governance for users
Build real-time streaming analytics, power real-time dashboards, and generate alerts using Teradata Listener APIs
Microservice cloud architecture
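As an illustration of what volume and latency metrics might look like for one monitoring window, here is a minimal sketch. The record layout, the function name, and the choice of metrics are assumptions; the slides do not say how Listener computes them.

```python
from statistics import mean


def stream_metrics(records, window_seconds):
    """Compute volume and latency for one monitoring window.

    Each record is (created_at, ingested_at, payload_bytes); times are in seconds.
    """
    if not records:
        return {"records": 0, "bytes_per_sec": 0.0, "avg_latency": 0.0, "max_latency": 0.0}
    latencies = [ingested - created for created, ingested, _ in records]
    total_bytes = sum(size for _, _, size in records)
    return {
        "records": len(records),
        "bytes_per_sec": total_bytes / window_seconds,
        "avg_latency": mean(latencies),
        "max_latency": max(latencies),
    }


window = [(0.0, 0.5, 100), (1.0, 1.25, 300), (2.0, 2.75, 200)]
metrics = stream_metrics(window, window_seconds=10)
```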
Listener Data Flow & Engine Integration
• A source can be persisted to one or more systems, or multiple sources can be persisted to a single system
• Data can be persisted in near real time or in batches (by record count or by time)
• Data can also be streamed out of Listener to external third-party processing engines (e.g. Storm, Spark, TIBCO, IBM, VoltDB)
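Batching "by record count or by time" can be sketched as a buffer that flushes on whichever limit is reached first. The `Batcher` class and its parameters are hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class Batcher:
    """Flushes a batch when it reaches max_records or max_age seconds, whichever is first."""

    def __init__(self, max_records, max_age):
        self.max_records = max_records
        self.max_age = max_age
        self.buffer = []
        self.opened_at = None  # arrival time of the first record in the open batch

    def add(self, record, now):
        """Add a record at time `now`; return the flushed batch, or None."""
        if not self.buffer:
            self.opened_at = now
        self.buffer.append(record)
        full = len(self.buffer) >= self.max_records
        expired = now - self.opened_at >= self.max_age
        if full or expired:
            batch, self.buffer = self.buffer, []
            return batch
        return None


batcher = Batcher(max_records=3, max_age=60)
arrivals = [("a", 0), ("b", 1), ("c", 2), ("d", 10), ("e", 75)]
flushed = [batcher.add(record, now=t) for record, t in arrivals]
```

The first batch flushes on the record limit and the second on the age limit. In this sketch the age check runs only when a record arrives; a real writer would also need a timer. Setting `max_records=1` flushes every record immediately, which corresponds to the near-real-time case.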
[Diagram: Sources (IoT, iOS App, Website, Sensor) -> Teradata Listener -> Systems (HBase, Aster, HDFS, Teradata)]
Listener Data Flow
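[Diagram: Sources (multiple sources) -> Ingest -> Firehose -> Router -> Streams -> Writers -> Systems. Ingest writes to the firehose; the router reads mini-batches and writes to streams; writers read mini-batches and write tuples to systems. The Stream API provides access to the streams.]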
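The stage names on this slide (ingest, firehose, router, streams, writers) suggest a queue-based pipeline. The sketch below is one plausible reading of that flow using in-memory queues; the function names, the per-source stream layout, and the batch sizes are assumptions, not Listener internals.

```python
from collections import defaultdict, deque

firehose = deque()            # single shared queue written by ingest
streams = defaultdict(deque)  # one queue per source, written by the router
systems = defaultdict(list)   # target system name -> persisted rows


def ingest(source, payload):
    """Ingest: append one message to the firehose."""
    firehose.append({"source": source, "payload": payload})


def route(batch_size):
    """Router: read one mini-batch from the firehose and split it into per-source streams."""
    for _ in range(min(batch_size, len(firehose))):
        msg = firehose.popleft()
        streams[msg["source"]].append(msg["payload"])


def write(source, system, batch_size):
    """Writer: read one mini-batch from a stream and persist it to a target system."""
    stream = streams[source]
    batch = [stream.popleft() for _ in range(min(batch_size, len(stream)))]
    systems[system].extend(batch)
    return len(batch)


for i in range(3):
    ingest("sensor", i)
ingest("website", "/home")
route(batch_size=10)
written = write("sensor", "Teradata", batch_size=2)
```

After this run, two sensor records are persisted and one remains in the sensor stream for the next mini-batch.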
Listener Capabilities
Real-time Ingest
• Scale and volume: scale out multiple inputs and outputs
• Volume metrics: monitor data movement with easy dashboards
• Continuous streams: capture and exploit new data sources
Self-Service and Governance
• Easy setup and go: add new sources and targets in minutes
• Pause / resume: pause data streams for maintenance
• Security built in: owners and admins
Enterprise Class
• Enterprise platform: for the entire enterprise
• Highly available: failover and data protection built in
• API everywhere: access microservices via RESTful APIs
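The pause / resume capability can be pictured as holding records while a target is under maintenance and draining them afterwards. That buffering behavior is an assumption made for this sketch; the slides only say streams can be paused. `PausableStream` is a hypothetical name.

```python
class PausableStream:
    """Delivers records to a target; while paused, records are held and not lost."""

    def __init__(self):
        self.paused = False
        self.held = []       # records buffered during a pause
        self.delivered = []  # stands in for the target system

    def send(self, record):
        """Deliver immediately, or hold the record if the stream is paused."""
        if self.paused:
            self.held.append(record)
        else:
            self.delivered.append(record)

    def pause(self):
        self.paused = True

    def resume(self):
        """Resume delivery, draining held records first so order is preserved."""
        self.paused = False
        self.delivered.extend(self.held)
        self.held = []


stream = PausableStream()
stream.send(1)
stream.pause()        # e.g. target system down for maintenance
stream.send(2)
stream.send(3)
held_during_pause = list(stream.held)
stream.resume()
stream.send(4)
```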
Predictive Parts Failure Discovery
• Discover & build analytics against sensor data with Aster
• Create an ingestion stream with Listener, streaming sensor data in from machines in the field
– Stream forks to Teradata and Spark
– Spark is currently a roadmap target
• Deploy the model onto Spark, where Listener streams the sensor data into the Operational Action Zone
• Alerts fire out of Spark based on the Aster model/rules
– Email alert and database logging as new events into Teradata
Real-Time Analytic Pattern Options
• There are multiple options for real-time analytics; we discussed only one today. Others include:
• Listener Stream -> Aster model operationalized on Spark or Storm
• Listener Stream -> SAS model operationalized on Teradata
• Listener Stream -> FuzzyLogix based model on Teradata
• Web Service Integration -> ARTIM next best action interaction
– ARTIM has a nice business UI and leverages Machine Learning algorithms to optimize the offers served
Q&A
• Discussion
What is Real Time?
• Real time (synchronous): 1-100 ms
• Near real time (asynchronous): 100 ms-1 minute
• Micro batch: 1-5 minutes
• Batch: minutes to hours
[Diagram: MQ, JMS, and apps (data-in-motion) -> Listener -> Hadoop1, Teradata1, Aster1 (data-at-rest)]
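The latency tiers on this slide can be written as a small classifier. The slide's ranges overlap (micro batch is 1-5 minutes, batch is "minutes to hours"), so the boundaries chosen below, in particular the 5-minute cut between micro batch and batch, are assumptions.

```python
def classify_latency(seconds):
    """Map an end-to-end latency in seconds to the tier names used on the slide."""
    if seconds < 0:
        raise ValueError("latency cannot be negative")
    if seconds <= 0.1:       # up to 100 ms
        return "real time"
    if seconds <= 60:        # up to 1 minute
        return "near real time"
    if seconds <= 300:       # up to 5 minutes (assumed boundary)
        return "micro batch"
    return "batch"


tiers = [classify_latency(s) for s in (0.05, 2, 120, 3600)]
```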