Posted: 05-Dec-2014 | Category: Data & Analytics | Uploaded by: informaticamarketplace
A Practical Guide to Improving the Big Data Ingestion Process
Presented by Alan Lundberg and Amrish Thakkar, July 22, 2014
Safe Harbor
The information being provided today is for informational purposes only. The development, release and timing of any Informatica product or functionality described today remain at the sole discretion of Informatica and should not be relied upon in making a purchasing decision. Statements made today are based on currently available information, which is subject to change. Such statements should not be relied upon as a representation, warranty or commitment to deliver specific products or functionality in the future.
Informatica Marketplace: A Data Integration Ecosystem
• Developers (Administrators, Architects, Data Analysts): Contribute, Collaborate
• Partners (Software and Services Vendors): Strengthen Partnership, Generate Awareness
• Consumers: Discover Solutions, Evaluate Products, Request Ideas
• Informatica: Enable Customers, Engage & Interact, Identify Whitespace
Informatica Marketplace: 1,300+ Apps, Add-ons and Services to jump-start your productivity
• Data Integration: Mappings, Utilities, Connectors, Code Testing and Deployment, Monitoring, Job Scheduling
• Data Quality: Rules & Reference Data, HealthCheck, Accelerators, Services
• Cloud: Connectors, Templates, Data Loaders, Plugins, Process Automation, Services
The Internet of Things is opening a world of opportunities
• Cardiac Monitors
• Truck Tracking, Load
• Water Meter, Electricity Meter
• Fridge Supply Levels, Washing Machine Check
• Gas Level, Car Indicators
• Bus Delays, Engine Checks
• Shop Inventory Levels
• Dock Loads, Container Checks
• Electronic Flight Bags, Luggage Tracking
• Crop Health Indicators
• Info Traffic, Video Surveillance
Data / Sensor Diversity…
• GPS Localization
• NFC
• Chemical Sensors
• 3D Cameras
• Microbolometers
• Barometers
• Accelerometers
• Gyroscopes
• Glucometers
• Magnetometers
Architectural Implications
Yesterday | Today
Batch processing | Real time
Structured, homogeneous data | High volume and variety
Centralized, database-centric client/server systems | Distributed systems
Events treated as second-class citizens | Events prioritized and modeled as enterprise objects / assets
How to make sense of it all…
[Diagram: transactions (OLTP, OLAP), social media and web logs, machine/device and scientific data, and documents and emails are collected by Vibe Data Stream and fed into an event processing engine.]
Use Cases – Solving the Difficult Problems
Detect Patterns
• 3 events within 5 milliseconds
• A then B then C occurs
• Geospatial processing
Exception Monitoring
• Deviations from norm (Monitoring, Fraud, Error)
• Trending up/down to exceed a threshold
• SLA monitoring
Process Monitoring
• Are process workflows operating properly?
• Are manual processes completed on time?
• Detect Missing Work and Queued Work
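The two "Detect Patterns" examples ("3 events within 5 milliseconds" and "A then B then C occurs") can be illustrated with a small Python sketch. It is a minimal illustration, not how RulePoint or any other Informatica engine implements them: it assumes events arrive in timestamp order as hypothetical `(timestamp_ms, kind)` tuples, and `bursts` and `sequence_matches` are invented names. `bursts` keeps a sliding time window in a deque; `sequence_matches` greedily tracks how much of the pattern has been seen so far, allowing unrelated events in between.

```python
from collections import deque
from typing import Deque, Iterable, List, Sequence, Tuple

# Hypothetical event shape: (timestamp in milliseconds, event kind).
Event = Tuple[float, str]


def bursts(events: Iterable[Event], count: int = 3, window_ms: float = 5.0) -> List[float]:
    """Return timestamps at which at least `count` events fall within `window_ms`.

    Assumes events arrive in non-decreasing timestamp order.
    """
    recent: Deque[float] = deque()
    hits: List[float] = []
    for ts, _kind in events:
        recent.append(ts)
        # Evict events older than the sliding window that ends at `ts`.
        while ts - recent[0] > window_ms:
            recent.popleft()
        if len(recent) >= count:
            hits.append(ts)
    return hits


def sequence_matches(events: Iterable[Event], pattern: Sequence[str]) -> List[float]:
    """Return timestamps at which `pattern` (e.g. A, then B, then C) completes.

    Unrelated events may occur in between; matching restarts after each hit.
    """
    hits: List[float] = []
    pos = 0  # how many pattern steps have been matched so far
    for ts, kind in events:
        if kind == pattern[pos]:
            pos += 1
            if pos == len(pattern):
                hits.append(ts)
                pos = 0
    return hits
```

For example, with events at 0, 1, 2.5 and 4 ms of kinds A, B, A and C, `bursts` fires at 2.5 ms and 4 ms, and `sequence_matches(stream, ["A", "B", "C"])` fires at 4 ms. A production CEP engine would also have to deal with out-of-order arrival and overlapping matches, which this sketch ignores.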
Architectural Approach for Streaming Analytics
[Diagram: operational data (field devices, applications, clickstream, IoT, logs, etc.) and various source applications / technologies flow through the layers below to event-based applications, a data warehouse, Hadoop / NoSQL and analytics:]
• Streaming collection: Vibe Data Stream
• Real-time stream transport / delivery: Ultra Messaging
• Stream transformation: B2B Data Transformation
• Streaming analytics: RulePoint CEP, with location context (e.g. GIS)
• Data integration: PowerCenter, with CDC / data access through PowerExchange (PWX)
Streaming Collection: Vibe Data Stream (VDS)
• Distribute collection across anywhere from one to thousands of endpoints
• High-performance, efficient streaming data collection over LAN/WAN
• Available ecosystem of lightweight agents (sources and targets)
• Continuous ingestion of data generated in real time (sensors, logs, etc.) to multiple targets (batch/stream processing)
• Perform filtering, transformation, etc. “close to the source”
• Provide varying qualities of service
  • Streaming, guaranteed, etc.
• Allow for dynamic configuration
• Highly available and scalable
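The "varying qualities of service" bullet contrasts best-effort streaming with guaranteed delivery. The Python sketch below simulates that difference over a lossy link. It assumes a hypothetical in-memory `LossyLink` that can drop either a message or its acknowledgment. In guaranteed mode the sender numbers each message, keeps it until it is acknowledged and retransmits it, while the receiver discards duplicates by sequence number. This shows the general technique only. It is not the VDS or Ultra Messaging wire protocol.

```python
from typing import Dict, Iterable, List


class LossyLink:
    """Hypothetical in-memory link. Transmission number i is lost if i is in
    `drop_data`; its acknowledgment is lost if i is in `drop_acks`."""

    def __init__(self, drop_data: Iterable[int] = (), drop_acks: Iterable[int] = ()):
        self.drop_data = set(drop_data)
        self.drop_acks = set(drop_acks)
        self.count = 0

    def transmit(self, seq: int, payload: str, receiver: "Receiver") -> bool:
        """Return True only if the sender gets an acknowledgment back."""
        i = self.count
        self.count += 1
        if i in self.drop_data:
            return False  # message lost: the receiver never sees it
        receiver.deliver(seq, payload)
        return i not in self.drop_acks  # False when the ack itself is lost


class Receiver:
    def __init__(self) -> None:
        self.by_seq: Dict[int, str] = {}
        self.duplicates = 0

    def deliver(self, seq: int, payload: str) -> None:
        if seq in self.by_seq:
            self.duplicates += 1  # retransmission of a message already received
        else:
            self.by_seq[seq] = payload

    def in_order(self) -> List[str]:
        return [self.by_seq[s] for s in sorted(self.by_seq)]


def send_all(messages: List[str], link: LossyLink, receiver: Receiver,
             guaranteed: bool, max_rounds: int = 10) -> Dict[int, str]:
    """Send `messages` and return those still unacknowledged at the end.

    Streaming (best effort) makes a single pass and forgets losses;
    guaranteed mode keeps unacknowledged messages and retransmits them.
    """
    pending: Dict[int, str] = dict(enumerate(messages))
    rounds = max_rounds if guaranteed else 1
    for _ in range(rounds):
        for seq in sorted(pending):  # sorted() copies keys, so deleting is safe
            if link.transmit(seq, pending[seq], receiver):
                del pending[seq]
        if not pending:
            break
    return pending
```

With the second transmission dropped, streaming mode delivers three of four messages and moves on. Guaranteed mode retransmits until all four arrive, at the cost of occasional duplicates (for example when an acknowledgment is lost) that the receiver must filter out.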
Low-latency messaging is the foundation
• Informatica’s Vibe Data Stream is built on the Ultra Messaging platform
• Stream transport is the core of any streaming analytics solution
• Required for key streaming analytics capabilities, including:
  • Stream collection
  • Stream distribution
  • Load distribution and sharing
  • Remote connectivity and routing
• Ultra Messaging has been proven in hundreds of low-latency, guaranteed-delivery, fault-tolerant deployments
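The difference between "stream distribution" and "load distribution and sharing" can be sketched in a few lines of Python. With stream distribution every subscriber receives every message. With load sharing each message goes to exactly one member of a group. Round-robin is only one possible sharing policy and is assumed here for simplicity. The subscriber and worker names are hypothetical, and these functions are not part of Ultra Messaging's API.

```python
from itertools import cycle
from typing import Dict, List


def distribute(messages: List[str], subscribers: List[str]) -> Dict[str, List[str]]:
    """Stream distribution (fan-out): every subscriber gets every message."""
    return {name: list(messages) for name in subscribers}


def load_share(messages: List[str], workers: List[str]) -> Dict[str, List[str]]:
    """Load sharing: each message goes to exactly one worker, round-robin."""
    if not workers:
        raise ValueError("load sharing needs at least one worker")
    assigned: Dict[str, List[str]] = {name: [] for name in workers}
    for msg, name in zip(messages, cycle(workers)):
        assigned[name].append(msg)
    return assigned
```

For example, `distribute(["m1", "m2"], ["hdfs", "cep"])` gives both hypothetical targets both messages. `load_share(["m1", "m2", "m3"], ["w1", "w2"])` sends m1 and m3 to w1 and m2 to w2, so adding workers raises aggregate capacity instead of duplicating work.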
00:00:46: %LINK-3-UPDOWN: Interface Port-channel1, changed state to up
00:00:47: %LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to up
00:00:47: %LINK-3-UPDOWN: Interface GigabitEthernet0/2, changed state to up
00:00:48: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan1, changed state to down
00:00:48: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/1, changed state to down
*Mar 1 18:46:11: %SYS-5-CONFIG_I: Configured from console by vty2 (10.34.195.36)
18:47:02: %SYS-5-CONFIG_I: Configured from console by vty2 (10.34.195.36)
*Mar 1 18:48:50.483 UTC: %SYS-5-CONFIG_I: Configured from console by vty2 (10.34.195.36)
[Diagram: scaling collection to 10s, 100s, 1,000s? of sources. Market data, web log data, device data, sensor data, location data, call records and social data, plus log servers (logserver1 through logserver8), are collected by many VDS Nodes, shown alongside a VDS Server and ZooKeeper.]
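The sample Cisco IOS log lines on this slide share one structure: a timestamp in one of several formats (uptime, `*Mar 1 ...`, with or without milliseconds and `UTC`), then a `%FACILITY-SEVERITY-MNEMONIC` tag and the message text. The Python below is a hedged sketch of "filtering close to the source". It parses that structure and keeps only severe events before forwarding them. The regular expression, the function names and the default cut-off (severity 3, "errors") are assumptions for illustration. This is not VDS's own syslog source.

```python
import re
from typing import Dict, Iterable, List, Optional

# Timestamp (any format, matched lazily), then %FACILITY-SEVERITY-MNEMONIC: text.
CISCO_RE = re.compile(
    r"^(?P<timestamp>.*?): %(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-"
    r"(?P<mnemonic>[A-Z0-9_]+): (?P<message>.*)$"
)

# Cisco IOS logging severity levels 0 through 7.
SEVERITY_NAMES = ["emergencies", "alerts", "critical", "errors",
                  "warnings", "notifications", "informational", "debugging"]


def parse(line: str) -> Optional[Dict[str, str]]:
    """Split one Cisco IOS log line into fields, or return None if it doesn't match."""
    m = CISCO_RE.match(line.strip())
    if m is None:
        return None
    record = m.groupdict()
    record["severity_name"] = SEVERITY_NAMES[int(record["severity"])]
    return record


def filter_at_source(lines: Iterable[str], max_severity: int = 3) -> List[Dict[str, str]]:
    """Keep parsed records whose severity number is <= max_severity (lower = more severe)."""
    kept = []
    for line in lines:
        rec = parse(line)
        if rec is not None and int(rec["severity"]) <= max_severity:
            kept.append(rec)
    return kept
```

Applied to the sample lines, only the `%LINK-3-UPDOWN` messages pass the default filter. The `%LINEPROTO-5-UPDOWN` and `%SYS-5-CONFIG_I` notifications are dropped at the agent instead of crossing the network.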
Ecosystem of Sources and Targets
Sources, transformations and targets include:
• PowerCenter
• B2B Data Transformation
• RulePoint
• … and evolving
Vibe Data Stream vs Flume
Feature | VDS | Flume
Architecture | Broker-less | Non-messaging
Configuration | Automatic | Manual
Failover | Automatic | Automatic
Functionality | Event aggregation / messaging | Log aggregation
Recommended QoS | Guaranteed | Guaranteed
Primary application | Trades / CDRs / logs / etc. | Logs
Monitoring | Yes | No
Enterprise product integration | Informatica product line | No
Vibe Data Stream performance vs Flume-ng
• Throughput: Vibe Data Stream 200 MB/sec vs Flume 20.67 MB/sec, roughly 10x
Test setup:
• Event size: 300 bytes
• Source type: Syslog
• Number of sources: 16
• Target type: HDFS
• Hadoop cluster: 9 nodes
• VDS/Flume nodes: 1
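The charted throughput can be converted into event rates, which are often easier to reason about than MB/sec. The sketch below takes the charted values at face value. It assumes "MB" means 10^6 bytes and that every event is exactly the 300 bytes of the test setup. `events_per_sec` is a name introduced only for this example.

```python
EVENT_SIZE_BYTES = 300  # test setup: 300-byte syslog events
NUM_SOURCES = 16        # test setup: 16 sources
BYTES_PER_MB = 10**6    # assumption: "MB/sec" means 10^6 bytes per second


def events_per_sec(mb_per_sec: float) -> float:
    """Convert throughput in MB/sec to events/sec for fixed-size events."""
    return mb_per_sec * BYTES_PER_MB / EVENT_SIZE_BYTES


results = {"Vibe Data Stream": 200.0, "Flume": 20.67}  # MB/sec as charted

for name, mb_s in results.items():
    total = events_per_sec(mb_s)
    print(f"{name}: {total:,.0f} events/sec, {total / NUM_SOURCES:,.0f} per source")

speedup = results["Vibe Data Stream"] / results["Flume"]
print(f"Throughput ratio: {speedup:.2f}x")
```

Under these assumptions VDS moves about 666,667 events/sec (roughly 41,667 per source) against about 68,900 events/sec for Flume. The ratio 200 / 20.67 is about 9.68, so the "10x" headline is best read as an approximation.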
• Demo
Download Vibe Data Stream Free Today!
• Vibe Data Stream Open Access Download: http://www.marketplace.informatica.com/vds