Post on 20-May-2020
[Figure: services grouped by stage]
Devices
Device Connectivity: Event Hubs, Service Bus, External Data Sources
Storage: SQL Database, Table/Blob Storage, DocumentDB, External Data Sources
Analytics: Machine Learning, Stream Analytics, HDInsight, Data Factory
Presentation & Action: App Service, Power BI, Notification Hubs, Mobile Services, BizTalk Services
• What happened?
• What is happening?
• Why did it happen?
• What will happen?
Past
Present
Future
“Understand the pulse of the Organization”
Everything around us produces data
Traditional Business Intelligence first collects data and analyzes it afterwards
But we live in a fast paced world
Offline data is not useful
We work with streaming data
We want to monitor and analyze data in near real time
So we don’t have time to stop, copy the data, and analyze it; we have to work with streams of data
Batch Analytics
Data parked in a relational database
Big data notions: velocity, variety, volume
Data in motion and live events
Cost effective way
Sort of queries on data warehouse
Business competitive edge
Intake millions of events per second
At variable loads
Transform, augment, correlate, temporal operations
Elasticity of the cloud for scale out
No hardware (PaaS offering)
Rapid development
Time
Development and operations resources
Infrastructure: procure and set up
Develop solution (code) for ingress, processing and egress
Develop solutions to integrate with other components like ML, BI, etc.
Develop solutions to manage resiliency, such as infrastructure failures
Develop solutions and infrastructure for increasing scale with business growth
Monitoring and troubleshooting of solution
From event or data streams to real-time insights in less time and with fewer people
End-to-End Architecture Overview
Stages: Data Source, Collect, Process, Deliver, Consume
Event Inputs
- Event Hub
- Azure Blob
Reference Data
- Azure Blob
Transform
- Temporal joins
- Filter
- Aggregates
- Projections
- Windows
- Etc.
Enrich
Correlate
Outputs
- SQL Azure
- Azure Blobs
- Event Hub
Azure Storage
Azure Stream Analytics
• Temporal Semantics
• Guaranteed delivery
• Guaranteed up time
Rapid Development
Lower the bar to create stream processing solutions via a SQL-like language
Easily filter, project, aggregate, and join streams, combine static data with streaming data, and detect patterns or lack of patterns with a few lines of SQL
Built-in temporal semantics
Manage out-of-order events and actions on late-arriving events via configuration
Development and debugging experience through the Azure Portal
Pain Points with other Streaming Solutions
Not an end to end solution
Hard to develop
Need expertise and special skills
Costs a lot of money in development
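// Imports assumed from the DataTorrent API and operator library; verify against your version.
import org.apache.hadoop.conf.Configuration;
import com.datatorrent.api.DAG;
import com.datatorrent.api.DAG.Locality;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.api.annotation.ApplicationAnnotation;
import com.datatorrent.lib.algo.UniqueCounter;
import com.datatorrent.lib.io.ConsoleOutputOperator;

// WordCountInputOperator is assumed to be defined in the same package as this class.
@ApplicationAnnotation(name="WordCountDemo")
public class Application implements StreamingApplication
{
  protected String fileName = "com/datatorrent/demos/wordcount/samplefile.txt";
  private Locality locality = null;

  @Override
  public void populateDAG(DAG dag, Configuration conf)
  {
    locality = Locality.CONTAINER_LOCAL;
    // Source operator: reads words from the sample file.
    WordCountInputOperator input = dag.addOperator("wordinput", new WordCountInputOperator());
    input.setFileName(fileName);
    // Counts occurrences of each distinct word.
    UniqueCounter<String> wordCount = dag.addOperator("count", new UniqueCounter<String>());
    dag.addStream("wordinput-count", input.outputPort, wordCount.data).setLocality(locality);
    // Sink operator: prints the counts to the console.
    ConsoleOutputOperator consoleOperator = dag.addOperator("console", new ConsoleOutputOperator());
    dag.addStream("count-console", wordCount.count, consoleOperator.input);
  }
}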
No code compilation, easy to author and deploy
Brings together event streams, reference data and machine learning extensions
All operators respect, and some use, the temporal properties of events
These should (mostly) look familiar if you know relational databases
Filters, projections, joins, windowed (temporal) aggregates, text and date manipulation
Our toll station has multiple toll booths, where a sensor placed on top of the booth scans an RFID card affixed to the windshield of the vehicles as they pass the toll booth.
The passage of vehicles through these toll stations can be modelled as event streams over which interesting operations can be performed.
Toll Id | EntryTime               | LicensePlate | State | Make   | Model | Vehicle Type | Vehicle Weight | Toll | Tag
1       | 2014-09-10 12:01:00.000 | JNB 7001     | NY    | Honda  | CRV   | 1            | 1535           | 7    |
2       | 2014-09-10 12:02:00.000 | YXZ 1001     | NY    | Toyota | Camry | 1            | 1399           | 4    | 123456789
…
Toll Id | ExitTime                     | LicensePlate
1       | 2014-09-10T12:03:00.0000000Z | JNB 7001
2       | 2014-09-10T12:03:00.0000000Z | YXZ 1001
…
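To make the idea of correlating the two streams concrete, here is a minimal Python sketch that pairs each entry event with its exit event and computes the minutes between them. It only simulates the logic locally on the sample rows; it is not Stream Analytics code. It uses a plain key-based match on Toll Id and LicensePlate rather than a time-bounded temporal join, and the names `parse_time` and `join_entry_exit` are assumptions made for illustration.

```python
from datetime import datetime

# Sample rows copied from the entry and exit tables above (other columns omitted).
entry_events = [
    {"TollId": 1, "EntryTime": "2014-09-10 12:01:00.000", "LicensePlate": "JNB 7001"},
    {"TollId": 2, "EntryTime": "2014-09-10 12:02:00.000", "LicensePlate": "YXZ 1001"},
]
exit_events = [
    {"TollId": 1, "ExitTime": "2014-09-10T12:03:00.0000000Z", "LicensePlate": "JNB 7001"},
    {"TollId": 2, "ExitTime": "2014-09-10T12:03:00.0000000Z", "LicensePlate": "YXZ 1001"},
]


def parse_time(text):
    """Parse both timestamp layouts used in the sample tables, to whole seconds."""
    # Keep "YYYY-MM-DD hh:mm:ss"; drop fractional seconds and the trailing Z.
    return datetime.strptime(text.replace("T", " ")[:19], "%Y-%m-%d %H:%M:%S")


def join_entry_exit(entries, exits):
    """Pair each entry with the exit that has the same TollId and LicensePlate."""
    exit_by_key = {(e["TollId"], e["LicensePlate"]): e for e in exits}
    joined = []
    for entry in entries:
        match = exit_by_key.get((entry["TollId"], entry["LicensePlate"]))
        if match is None:
            continue  # no exit event seen for this vehicle
        elapsed = parse_time(match["ExitTime"]) - parse_time(entry["EntryTime"])
        joined.append((entry["TollId"], entry["LicensePlate"], elapsed.total_seconds() / 60))
    return joined


durations = join_entry_exit(entry_events, exit_events)
for toll_id, plate, minutes in durations:
    print(toll_id, plate, minutes)
```

On the two sample vehicles this gives 2 minutes for JNB 7001 and 1 minute for YXZ 1001.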
Projections
Show me the Toll Id and Vehicle Weight in tons for all vehicles passing through the toll booth
SELECT TollId, VehicleWeight / 1000 AS Tons FROM EntryStream
Input events:
1, 1450, “VW”, “Golf”, (…)
2, 1230, “Toyota”, “Camry”, (…)
1, 2400, “VW”, “Passat”, (…)
1, 980, “Ford”, “Fiesta”, (…)
Output:
1, 1.45
2, 1.23
1, 2.40
1, 0.980
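For readers who want to check the projection by hand, the following Python sketch applies the same per-event transformation to the four input events from the slide. It is a local simulation, not Stream Analytics code. The tuple layout and the name `project_tons` are assumptions, and it uses floating-point division so that the results match the outputs shown on the slide.

```python
# Input events from the slide: (TollId, VehicleWeight, Make, Model); other fields elided.
entry_stream = [
    (1, 1450, "VW", "Golf"),
    (2, 1230, "Toyota", "Camry"),
    (1, 2400, "VW", "Passat"),
    (1, 980, "Ford", "Fiesta"),
]


def project_tons(events):
    """Yield (TollId, Tons) for each event, one output per input."""
    for toll_id, weight, _make, _model in events:
        yield toll_id, weight / 1000  # floating-point division


tons = list(project_tons(entry_stream))
print(tons)
```

A projection never drops or merges events: four events go in and four come out, each reduced to the selected columns.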
Filters
Show me the Model of vehicles manufactured by Volkswagen
SELECT Model FROM EntryStream WHERE Make = 'VW'
Input events:
1, 1450, “VW”, “Golf”, (…)
2, 1230, “Toyota”, “Camry”, (…)
1, 2400, “VW”, “Passat”, (…)
1, 980, “Ford”, “Fiesta”, (…)
Output:
“Golf”
“Passat”
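The filter can be simulated the same way. This Python sketch keeps only the events whose Make matches and returns their Model, using the four input events from the slide. It is a local illustration, not Stream Analytics code, and the tuple layout and the name `filter_models` are assumptions.

```python
# Input events from the slide: (TollId, VehicleWeight, Make, Model); other fields elided.
entry_stream = [
    (1, 1450, "VW", "Golf"),
    (2, 1230, "Toyota", "Camry"),
    (1, 2400, "VW", "Passat"),
    (1, 980, "Ford", "Fiesta"),
]


def filter_models(events, make):
    """Return the Model of every event whose Make equals the given value."""
    return [model for _toll_id, _weight, event_make, model in events if event_make == make]


vw_models = filter_models(entry_stream, "VW")
print(vw_models)
```

Unlike a projection, a filter can reduce the number of events: here two of the four pass through.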
Tumbling Windows
How many vehicles entered each toll booth every 5 minutes?
SELECT TollId, COUNT(*) FROM EntryStream GROUP BY TollId, TumblingWindow(minute,5)
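A tumbling window cuts the timeline into fixed-size, non-overlapping segments, so every event falls into exactly one window. The Python sketch below counts events per toll booth per 5-minute window. It is a local simulation, not Stream Analytics code. The first two events come from the sample entry table; the remaining four are hypothetical, added only so that more than one window appears. The sketch assumes that an event exactly on a boundary belongs to the window that ends at that instant, and the names `window_end` and `tumbling_counts` are illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # windows are aligned to this origin (an assumption)


def window_end(ts, size):
    """Return the end of the tumbling window (start, end] that contains ts."""
    elapsed = int((ts - EPOCH).total_seconds())
    size_s = int(size.total_seconds())
    n = -(-elapsed // size_s)  # integer ceiling division
    return EPOCH + n * size


def tumbling_counts(events, size):
    """Count events per (window end, TollId), like COUNT(*) grouped by a tumbling window."""
    counts = Counter()
    for toll_id, ts in events:
        counts[(window_end(ts, size), toll_id)] += 1
    return sorted((end, toll_id, n) for (end, toll_id), n in counts.items())


events = [
    (1, datetime(2014, 9, 10, 12, 1, 0)),   # from the sample entry table
    (2, datetime(2014, 9, 10, 12, 2, 0)),   # from the sample entry table
    (1, datetime(2014, 9, 10, 12, 4, 30)),  # hypothetical
    (1, datetime(2014, 9, 10, 12, 6, 0)),   # hypothetical
    (2, datetime(2014, 9, 10, 12, 9, 59)),  # hypothetical
    (1, datetime(2014, 9, 10, 12, 10, 0)),  # hypothetical, exactly on a boundary
]

result = tumbling_counts(events, timedelta(minutes=5))
for end, toll_id, n in result:
    print(end, toll_id, n)
```

Each output row is labelled with the end time of its window, since a windowed aggregate can only be produced once the window has closed.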
Aggregate functions
Scalar functions
Date and time:
String:
Types
Type          | Description
bigint        | Integers in the range -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807).
float         | Floating point numbers in the range -1.79E+308 to -2.23E-308, 0, and 2.23E-308 to 1.79E+308.
nvarchar(max) | Text values composed of Unicode characters. Note: a value other than max is not supported.
datetime      | A date combined with a time of day with fractional seconds, based on a 24-hour clock and relative to UTC (time zone offset 0).