Date post: | 03-Jun-2018 |
Upload: | razvan-julian-petrescu |
8/12/2019 Big Data Analytics Presentation
Big Data Analytics
Otto Medin & Louise Parberry, Sales Engineers
Big Data Analytics Overview
  How big is big?
    Response time requirements
    Scalability requirements
    Budget
Big Data Analytics
  Big Data By The Numbers
    Data load limit: ~400 MB/sec (commodity server)
    3 terabyte data load: $180 hard drive, ~7860 sec (~2.2 hrs)
    1 exabyte: $63 million, 87 years
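The figures on the "Big Data By The Numbers" slide can be reproduced with a short back-of-envelope calculation. The sketch below is written in Python purely for illustration. It assumes binary units (1 TB = 1024² MB, 1 EB = 1024² TB) and a single server loading every drive one after another; neither assumption is stated on the slide, but together they reproduce its numbers.

```python
# Back-of-envelope check of the slide's figures.
# Assumption (not stated on the slide): binary units and one server
# loading all drives sequentially.
MB_PER_TB = 1024 ** 2
TB_PER_EB = 1024 ** 2

LOAD_RATE_MB_PER_SEC = 400   # from the slide (commodity server)
DRIVE_SIZE_TB = 3            # from the slide
DRIVE_COST_USD = 180         # from the slide
SECONDS_PER_YEAR = 365 * 24 * 3600


def load_seconds(terabytes: float) -> float:
    """Seconds needed to load `terabytes` at the stated rate."""
    return terabytes * MB_PER_TB / LOAD_RATE_MB_PER_SEC


drive_secs = load_seconds(DRIVE_SIZE_TB)
exabyte_drives = TB_PER_EB / DRIVE_SIZE_TB
exabyte_cost_usd = exabyte_drives * DRIVE_COST_USD
exabyte_years = load_seconds(TB_PER_EB) / SECONDS_PER_YEAR

print(f"3 TB load: {drive_secs:.0f} sec ({drive_secs / 3600:.1f} hrs)")
print(f"1 EB of drives: ${exabyte_cost_usd / 1e6:.0f} million")
print(f"1 EB load on one server: {exabyte_years:.0f} years")

# Splitting the load evenly over N servers divides the elapsed time by N.
servers = 1000  # hypothetical cluster size
print(f"1 EB load on {servers} servers: "
      f"{exabyte_years * 365 / servers:.0f} days")
```

The storage cost is modest compared with the elapsed time, which is why the next slide puts the emphasis on parallelization.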
Big Data Analytics
  The key to Big Data Analytics:
    PARALLELIZE! (if you want a quick result, that is)
Agenda
  Big Data Analytics Overview
  Parallelization
  Academy Model
  Exercises
Partitioning
  Splitting data over multiple servers
    Domain or functional decomposition
    Academy will concentrate on domain model
Domain Decomposition
  By design
    Search engine
  By evolution
    Corporate acquisitions
  Our example!
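Domain decomposition means each server holds a complete slice of the data chosen by a domain key, rather than one function of the whole data set. A minimal Python sketch, assuming region is the key (as in the academy's regional warehouses); the country-to-region table and the order records are hypothetical:

```python
from collections import defaultdict

# Hypothetical lookup: which regional warehouse serves which country.
REGION_OF_COUNTRY = {
    "Sweden": "Europe",
    "UK": "Europe",
    "Japan": "Asia",
    "USA": "Americas",
    "Brazil": "Americas",
}


def partition_by_region(orders):
    """Group orders by region so each region can live on its own server."""
    partitions = defaultdict(list)
    for order in orders:
        region = REGION_OF_COUNTRY[order["country"]]
        partitions[region].append(order)
    return dict(partitions)


orders = [
    {"id": 1, "country": "Sweden"},
    {"id": 2, "country": "Japan"},
    {"id": 3, "country": "USA"},
    {"id": 4, "country": "UK"},
    {"id": 5, "country": "Brazil"},
]

partitions = partition_by_region(orders)
for region, region_orders in sorted(partitions.items()):
    print(region, [o["id"] for o in region_orders])
```

Every order lands in exactly one partition, so a query about one region touches only one server.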
Task distribution
  Considerations
    Minimize communication
    Compare to ECP
    Server architecture
    High availability requirements
    Optimal number of threads
MapReduce Pattern
  Split and delegate task - Map
  Aggregate partial results - Reduce
  Result has same format (1 to N)
  Aggregation should not be a bottleneck
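The pattern on the "MapReduce Pattern" slide can be sketched in a few lines of Python. This is an illustration under assumed data, not the academy's implementation: each partition stands for one node, the map step produces a partial count per category, and the reduce step merges partials into a result of the same shape.

```python
from collections import Counter
from functools import reduce


def map_partition(orders):
    """Map step: runs on one node, returns a partial count per category."""
    return Counter(order["category"] for order in orders)


def reduce_partials(left, right):
    """Reduce step: merges two partial results into one of the same format."""
    return left + right


# Hypothetical data: one list of orders per node.
partitions = [
    [{"category": "Snack"}, {"category": "Fruit"}],
    [{"category": "Snack"}],
    [{"category": "Dairy"}, {"category": "Snack"}],
]

partials = [map_partition(p) for p in partitions]
total = reduce(reduce_partials, partials, Counter())
print(dict(total))

# With a single node the reduced result equals that node's partial,
# which is what "same format, 1 to N" buys the caller.
single = reduce(reduce_partials, partials[:1], Counter())
print(single == partials[0])
```

Because the merge is a cheap addition of small dictionaries, the aggregation step stays light compared with the per-node work.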
MapReduce Questions
  How much setup is required?
    Despite what you may read about other technologies, development work is necessary for all implementations
  Do I need to install additional software?
    No!
Academy Scenario
  Multiple web shops
    Regional warehouses
    Europe, Asia, Americas
  Big Web Shop
    Outsources all orders to web shops
The Problem
  Big shop category managers want to know about any of their products being frequently out-of-stock
  Measure of unhappiness
    Product is out of stock at time of order too often
    Product will still be delivered but might be late
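One way to turn the "measure of unhappiness" into a number is the share of orders placed while the product was out of stock. The Python sketch below assumes that definition; the slide only says "too often", so the 25% threshold, the field names, and the sample orders are hypothetical.

```python
from collections import defaultdict

OUT_OF_STOCK_THRESHOLD = 0.25  # hypothetical cut-off for "too often"


def out_of_stock_rates(orders):
    """Return {product: fraction of its orders placed while out of stock}."""
    totals = defaultdict(int)
    misses = defaultdict(int)
    for order in orders:
        totals[order["product"]] += 1
        if not order["in_stock"]:
            misses[order["product"]] += 1
    return {product: misses[product] / totals[product] for product in totals}


def frequently_out_of_stock(orders, threshold=OUT_OF_STOCK_THRESHOLD):
    """Products whose out-of-stock rate exceeds the threshold."""
    rates = out_of_stock_rates(orders)
    return sorted(p for p, rate in rates.items() if rate > threshold)


orders = (
    [{"product": "Bagels", "in_stock": False}] * 2
    + [{"product": "Bagels", "in_stock": True}] * 2
    + [{"product": "Donuts", "in_stock": False}] * 1
    + [{"product": "Donuts", "in_stock": True}] * 3
    + [{"product": "Pretzels", "in_stock": True}] * 2
)

rates = out_of_stock_rates(orders)
flagged = frequently_out_of_stock(orders)
print(rates)
print(flagged)
```

When the orders are spread over several warehouses, the counts (not the rates) are what each node should return, so that the combined rate can be computed from summed totals.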
Initial Infrastructure
  Web shop simulator
    Business service
  Big web shop order distribution
    Business process and business operation
  Warehouses
    Data model and pivot table
    Web service
HoleFoods Web Shop
HoleFoods Data Model
  Outlet: Population, Country, City, Type
  Country: Name, Region
  Region: Name
  Product: Name, Category, Price, SKU
  Transaction: Actual, Date Of Sale, Product, Outlet, Channel, AmountOfSale, Units Sold, InStock
DeepSee Data Model
  Cubes
    Define dimensions and measures
  Subject Areas
    Views on cubes
    Provide automatic filtering
  KPIs
    Make more sophisticated computations available to dashboards
    Can make use of DeepSee, SQL, or custom logic
DeepSee Performance and Scalability
  Multi-level, incremental caching to support large data models (100M+ facts)
  Support for parallel execution of queries to exploit multi-core architectures:
    Queries are split by # of facts
    Queries are split by # of cells
    Subqueries and joins
  Logic for updates to Data Model is streamlined
Academy setup
  Four Ensemble instances: BigData, Asia, Europe, Americas
  Order Distributor
  Web shop Simulator
Exercise 1
  In this exercise you will familiarize yourself with a regional warehouse (DeepSee) and use the web shop simulator.
MDX
  MDX (MultiDimensional eXpressions): standard query language for OLAP (online analytical processing)
  Provides standard syntax to execute queries against a cube
  When you create a pivot table, DeepSee generates and uses an MDX query, which you can view directly
  Analyzer provides an option for directly running MDX queries
  You can run MDX queries in the DeepSee shell
  DeepSee provides an API that you can use to run MDX queries on your DeepSee cubes
MDX Example
  SELECT NON EMPTY [OUTLET].%TOPMEMBERS ON 0,
         NON EMPTY [CHANNEL].%TOPMEMBERS ON 1
  FROM [SALES]
  WHERE [MEASURES].[AMOUNT SOLD]
Exercise 2
  In this exercise you will access your warehouse analytics programmatically, using MDX, and publish the results as a web service.
How to parallelize dynamically
  Ens.CallStructure
    Holds a request object and a target name
    Also has a slot for the Response
  Ens.Host.SendRequestSyncMultiple
    Accepts a list of Ens.CallStructure
    Makes calls in parallel
    Adds response objects to Ens.CallStructure
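How to parallelize dynamically
  // Build one call structure per target and append it to the request list
  set tCall = ##class(Ens.CallStructure).%New()
  set tCall.TargetDispatchName = "MyBusinessHostClass"
  set tCall.Request = ##class(MyRequestClass).%New()
  set pRequestList = $get(pRequestList) + 1
  set pRequestList(pRequestList) = tCall
  // Send all queued requests in parallel and wait for the responses
  set tSC = ..SendRequestSyncMultiple(.pRequestList)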
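For readers who do not have an Ensemble instance at hand, the same scatter-and-wait idea can be sketched in Python. This is an analogy only, not the Ensemble API: the `Call` class and `send_sync_multiple` function are hypothetical stand-ins for Ens.CallStructure and SendRequestSyncMultiple, and the warehouse handlers return made-up numbers.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Call:
    """Hypothetical analogue of a call structure: target, request, response slot."""
    target: Callable[[Any], Any]
    request: Any
    response: Optional[Any] = None


def send_sync_multiple(calls):
    """Run every call in parallel, wait for all, and fill in each response."""
    if not calls:
        return calls
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(call.target, call.request) for call in calls]
        for call, future in zip(calls, futures):
            call.response = future.result()
    return calls


def make_warehouse(region, out_of_stock):
    """Hypothetical stand-in for a regional warehouse web service."""
    def handle(category):
        return {"region": region, "category": category,
                "out_of_stock": out_of_stock}
    return handle


calls = [
    Call(make_warehouse(region, count), "Snack")
    for region, count in [("Asia", 3), ("Europe", 1), ("Americas", 2)]
]
send_sync_multiple(calls)

total_out_of_stock = sum(call.response["out_of_stock"] for call in calls)
print([call.response["region"] for call in calls], total_out_of_stock)
```

The list of calls is built at run time, so the number of targets can vary from request to request, which is the "dynamically" in the slide title.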
Exercise 3
  In this exercise you will retrieve statistics from the relevant regional warehouses, using parallel calls.
Dashboards
Widgets
Exercise 4
  In this exercise you will aggregate the results from Exercise 3 and monitor the aggregated results using a dashboard.
Warehouse problem simulator
  Business Rule
    Creates a decision point in the business process
    Can be changed at runtime
Exercise 5
  In this exercise you will force a product category to be out-of-stock and watch the results deteriorate.
Conclusion
  With InterSystems technology:
    When does big data become big data?
  When distributing data:
    DeepSee (and perhaps iKnow) on the nodes
    ECP useful for maintaining code
Questions?
Thank you
Developer Connection
developer.intersystems.com
Your Global Summit Every Day
We want your feedback
  We'd love your feedback on the academy you just attended. Go to:
    intersystems.com/survey
  Select the date, time, and academy you attended and complete the short evaluation form.
  Thank you