Big Data Analytics Presentation

Date post: 03-Jun-2018
Upload: razvan-julian-petrescu

Transcript
  • 8/12/2019 Big Data Analytics Presentation

    1/34

    Big Data Analytics

    Otto Medin & Louise Parberry
    Sales Engineers

    2/34

    Big Data Analytics Overview

    How big is big?
      Response time requirements
      Scalability requirements
      Budget

    3/34

    Big Data Analytics

    Big Data By The Numbers
      Data load limit: ~400 MB/sec (commodity server)
      3 terabyte data load
        $180 hard drive
        ~7860 sec (~2.2 hrs)
      1 exabyte
        $63 million
        87 years
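The figures on this slide can be checked with a little arithmetic. The sketch below is not from the presentation; it assumes binary units (1 TB = 1024² MB), a sustained 400 MB/sec load rate on a single server, and one $180 drive per 3 TB of data, which are inferred from the numbers rather than stated on the slide.

```python
# Back-of-the-envelope check of the slide's numbers.
# Assumptions (inferred, not stated on the slide): binary units, a sustained
# 400 MB/sec load rate, and one $180 drive per 3 TB of data.

MB_PER_TB = 1024 ** 2
MB_PER_EB = 1024 ** 4
LOAD_RATE_MB_PER_SEC = 400
DRIVE_CAPACITY_TB = 3
DRIVE_PRICE_USD = 180
SECONDS_PER_YEAR = 365 * 24 * 3600


def load_seconds(size_mb: float) -> float:
    """Time to load size_mb megabytes through a single server."""
    return size_mb / LOAD_RATE_MB_PER_SEC


three_tb_seconds = load_seconds(3 * MB_PER_TB)
exabyte_years = load_seconds(MB_PER_EB) / SECONDS_PER_YEAR
exabyte_cost_usd = (MB_PER_EB / MB_PER_TB) / DRIVE_CAPACITY_TB * DRIVE_PRICE_USD

print(f"3 TB load: {three_tb_seconds:.0f} sec ({three_tb_seconds / 3600:.1f} hrs)")
print(f"1 EB load: {exabyte_years:.0f} years on one server")
print(f"1 EB storage: ${exabyte_cost_usd / 1e6:.0f} million")
```

Under these assumptions the storage cost is modest compared with the load time, which is why a single server is the limiting factor.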

    4/34

    Big Data Analytics

    The key to Big Data Analytics:

    PARALLELIZE! (if you want a quick result, that is)

    5/34

    Agenda

    Big Data Analytics Overview
    Parallelization
    Academy Model
    Exercises

    6/34

    Partitioning

    Splitting data over multiple servers
      Domain or functional decomposition
      Academy will concentrate on domain model

    7/34

    Domain Decomposition

    By design
      Search engine
    By evolution
      Corporate acquisitions
      Our example!
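As a rough illustration of domain decomposition, the sketch below groups records by a domain key so that each group could be placed on its own server. The order records, field names, and the choice of "region" as the key are hypothetical and only loosely modeled on the academy scenario.

```python
from collections import defaultdict

# Hypothetical order records; the field names are illustrative only.
orders = [
    {"id": 1, "region": "Europe", "product": "Bagels"},
    {"id": 2, "region": "Asia", "product": "Donuts"},
    {"id": 3, "region": "Europe", "product": "Donuts"},
    {"id": 4, "region": "Americas", "product": "Bagels"},
]


def partition_by_domain(records, key):
    """Group records by a domain key so each group can live on its own server."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record[key]].append(record)
    return dict(partitions)


partitions = partition_by_domain(orders, "region")
for region, region_orders in partitions.items():
    print(region, [order["id"] for order in region_orders])
```

A functional decomposition would instead split by the kind of work performed (for example, order intake versus reporting) rather than by a property of the data.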

    8/34

    Task distribution

    Considerations
      Minimize communication
      Compare to ECP
      Server architecture
      High availability requirements
      Optimal number of threads

    9/34

    MapReduce Pattern

    Split and delegate task - Map
    Aggregate partial results - Reduce
      Result has same format, 1 to N
      Aggregation should not be a bottleneck
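The pattern can be sketched in a few lines. The example below is an illustration only, not the academy's implementation: the per-region order data is made up, and Python threads stand in for the servers that would do the map step.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical partitions: per region, a list of (product, in_stock) orders.
partitions = {
    "Europe": [("Bagels", True), ("Donuts", False), ("Bagels", True)],
    "Asia": [("Bagels", False), ("Donuts", True)],
    "Americas": [("Bagels", True), ("Donuts", True), ("Donuts", True), ("Bagels", True)],
}


def map_partition(orders):
    """Map step: summarize one partition as (order_count, out_of_stock_count)."""
    out_of_stock = sum(1 for _, in_stock in orders if not in_stock)
    return (len(orders), out_of_stock)


def combine(a, b):
    """Reduce step: merge two partial results into one of the same format."""
    return (a[0] + b[0], a[1] + b[1])


# Delegate one map task per partition and run them in parallel.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(map_partition, partitions.values()))

# Aggregation is a cheap fold over small tuples, so it is not the bottleneck.
total_orders, total_out_of_stock = reduce(combine, partials, (0, 0))
print(total_orders, total_out_of_stock)
```

Because `combine` returns the same shape it receives, the result looks the same whether it was built from one partition or from many.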

    10/34

    MapReduce Questions

    How much set up is required?
      Despite what you may read about other technologies, development work is necessary for all implementations
    Do I need to install additional software?
      No!

    11/34

    Academy Scenario

    Multiple web shops
      Regional warehouses
      Europe, Asia, Americas
    Big Web Shop
      Outsources all orders to web shops

    12/34

    The Problem

    Big shop category managers want to know about any of their products being frequently out-of-stock
    Measure of unhappiness
      Product is out of stock at time of order too often
      Product will still be delivered but might be late
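One way to make "out of stock too often" concrete is an out-of-stock rate per product. The sketch below is an assumption about how such a measure could be computed; the order log, the 50% threshold, and the function name are all hypothetical and do not come from the slides.

```python
from collections import Counter

# Hypothetical order log: (product, in_stock_at_time_of_order).
orders = [
    ("Bagels", True), ("Bagels", False), ("Bagels", False),
    ("Donuts", True), ("Donuts", True), ("Donuts", True), ("Donuts", False),
]

# Assumed threshold; the slides do not say what counts as "too often".
THRESHOLD = 0.5


def out_of_stock_rates(order_log):
    """Return the fraction of orders per product placed while out of stock."""
    totals, misses = Counter(), Counter()
    for product, in_stock in order_log:
        totals[product] += 1
        if not in_stock:
            misses[product] += 1
    return {product: misses[product] / totals[product] for product in totals}


rates = out_of_stock_rates(orders)
flagged = sorted(product for product, rate in rates.items() if rate > THRESHOLD)
print(rates)
print("Frequently out of stock:", flagged)
```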

    13/34

    Initial Infrastructure

    Web shop simulator
      Business service
    Big web shop order distribution
      Business process and business operation
    Warehouses
      Data model and pivot table
      Web service


    14/34

    HoleFoods Web Shop

    15/34

    HoleFoods Data Model

    [Diagram: classes and their properties]

    Outlet: Population, Country, City, Type
    Country: Name, Region
    Region: Name
    Product: Name, Category, Price, SKU
    Transaction: Actual, Date Of Sale, Product, Outlet, Channel, AmountOfSale, Units Sold, InStock

    16/34

    DeepSee Data Model

    Cubes
      Defines dimensions and measures
    Subject Areas
      Views on cubes
      Provides automatic filtering
    KPIs
      Makes more sophisticated computations available to dashboards
      Can make use of DeepSee, SQL, or custom logic

    17/34

    DeepSee Performance and Scalability

    Multi-level, incremental caching to support large data models (100M+ facts)
    Support for parallel execution of queries to exploit multi-core architectures:
      Queries are split by # of facts
      Queries are split by # of cells
      Subqueries and joins
    Logic for updates to Data Model is streamlined

    18/34

    Academy setup

    Four Ensemble instances:
      BigData
      Asia
      Europe
      Americas

    OrderDistributor
    Web shop Simulator

    19/34

    Exercise 1

    In this exercise you will familiarize yourself with a regional warehouse (DeepSee) and use the web shop simulator.

    20/34

    MDX

    MDX (MultiDimensional eXpressions) is the standard query language for OLAP (online analytical processing)
    Provides standard syntax to execute queries against a cube
    When you create a pivot table, DeepSee generates and uses an MDX query, which you can view directly
    Analyzer provides an option for directly running MDX queries
    You can run MDX queries in the DeepSee shell
    DeepSee provides an API that you can use to run MDX queries on your DeepSee cubes

    21/34

    MDX Example

    SELECT NON EMPTY [OUTLET].%TOPMEMBERS ON 0,
           NON EMPTY [CHANNEL].%TOPMEMBERS ON 1
    FROM [SALES]
    WHERE [MEASURES].[AMOUNT SOLD]
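Conceptually, a query of this shape produces a pivot table of amount sold, with outlet members on the columns and channel members on the rows, leaving out members that have no data. The Python sketch below imitates that result on made-up fact rows; it does not run MDX or call DeepSee, and the fact data and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (outlet, channel, amount_sold).
facts = [
    ("Europe", "Online", 120.0),
    ("Europe", "Retail", 80.0),
    ("Asia", "Online", 200.0),
    ("Europe", "Online", 30.0),
]


def pivot(fact_rows):
    """Sum amount sold per (channel, outlet) cell; cells with no facts are absent."""
    cells = defaultdict(float)
    for outlet, channel, amount in fact_rows:
        cells[(channel, outlet)] += amount
    return dict(cells)


cells = pivot(facts)
# Only members that appear in at least one fact are kept, which loosely
# mirrors the effect of NON EMPTY on each axis.
column_members = sorted({outlet for _, outlet in cells})  # axis 0
row_members = sorted({channel for channel, _ in cells})   # axis 1

for channel in row_members:
    print(channel, [cells.get((channel, outlet)) for outlet in column_members])
```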

    22/34

    Exercise 2

    In this exercise you will access your warehouse analytics programmatically, using MDX, and publish the results as a web service.

    23/34

    How to parallelize dynamically

    Ens.CallStructure
      Holds a request object and a target name
      Also has a slot for the Response
    Ens.Host.SendRequestSyncMultiple
      Accepts a list of Ens.CallStructure
      Makes calls in parallel
      Adds response objects to Ens.CallStructure

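    24/34

    How to parallelize dynamically

    // Build one call structure per target (MyBusinessHostClass and MyRequestClass are placeholders)
    set tCall = ##class(Ens.CallStructure).%New()
    set tCall.TargetDispatchName = "MyBusinessHostClass"
    set tCall.Request = ##class(MyRequestClass).%New()

    // Append the call to a local array; the top node holds the count
    set tRequestList = $GET(tRequestList) + 1
    set tRequestList(tRequestList) = tCall

    // Send all requests in parallel; responses are stored on each call structure
    set tSC = ..SendRequestSyncMultiple(.tRequestList)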
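For readers without an Ensemble instance at hand, the same idea can be imitated in Python. This is only an analogy: the `CallStructure` dataclass and `send_request_sync_multiple` function below are hypothetical stand-ins written for this sketch, not InterSystems APIs, and the regional targets are plain functions returning canned values.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class CallStructure:
    """Holds a target name and a request, plus a slot for the response."""
    target: str
    request: Any
    response: Optional[Any] = None


def send_request_sync_multiple(calls: List[CallStructure],
                               targets: Dict[str, Callable[[Any], Any]]) -> None:
    """Run every call in parallel and store each response on its call structure."""
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = [pool.submit(targets[call.target], call.request) for call in calls]
        for call, future in zip(calls, futures):
            call.response = future.result()  # blocks until that call has finished


# Hypothetical regional warehouses: each returns a canned statistic.
targets = {
    "Europe": lambda product: {"region": "Europe", "product": product, "out_of_stock": 2},
    "Asia": lambda product: {"region": "Asia", "product": product, "out_of_stock": 0},
    "Americas": lambda product: {"region": "Americas", "product": product, "out_of_stock": 5},
}

# The list of calls is built at runtime, one per target that is relevant.
calls = [CallStructure(target=name, request="Bagels") for name in targets]
send_request_sync_multiple(calls, targets)

for call in calls:
    print(call.target, call.response["out_of_stock"])
```

The caller returns only after every response slot is filled, which is the "sync" part of the name; the parallelism is in how the individual calls are dispatched.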

    25/34

    Exercise 3

    In this exercise you will retrieve statistics from the relevant regional warehouses, using parallel calls.


    26/34

    Dashboards

    Widgets

    27/34

    Exercise 4

    In this exercise you will aggregate the results from Exercise 3 and monitor the aggregated results using a dashboard.

    28/34

    Warehouse problem simulator

    Business Rule
      Creates a decision point in the business process
      Can be changed at runtime

    29/34

    Exercise 5

    In this exercise you will force a product category to be out-of-stock and watch the results deteriorate.

    30/34

    Conclusion

    With InterSystems technology:
      When does big data become big data?
    When distributing data:
      DeepSee (and perhaps iKnow) on the nodes
      ECP useful for maintaining code


    31/34

    Questions?

    Thank you


    32/34

    Developer Connection

    developer.intersystems.com

    Your Global Summit Every Day

    33/34

    We want your feedback

    We'd love your feedback on the academy you just attended. Go to:

    intersystems.com/survey

    Select the date, time, and academy you attended and complete the short evaluation form.

    Thank you

    34/34

    Big Data Analytics

    Otto Medin & Louise Parberry
    Sales Engineers

