
Cubes Poster – PyCon 2014

Date post: 29-Dec-2015
Upload: stefan-urbanek

Cubes

Cubes – lightweight pluggable data warehouse

Slicing and Dicing

Overview

Cell as a user interface element: multi-dimensional breadcrumbs

cut1 = PointCut("status", ["open"])
cut2 = SetCut("region", [["sk", "ba"], ["hu"]])
cut3 = RangeCut("date", [2010, 1], [2012])

Diagram: the cell drives the filter UI (multi-dimensional breadcrumbs); the aggregation result provides the summary and the drill-down rows.

browser = workspace.browser("contracts")
result = browser.aggregate(cell, drilldown=["sector"])

for row in result:
    print("%s: %s" % (row[level.label_attribute], row[aggregate.name]))

Use row[level.label_attribute] for a table cell or link label and row[level.key] for a URL. An aggregate such as price_sum = ∑ price over the facts is computed from a measure and read as row["price_sum"].

Diagram: the Browser uses the model to generate an SQL or other backend-specific query against the Store (a physical data store: database or API). The result is an AggregationResult with result.summary and an iterable result.cells; it is backend-specific and might hold a database cursor.
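The split between result.summary and result.cells can be illustrated without a database. The following is a minimal pure-Python sketch, not the Cubes implementation: the facts, the hypothetical aggregate() helper and its naming of price_sum after the measure are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows; in Cubes these would live in a store (database or API).
facts = [
    {"sector": "energy", "price": 100},
    {"sector": "energy", "price": 250},
    {"sector": "health", "price": 40},
]

def aggregate(rows, measure, drilldown=None):
    """Hypothetical helper returning (summary, cells).

    summary aggregates all rows; cells aggregate per value of the
    drill-down attribute, sorted by that value.
    """
    name = f"{measure}_sum"  # aggregate named after the measure, e.g. price_sum
    summary = {name: sum(r[measure] for r in rows), "record_count": len(rows)}
    cells = []
    if drilldown:
        groups = defaultdict(list)
        for r in rows:
            groups[r[drilldown]].append(r)
        for key in sorted(groups):
            group = groups[key]
            cells.append({drilldown: key,
                          name: sum(r[measure] for r in group),
                          "record_count": len(group)})
    return summary, cells

summary, cells = aggregate(facts, "price", drilldown="sector")
for row in cells:
    print("%s: %s" % (row["sector"], row["price_sum"]))
```

The summary is simply the aggregate over the whole cell, while each drill-down row repeats the same computation over one group.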

– browsing of aggregated data
– multi-dimensional data modeling
– unified interface for analytical data

Workspace

Server
– easy-to-use JSON over HTTP API
– can be integrated as a Flask Blueprint

Synopsis: Python framework and set of tools for building a heterogeneous pluggable data warehouse, multidimensional data access and online analytical processing (OLAP) of categorical data. Lightweight.

Architecture diagram (components): source data, Server, Authenticator, Workspace, Authorizer, Browser, Store, analytical data, User Interface.

Model

Metadata – logical description of data: cubes, dimensions, measures and aggregates.
Cubes – logical data structures, collections of measurable facts (invoices, phone calls, events, …).
Dimensions – provide context for facts; used to filter queries or reports and to control the scope of aggregation of facts. Dimensions might contain concept hierarchies such as category–subcategory–product, a date hierarchy (year–month–day or year–quarter–month–day) or geographical hierarchies.
The model also provides information about the mapping to the physical data store.

"cubes": [ { "name": "sales”, “measures”: [“amount”], "dimensions": [“date”, “product", "store"] "joins": [ {"master”:”date_id”, "detail”:”date.id"}, {"master":"product_id", "detail":"product.id"}, {"master":"store_id", "detail":"store.id"} ] }],"dimensions": [ { "name": "product", "attributes": ["code", "name"] }, { "name": "store", "attributes": ["code", "address"] }]

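Physical model (diagram): fact table sales (product_id, store_id, sale_date, amount) joined to dimension tables product (id, code, name), store (id, code, address) and date (year, month); a denormalized variant is shown as an alternative.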
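The physical model can be tried out with Python's built-in sqlite3 module. The sample rows below are made up; the sketch shows a drill-down by product over the star layout and the same aggregate over a denormalized table built from it (the sales_denorm table name is hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Star layout: one fact table plus dimension tables (made-up sample data).
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, code TEXT, name TEXT);
CREATE TABLE store   (id INTEGER PRIMARY KEY, code TEXT, address TEXT);
CREATE TABLE sales   (product_id INTEGER, store_id INTEGER,
                      sale_date TEXT, amount INTEGER);
INSERT INTO product VALUES (1, 'p1', 'Tea'), (2, 'p2', 'Coffee');
INSERT INTO store   VALUES (1, 's1', 'Main St'), (2, 's2', 'Harbour');
INSERT INTO sales   VALUES (1, 1, '2010-01-05', 10),
                           (2, 1, '2010-02-11', 30),
                           (2, 2, '2011-03-02', 25);
""")

# Drill down by product name: only the product dimension needs to be joined.
rows = conn.execute("""
SELECT product.name, SUM(sales.amount) AS amount_sum, COUNT(*) AS record_count
FROM sales
JOIN product ON sales.product_id = product.id
GROUP BY product.name
ORDER BY product.name
""").fetchall()
print(rows)

# Denormalized variant: one wide table, so the query needs no joins.
conn.execute("""
CREATE TABLE sales_denorm AS
SELECT sales.*, product.name AS product_name, store.address AS store_address
FROM sales
JOIN product ON sales.product_id = product.id
JOIN store ON sales.store_id = store.id
""")
denorm_rows = conn.execute(
    "SELECT product_name, SUM(amount) FROM sales_denorm "
    "GROUP BY product_name ORDER BY product_name").fetchall()
print(denorm_rows)
```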

Browsing and Aggregation

Get aggregated data for a cell, get dimension members within a cell, or get a list of facts within a cell, if available. Uses backend-specific aggregation or an interface to another aggregation engine.

Drill-down – get more details: by year, by product, by store, …
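Drilling down along a hierarchy means aggregating one level below the current path. A minimal sketch with hypothetical facts and a hypothetical drilldown() helper over a date hierarchy with year and month levels:

```python
from collections import Counter

# Hypothetical facts with the date already split into hierarchy levels.
facts = [
    {"year": 2010, "month": 1, "amount": 10},
    {"year": 2010, "month": 2, "amount": 30},
    {"year": 2011, "month": 3, "amount": 25},
]
LEVELS = ["year", "month"]  # date hierarchy: year -> month

def drilldown(rows, path):
    """Aggregate `amount` one level below `path` (e.g. [] or [2010])."""
    depth = len(path)
    if depth >= len(LEVELS):
        raise ValueError("already at the deepest level")
    # Keep only rows inside the cell described by the path prefix.
    inside = [r for r in rows
              if all(r[LEVELS[i]] == value for i, value in enumerate(path))]
    totals = Counter()
    for r in inside:
        totals[r[LEVELS[depth]]] += r["amount"]
    return dict(sorted(totals.items()))

print(drilldown(facts, []))      # {2010: 40, 2011: 25}
print(drilldown(facts, [2010]))  # {1: 10, 2: 30}
```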

from cubes import Workspace

workspace = Workspace()
workspace.import_model("model.json")
workspace.register_default_store("sql", url="postgres://localhost/data")

Pluggable Analytical Workspace (diagram): a Static Model Provider and an API Model Provider describe the cubes sales, churn, events and activations, whose data lives in the stores BI Data (Postgres), BI Data 2 (Mongo) and Events (API).
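A pluggable workspace essentially routes each cube to the store that holds its data. The sketch below is a hypothetical, much simplified stand-in (MiniWorkspace is not a Cubes class), and the assignment of cubes to stores is made up because the diagram does not specify it.

```python
from dataclasses import dataclass, field

@dataclass
class MiniWorkspace:
    """Hypothetical, simplified workspace; not the Cubes Workspace class."""
    stores: dict = field(default_factory=dict)      # store name -> store config
    cube_store: dict = field(default_factory=dict)  # cube name -> store name

    def register_store(self, name, type_, **config):
        self.stores[name] = {"type": type_, **config}

    def add_cubes(self, store_name, cube_names):
        # A model provider (static or API based) would normally supply these names.
        if store_name not in self.stores:
            raise KeyError(f"unknown store: {store_name}")
        for cube in cube_names:
            self.cube_store[cube] = store_name

    def store_for(self, cube):
        return self.stores[self.cube_store[cube]]

ws = MiniWorkspace()
ws.register_store("bi_data", "sql", url="postgres://localhost/data")
ws.register_store("bi_data2", "mongo", host="localhost")
ws.register_store("events", "mixpanel")
# Made-up routing of the diagram's cubes to stores.
ws.add_cubes("bi_data", ["sales", "churn"])
ws.add_cubes("events", ["events", "activations"])
print(ws.store_for("activations")["type"])
```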

Authors: Robin Thomas and Stefan Urbanek

Cell – provides the context of interest and is composed of cuts. There are three kinds of cuts: point, set and range. Cuts can also be inverted using invert=True, which yields cells outside of the cut.

point cut – single dimension member

set cut – multiple dimension members

range cut – members between two values of an ordered dimension (such as date)

cell = Cell(cube, [cut1, cut2, cut3])
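The semantics of the three cut kinds, inversion, and a cell as the conjunction of its cuts can be sketched as row filters. These classes are hypothetical re-implementations for illustration, assuming each dimension value is stored as a path tuple; the real Cubes classes translate cuts into backend queries and take a cube as the first Cell argument.

```python
from dataclasses import dataclass

# Hypothetical re-implementations; not the cubes.PointCut/SetCut/RangeCut/Cell classes.

def _prefix(row, dimension, length):
    """Path of `row` in `dimension`, truncated to `length` levels."""
    return list(row[dimension][:length])

@dataclass
class PointCut:
    dimension: str
    path: list
    invert: bool = False

    def matches(self, row):
        hit = _prefix(row, self.dimension, len(self.path)) == list(self.path)
        return hit != self.invert

@dataclass
class SetCut:
    dimension: str
    paths: list
    invert: bool = False

    def matches(self, row):
        hit = any(_prefix(row, self.dimension, len(p)) == list(p)
                  for p in self.paths)
        return hit != self.invert

@dataclass
class RangeCut:
    dimension: str
    from_path: list
    to_path: list
    invert: bool = False

    def matches(self, row):
        # Inclusive bounds, compared on path prefixes of matching length.
        low = _prefix(row, self.dimension, len(self.from_path))
        high = _prefix(row, self.dimension, len(self.to_path))
        hit = low >= list(self.from_path) and high <= list(self.to_path)
        return hit != self.invert

@dataclass
class Cell:
    cuts: list

    def matches(self, row):
        # A cell is the conjunction of its cuts.
        return all(cut.matches(row) for cut in self.cuts)

rows = [
    {"status": ("open",), "region": ("sk", "ba"), "date": (2011, 5)},
    {"status": ("open",), "region": ("sk", "ke"), "date": (2011, 5)},
    {"status": ("closed",), "region": ("hu", "bu"), "date": (2010, 3)},
    {"status": ("open",), "region": ("hu", "bu"), "date": (2009, 12)},
]
cut1 = PointCut("status", ["open"])
cut2 = SetCut("region", [["sk", "ba"], ["hu"]])
cut3 = RangeCut("date", [2010, 1], [2012])
cell = Cell([cut1, cut2, cut3])
selected = [r for r in rows if cell.matches(r)]
not_open = [r for r in rows if PointCut("status", ["open"], invert=True).matches(r)]
```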

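(Diagram labels: a point cut takes a dimension and a dimension path, a set cut a dimension and a list of dimension paths, and a range cut a dimension with from and to paths.)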

cell = Cell(cube)
browser.aggregate(cell)
browser.aggregate(cell, drilldown=["date"])

cut = PointCut("date", [2010])
cell = cell.slice(cut)
browser.aggregate(cell, drilldown=["date"])

Each aggregation returns result.summary and result.cells.
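Slicing returns a new, narrower cell. A minimal sketch, assuming cuts are (dimension, path) pairs and that slicing replaces an existing cut on the same dimension in place or otherwise appends the new one; slice_cell() is a hypothetical helper, not the Cubes method.

```python
def slice_cell(cuts, new_cut):
    """Hypothetical helper: return a new list of cuts sliced by `new_cut`.

    A cut on the same dimension is replaced at its position; otherwise
    the new cut is appended. The input list is left unchanged.
    """
    dimension = new_cut[0]
    for i, cut in enumerate(cuts):
        if cut[0] == dimension:
            return cuts[:i] + [new_cut] + cuts[i + 1:]
    return cuts + [new_cut]

cell = []                                     # the whole cube
cell = slice_cell(cell, ("date", [2010]))     # only 2010
cell = slice_cell(cell, ("date", [2010, 4]))  # April 2010 replaces 2010
cell = slice_cell(cell, ("region", ["sk"]))   # add a cut on another dimension
print(cell)
```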

Workspace – manages the model, data stores and model providers.

SQL schema example:

Take your Google Analytics and your SQL database, and you have a single way for your users to access all of them. There is no need to grant everyone account access to each particular data source.

Slicer – can collect other (external) Cubes Slicers

Stores configuration:

[store_data]
type: sql
url: postgres://localhost/data

[store_data2]
type: mongo
host: localhost

[store_events]
type: mixpanel
api_key: 123456
api_secret: 123456
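The store sections use an INI-style syntax, so Python's configparser can read them. This sketch assumes store sections are recognized by the store prefix in their names, which is only an assumption based on the example.

```python
import configparser

# The store sections from the configuration above, embedded as a string.
CONFIG = """
[store_data]
type: sql
url: postgres://localhost/data

[store_data2]
type: mongo
host: localhost

[store_events]
type: mixpanel
api_key: 123456
api_secret: 123456
"""

parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG)

# Assumption: every section whose name starts with "store" describes a store.
stores = {
    section: dict(parser[section])
    for section in parser.sections()
    if section.startswith("store")
}
for name, options in stores.items():
    print(name, options["type"])
```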

Supported Backends:


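Ways of deployment (diagram): a web application (HTML+JS, RoR, …) sends HTTP requests to the Slicer server and receives JSON replies; a Python web framework (Django, Flask, …) uses the Cubes Python API directly and renders HTML; a Flask application mounts the Slicer Blueprint and serves HTML. Each variant works with a model and a store.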

{
    "cell": [],
    "total_cell_count": 2,
    "drilldown": [
        {"record_count": 31, "amount_sum": 550840, "date.year": 2009},
        {"record_count": 31, "amount_sum": 566020, "date.year": 2010}
    ],
    "summary": {"record_count": 62, "amount_sum": 1116860}
}
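A client only needs the standard json module to consume such a reply. The sketch renders the drill-down rows as a text table and checks that, for an additive aggregate like amount_sum, the rows add up to the summary.

```python
import json

# The Slicer reply shown above, embedded as a string.
RESPONSE = """
{"cell": [], "total_cell_count": 2,
 "drilldown": [
   {"record_count": 31, "amount_sum": 550840, "date.year": 2009},
   {"record_count": 31, "amount_sum": 566020, "date.year": 2010}],
 "summary": {"record_count": 62, "amount_sum": 1116860}}
"""

reply = json.loads(RESPONSE)

# Render the drill-down rows as a small right-aligned text table.
lines = [f"{'year':>6} {'records':>8} {'amount':>10}"]
for row in reply["drilldown"]:
    lines.append(f"{row['date.year']:>6} {row['record_count']:>8} "
                 f"{row['amount_sum']:>10}")
print("\n".join(lines))

# For an additive aggregate such as a sum, the rows add up to the summary.
total = sum(row["amount_sum"] for row in reply["drilldown"])
assert total == reply["summary"]["amount_sum"]
```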

Consumers (diagram): generic visualizers and reporting applications; specific-purpose reports.

SQL Backend

SQL schema example (diagram): fact table contract with a subject dimension (subject, subject category), a supplier dimension (supplier, supplier type), a geography dimension (city, region) and a date dimension.


Slicer Server

Unified aggregation interface to a variety of data stores and services. JSON interface over HTTP. Built using the Flask web micro-framework. Can be used as a stand-alone server or integrated into another application to serve as an analytical module.

Ways of Deployment

Backends

Bring your own aggregation engine.

Quick ways of creating an analytical data server or adding an analytical module into your application or on top of your system.

/facts – get a list of facts within a cell (if available)
/members – list dimension members within a cell
/cell – get multi-dimensional breadcrumbs information: the browsing context, or "what am I looking at?"

Backend modules:

Model Provider – a live cubes concept mapper: maps foreign cubes or foreign cube-like structures into the Cubes model.

Aggregation Browser – provides the core functionality of aggregation or delegates the aggregation to an external aggregator.

Store – manages access to the data, establishes and maintains database connections, generates appropriate external API calls (“pretends to be a store”).
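The division of labor between the three backend modules can be illustrated with abstract base classes and a toy in-memory backend. The class names and method signatures below are hypothetical and simplified; they are not the Cubes base classes, whose real interfaces may differ.

```python
from abc import ABC, abstractmethod

# Hypothetical interfaces mirroring the three backend roles described above.

class Store(ABC):
    """Owns access to the data (connections, API calls)."""
    @abstractmethod
    def fetch(self, cube): ...

class ModelProvider(ABC):
    """Maps foreign cube-like structures into cube metadata."""
    @abstractmethod
    def cube_names(self): ...

class Browser(ABC):
    """Aggregates data from a store, or delegates to an external aggregator."""
    def __init__(self, store):
        self.store = store

    @abstractmethod
    def aggregate(self, cube, measure): ...

# A toy in-memory backend implementing all three roles.
class MemoryStore(Store):
    def __init__(self, tables):
        self.tables = tables

    def fetch(self, cube):
        return self.tables[cube]

class MemoryModelProvider(ModelProvider):
    def __init__(self, store):
        self.store = store

    def cube_names(self):
        # "Automatic" model: every table becomes a cube.
        return sorted(self.store.tables)

class MemoryBrowser(Browser):
    def aggregate(self, cube, measure):
        rows = self.store.fetch(cube)
        return {f"{measure}_sum": sum(r[measure] for r in rows),
                "record_count": len(rows)}

store = MemoryStore({"sales": [{"amount": 10}, {"amount": 30}]})
print(MemoryModelProvider(store).cube_names())
print(MemoryBrowser(store).aggregate("sales", "amount"))
```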

Built-in backend for ROLAP (Relational OLAP). Features:
■ star and snowflake schema support
■ joins are executed only if needed for a given query
■ mapping of the DATE data type without a date dimension table
■ simple support of non-additive and semi-additive dimensions and aggregates
■ "split" cell dimension – mark cells as within or outside of a split cell
■ support for outer joins

(Diagram: the model maps to a star ★ or snowflake ❄ schema, or to a denormalized table.)
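The idea behind executing joins only when needed can be sketched as choosing joins from the attributes a query references. The JOINS table and build_query() are hypothetical and reuse the sales example; the real ROLAP backend's query builder is more involved.

```python
# Hypothetical join table for the sales example: dimension table -> join clause.
JOINS = {
    "date": "JOIN date ON sales.date_id = date.id",
    "product": "JOIN product ON sales.product_id = product.id",
    "store": "JOIN store ON sales.store_id = store.id",
}

def build_query(fact, measures, attributes):
    """Build SELECT ... GROUP BY over `fact`, joining only referenced tables."""
    tables = {attr.split(".", 1)[0] for attr in attributes}
    needed = [JOINS[t] for t in sorted(tables) if t != fact]
    select = attributes + [f"SUM({fact}.{m}) AS {m}_sum" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {fact}"
    if needed:
        sql += " " + " ".join(needed)
    if attributes:
        sql += f" GROUP BY {', '.join(attributes)}"
    return sql

print(build_query("sales", ["amount"], ["product.name"]))
print(build_query("sales", ["amount"], []))  # no attributes: no joins at all
```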

Visualizers – turning JSON data into reports, charts and tables. It is very easy to build a custom visualization on top of the Cubes analytical data with the framework of your choice.

(Diagram: the Slicer and Cubes expose the model, the cell (point of view), facts (details) and ∑ aggregates.)

List cubes: GET /cubes
Get cube model (metadata): GET /cube/sales/model
Aggregate: GET /cube/sales/aggregate?

➊ Python web framework using the Cubes Python module
➋ Plug-in for a Flask application
➌ Stand-alone server with an HTML+JS front-end, or stand-alone server with an external application

Slicer JSON response

Simple deployment with uWSGI:

[uwsgi]
socket = 127.0.0.1:5000
module = cubes.server.app
callable = application

Flask blueprint integration:

from flask import Flask
from cubes.server import slicer

app = Flask(__name__)
app.register_blueprint(slicer, url_prefix="/slicer")

Serving with the slicer tool:

bash$ slicer serve slicer.ini

Results can be paginated with page= (and page_size=), ordered with order=, and formatted as CSV or newline-separated JSON records using format=.
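A client can build an aggregation request with urllib. The base URL, the aggregate_url() helper and the format value csv are assumptions; the parameter names follow the text above, and the path assumes the server is mounted at the root rather than under a Blueprint url_prefix.

```python
from urllib.parse import urlencode

def aggregate_url(base, cube, **params):
    """Hypothetical helper: build an aggregate URL from query parameters.

    ':' ',' and '|' are left unescaped so cut and drilldown values stay readable.
    """
    query = urlencode(params, safe=":,|")
    return f"{base}/cube/{cube}/aggregate?{query}"

url = aggregate_url("http://localhost:5000", "sales",
                    cut="date:2010", drilldown="date",
                    page=1, page_size=100, format="csv")
print(url)
```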

Use a generic visualizer or reporting application, such as Cubes Visualizer or Cubes Viewer, or create one that suits your reporting needs.

Authentication and Authorization

Authorization – manage access to cubes, or to parts of a cube, using access rights. A user might have rights only to a limited set of cubes, or access to a particular cell of a cube. For example, engineers might not have access to the financial cube, and stores might have access to financials only for their own store.

Authentication – a server-side, plug-in based action that provides a user identity based on the user's credentials or any other relevant information; the identity is passed to the workspace. There are two built-in authenticators: pass_parameter, a permissive method that passes the identity as a URL parameter, and http_basic_proxy, permissive authentication using the HTTP Basic method.
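As an illustration of what an HTTP Basic based, permissive authenticator has to do, the hypothetical helper below extracts a user name from the Authorization header. It assumes the password has already been verified elsewhere (for example by a front-end proxy) and is not the http_basic_proxy implementation.

```python
import base64

def identity_from_basic_auth(header):
    """Hypothetical helper: user name from an HTTP Basic Authorization value.

    Permissive: the password is ignored, on the assumption that it was
    checked upstream (e.g. by a front-end proxy).
    """
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        raise ValueError("not an HTTP Basic Authorization header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    user, _, _password = decoded.partition(":")
    return user

header = "Basic " + base64.b64encode(b"lidia:secret").decode("ascii")
print(identity_from_basic_auth(header))
```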

Built-in authorizer uses a JSON rights configuration file:

{
    "lidia": {
        "allowed_cubes": ["sales"],
        "cube_restrictions": {"sales": ["store:3"]}
    },
    "martin": {
        "allowed_cubes": ["sales"],
        "cube_restrictions": {"sales": ["store:5"]}
    }
}
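A minimal sketch of applying such a rights file, assuming a restriction like store:3 means dimension store, path 3. The authorize() and restrictions() functions are hypothetical and not the built-in authorizer.

```python
import json

# The rights file from above, embedded as a string.
RIGHTS = json.loads("""
{
  "lidia":  {"allowed_cubes": ["sales"], "cube_restrictions": {"sales": ["store:3"]}},
  "martin": {"allowed_cubes": ["sales"], "cube_restrictions": {"sales": ["store:5"]}}
}
""")

def authorize(identity, cubes):
    """Return the subset of `cubes` that `identity` may access."""
    allowed = RIGHTS.get(identity, {}).get("allowed_cubes", [])
    return [c for c in cubes if c in allowed]

def restrictions(identity, cube):
    """Parse restrictions such as "store:3" into (dimension, value) pairs."""
    raw = RIGHTS.get(identity, {}).get("cube_restrictions", {}).get(cube, [])
    return [tuple(item.split(":", 1)) for item in raw]

print(authorize("lidia", ["sales", "finance"]))
print(restrictions("martin", "sales"))
```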

Custom authorizer:

from cubes import Cell, PointCut
from cubes.auth import Authorizer

class CustomAuthorizer(Authorizer):
    def authorize(self, identity, cubes):
        # … authorize with a database …
        return authorized_cubes

    def restricted_cell(self, identity, cube, cell=None):
        # Restriction with 'user identity' dimension
        cut = PointCut("users", [identity])
        restriction = Cell(cube, [cut])

        if cell:
            return cell & restriction
        else:
            return restriction

databrewery.org
http://cubes.databrewery.org
https://github.com/stiivi/cubes
#databrewery at irc.freenode.net

Published for PyCon, April 2014, based on Cubes v1.0

Cubes concepts:

Backend            Model       Cubes / Facts   Dimensions
Slicer             automatic   cube            dimension
Google Analytics   automatic   metric          dimension
Mixpanel           automatic   event           property
MongoDB            required    collection      key/attribute
SQL                required    table           column (table)

Example aggregation request: GET /cube/sales/aggregate?cut=date:2010&split=status:1&drilldown=date,region&page=10&page_size=100
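The request can be taken apart with urllib.parse. The reading of cut as dimension:path and of drilldown as a comma-separated list is an assumption based on this example.

```python
from urllib.parse import urlsplit, parse_qs

url = ("/cube/sales/aggregate?cut=date:2010&split=status:1"
       "&drilldown=date,region&page=10&page_size=100")

params = parse_qs(urlsplit(url).query)
# Assumed reading: "dimension:path" for cuts, comma-separated drill-downs.
cut_dimension, _, cut_path = params["cut"][0].partition(":")
drilldowns = params["drilldown"][0].split(",")
page, page_size = int(params["page"][0]), int(params["page_size"][0])
print(cut_dimension, cut_path, drilldowns, page, page_size)
```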

