Date post: | 29-Dec-2015 |
Upload: | stefan-urbanek |
Logical Physical
Cubes
Cubes – lightweight pluggable data warehouse
Slicing and Dicing
Overview
Cell as user interface element: multi-dimensional breadcrumbs
PointCut
RangeCut
SetCut
cut1 = PointCut("status", ["open"])
cut2 = SetCut("region", [["sk", "ba"], ["hu"]])
cut3 = RangeCut("date", [2010, 1], [2012])
filter UI
for table cell or link label
cell[level.label_attribute]
Summary
Drill-down
result.cells
browser = workspace.browser("contracts")
result = browser.aggregate(cell, drilldown=["sector"])
for cell in result:
    print("%s: %s" % (cell[level.label_attribute], cell[aggregate.name]))
price_sum = ∑ price over facts (aggregate: price_sum, measure: price)
physical data store (database or API)
Browser
Store
∑aggregate
model
cell[level.key]
for URL
backend-specific, might hold a database cursor
iterable
cell[“price_sum”]
aggregate
SQL or other backend-specific query is generated
result.summary
AggregationResult
– browsing of aggregated data
– multi-dimensional data modeling
– unified interface for analytical data
Workspace
Server
– easy-to-use JSON over HTTP API
– can be integrated as a Flask Blueprint
Synopsis: Python framework and set of tools for building a heterogeneous, pluggable data warehouse, multidimensional data access and online analytical processing (OLAP) of categorical data. Lightweight.
source data
Server
Authenticator
Workspace
Authorizer
Browser
Store
analytical data
User Interface
Model
Metadata – Logical description of data: cubes, dimensions, measures and aggregates.
Cubes – logical data structure, collection of measurable facts (invoices, phone calls, events, …).
Dimensions – provide context for facts, used to filter queries or reports and to control the scope of aggregation of facts. Might contain concept hierarchies such as category–subcategory–product, a date hierarchy (year–month–day or year–quarter–month–day) or geographical hierarchies.
The model also provides information about mapping to the physical data store.
{
    "cubes": [
        {
            "name": "sales",
            "measures": ["amount"],
            "dimensions": ["date", "product", "store"],
            "joins": [
                {"master": "date_id", "detail": "date.id"},
                {"master": "product_id", "detail": "product.id"},
                {"master": "store_id", "detail": "store.id"}
            ]
        }
    ],
    "dimensions": [
        {"name": "product", "attributes": ["code", "name"]},
        {"name": "store", "attributes": ["code", "address"]}
    ]
}
sales: product_id, store_id, sale_date, amount
product: id, code, name
store: id, code, address
date: year, month
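The model shown earlier is plain JSON, so its logical-to-physical mapping can be inspected with nothing but the standard library. The sketch below only reads the metadata; it does not use the Cubes package. It assumes, for illustration, that the fact table carries the same name as the cube.

```python
import json

# The model from the poster, written out as valid JSON
model_json = """
{
  "cubes": [
    {
      "name": "sales",
      "measures": ["amount"],
      "dimensions": ["date", "product", "store"],
      "joins": [
        {"master": "date_id", "detail": "date.id"},
        {"master": "product_id", "detail": "product.id"},
        {"master": "store_id", "detail": "store.id"}
      ]
    }
  ],
  "dimensions": [
    {"name": "product", "attributes": ["code", "name"]},
    {"name": "store", "attributes": ["code", "address"]}
  ]
}
"""

model = json.loads(model_json)

# Attributes of every explicitly described dimension
defined = {dim["name"]: dim["attributes"] for dim in model["dimensions"]}

for cube in model["cubes"]:
    print("cube %s, measures: %s" % (cube["name"], ", ".join(cube["measures"])))
    for join in cube["joins"]:
        table, column = join["detail"].split(".")
        # Assumption: the fact table is named after the cube
        print("  %s.%s -> %s.%s" % (cube["name"], join["master"], table, column))

# Tables the fact table is joined to
joined_tables = [j["detail"].split(".")[0] for j in model["cubes"][0]["joins"]]
```

Each printed line pairs a key column of the fact table with the dimension table column it points to, which is the information a backend needs to decide which joins a query requires.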
Physical model
Denormalized
Browsing and Aggregation
Get aggregated data for a cell, get dimension members within a cell or get a list of facts within a cell, if available. Uses backend-specific aggregation or an interface to another aggregation engine.
Drill-down – Get more details – by year, by product, by store, …
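What an aggregation and a drill-down compute can be sketched in plain Python. This is only an illustration of the idea, not how Cubes does it (Cubes delegates the work to the backend). The `facts` rows and the `aggregate` helper are made up for the example.

```python
from collections import defaultdict

# Hypothetical facts of a "sales" cube
facts = [
    {"year": 2009, "product": "tea", "amount": 120},
    {"year": 2009, "product": "coffee", "amount": 80},
    {"year": 2010, "product": "tea", "amount": 150},
    {"year": 2010, "product": "coffee", "amount": 50},
]


def aggregate(facts, drilldown=None):
    """Return a summary of all facts and, if `drilldown` is given,
    one aggregated row per member of that attribute."""
    summary = {
        "record_count": len(facts),
        "amount_sum": sum(fact["amount"] for fact in facts),
    }
    cells = []
    if drilldown:
        groups = defaultdict(list)
        for fact in facts:
            groups[fact[drilldown]].append(fact)
        for member in sorted(groups):
            rows = groups[member]
            cells.append({
                drilldown: member,
                "record_count": len(rows),
                "amount_sum": sum(row["amount"] for row in rows),
            })
    return {"summary": summary, "cells": cells}


result = aggregate(facts, drilldown="year")
for cell in result["cells"]:
    print("%s: %s" % (cell["year"], cell["amount_sum"]))
```

The summary covers the whole cell, while the drill-down rows split the same numbers by the next level of detail.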
from cubes import Workspace

workspace = Workspace()
workspace.import_model("model.json")
workspace.register_default_store("sql", url="postgres://localhost/data")
Model Providers
Stores
sales, churn, events, activations
Static Model Provider
API Model Provider
BI Data (Postgres)
BI Data 2 (Mongo)
Events (API)
Pluggable Analytical Workspace
Authors: Robin Thomas and Stefan Urbanek
Cell – Provides context of interest, composed of cuts. There are three kinds of cuts: point, set and range. Cuts can also be inverted using invert=True, which will yield cells outside of the cut.
point cut – single dimension member
set cut – multiple dimension members
range cut – members between two values of an ordered dimension (such as date)
cell = Cell(cube, [cut1, cut2, cut3])
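To make the three kinds of cuts concrete, here is a minimal plain-Python sketch of how they could select facts. It is not the Cubes implementation: the `facts` rows and the helpers `point_cut`, `set_cut`, `range_cut` and `in_cell` are hypothetical, and dimension members are represented as paths (lists), for example `[2010, 3]` for March 2010.

```python
# Illustrative stand-in only; Cubes itself hands cuts over to the backend.
facts = [
    {"status": ["open"], "region": ["sk", "ba"], "date": [2010, 3]},
    {"status": ["open"], "region": ["hu"], "date": [2012, 7]},
    {"status": ["closed"], "region": ["sk", "ke"], "date": [2011, 1]},
    {"status": ["open"], "region": ["cz"], "date": [2009, 12]},
]


def point_cut(dimension, path, invert=False):
    """Single member: the fact's path must start with `path`."""
    def test(fact):
        hit = fact[dimension][:len(path)] == path
        return hit != invert
    return test


def set_cut(dimension, paths, invert=False):
    """Several members: the fact's path must start with any of `paths`."""
    def test(fact):
        hit = any(fact[dimension][:len(p)] == p for p in paths)
        return hit != invert
    return test


def range_cut(dimension, from_path, to_path, invert=False):
    """Ordered dimension: the path must lie between the two bounds."""
    def test(fact):
        path = fact[dimension]
        hit = (path[:len(from_path)] >= from_path
               and path[:len(to_path)] <= to_path)
        return hit != invert
    return test


def in_cell(fact, cuts):
    """A cell is the intersection of all of its cuts."""
    return all(cut(fact) for cut in cuts)


cuts = [
    point_cut("status", ["open"]),
    set_cut("region", [["sk", "ba"], ["hu"]]),
    range_cut("date", [2010, 1], [2012]),
]
selected = [fact for fact in facts if in_cell(fact, cuts)]

# An inverted cut yields everything outside of the cut
not_open = point_cut("status", ["open"], invert=True)
outside = [fact for fact in facts if not_open(fact)]
```

Comparing path prefixes is what lets a range bound such as `[2012]` cover every month of that year.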
dimension path
dimension paths
dimension from
to
cell = Cell(cube)
browser.aggregate(cell)
browser.aggregate(cell, drilldown=["date"])
cut = PointCut("date", [2010])
cell = cell.slice(cut)
browser.aggregate(cell, drilldown=["date"])
result.cells result.cells result.summary
Manages the model, data stores and model providers.
SQL schema example:
Slicer
collect other (external) Cubes Slicers
[store_data]
type: sql
url: postgres://localhost/data

[store_data2]
type: mongo
host: localhost

[store_events]
type: mixpanel
api_key: 123456
api_secret: 123456
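The store sections above follow the usual INI layout, so they can be read with Python's standard `configparser`. The sketch below is only a way to inspect such a file; it is an assumption about how one might pre-process it, not the code Slicer uses to load its configuration.

```python
from configparser import ConfigParser

# The three store sections from the poster, one option per line
config_text = """
[store_data]
type: sql
url: postgres://localhost/data

[store_data2]
type: mongo
host: localhost

[store_events]
type: mixpanel
api_key: 123456
api_secret: 123456
"""

parser = ConfigParser()
parser.read_string(config_text)

# Map each section to its backend type and the remaining options
stores = {}
for section in parser.sections():
    options = dict(parser.items(section))
    store_type = options.pop("type")
    stores[section] = (store_type, options)

for name, (store_type, options) in stores.items():
    print("%s: %s %s" % (name, store_type, sorted(options)))
```

Everything except `type` is backend-specific, which is why the options are kept as an opaque dictionary here.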
Stores configuration:
Supported Backends:
store
Slicer server
Web Application (HTML+JS, RoR, …)
HTTP request
JSON reply
model
JSON reply
Cubes Python API
Django, Flask, …
HTML
model
store
Flask
HTML
Slicer Blueprint
model
{
    "cell": [],
    "total_cell_count": 2,
    "drilldown": [
        {"record_count": 31, "amount_sum": 550840, "date.year": 2009},
        {"record_count": 31, "amount_sum": 566020, "date.year": 2010}
    ],
    "summary": {"record_count": 62, "amount_sum": 1116860}
}
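A client only needs a JSON parser to work with such a reply. The following sketch parses the response shown above and prints each drill-down row with its share of the total; the `reply` string stands in for the body of a real HTTP response.

```python
import json

# The reply from the poster, as it would arrive in the HTTP response body
reply = """
{
  "cell": [],
  "total_cell_count": 2,
  "drilldown": [
    {"record_count": 31, "amount_sum": 550840, "date.year": 2009},
    {"record_count": 31, "amount_sum": 566020, "date.year": 2010}
  ],
  "summary": {"record_count": 62, "amount_sum": 1116860}
}
"""

result = json.loads(reply)
total = result["summary"]["amount_sum"]

# One (year, amount) pair per drill-down row
rows = [(row["date.year"], row["amount_sum"]) for row in result["drilldown"]]
for year, amount in rows:
    share = 100.0 * amount / total
    print("%d: %d (%.1f %%)" % (year, amount, share))
```

The drill-down rows add up to the summary, so a report can show both the breakdown and the total from a single request.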
generic visualizers and reporting applications
specific purpose reports
SQL Backend
contract (fact table)
subject dimension: subject, subject category
supplier dimension: supplier, supplier type
geography dimension: city, region
date dimension: date
Store
Browser
Model Provider
Visualizers
Slicer Server
Unified aggregation interface to a variety of data stores and services. JSON interface over HTTP. Built using the Flask web micro-framework. Can be used as a stand-alone server or integrated into another application to serve as an analytical module.
Ways of Deployment
Authentication and Authorization
Backends
Bring your own aggregation engine. Take your Google Analytics, and your SQL database, and you have a single way for your users to access all of them. No need to grant account access for everyone to each particular datasource.
Quick ways of creating an analytical data server or adding an analytical module into your application or on top of your system.
/facts – get list of facts within a cell (if available)
/members – list dimension members within a cell
/cell – get multi-dimensional breadcrumbs information, browsing context or "where am I looking at?"
Backend modules:
Model Provider – A live cubes concept mapper: maps foreign cubes or foreign cube-like structures into Cubes model.
Aggregation Browser – provides the core functionality of aggregation or delegates the aggregation to an external aggregator.
Store – manages access to the data, establishes and maintains database connections, generates appropriate external API calls (“pretends to be a store”).
model
OR
Built-in backend for ROLAP (Relational OLAP)
Features:
■ star and snowflake schema support
■ joins are executed only if needed for a given query
■ mapping of DATE data type without the date dimension table
■ simple support of non-additive/semi-additive dimensions and aggregates
■ "split" cell dimension – mark cells as within or outside of a split cell
■ support for outer joins
★ ❄ OR denormalization
Turning JSON data into reports, charts, tables. It is very easy to build custom visualizations on top of the Cubes analytical data with a framework of your choice.
Slicer
Cubes
Cell(point of view)
facts(details)
∑ aggregates
model
List cubes: GET /cubes
Get cube model (metadata): GET /cube/sales/model
Aggregate: GET /cube/sales/aggregate?
➊ Python web framework using the Cubes Python module
➋ Plug-in for a Flask application
➌ Stand-alone server with an HTML+JS front-end, or stand-alone server with an external application.
Slicer JSON response
[uwsgi]
socket = 127.0.0.1:5000
module = cubes.server.app
callable = application
from flask import Flask
from cubes.server import slicer

app = Flask(__name__)
app.register_blueprint(slicer, url_prefix="/slicer")
bash$ slicer serve slicer.ini
Serving with the slicer tool:
Simple deployment with uWSGI:
Flask blueprint integration:
Results can be paginated with page=, ordered with order= and formatted as CSV or newline-separated JSON records using format=.
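Since the API is plain HTTP, a request is just a URL with query parameters. The sketch below composes an aggregate request with the standard library. The server address is an assumption (a Slicer server running locally on port 5000), the `aggregate_url` helper is hypothetical, and `csv` as a value of `format=` is assumed from the description above.

```python
from urllib.parse import urlencode

# Assumed address of a local Slicer server; adjust to your deployment.
BASE_URL = "http://localhost:5000"


def aggregate_url(cube, **params):
    """Compose an aggregate request URL for `cube`.

    ':' and ',' are left unescaped so that cuts stay readable."""
    query = urlencode(params, safe=":,")
    return "%s/cube/%s/aggregate?%s" % (BASE_URL, cube, query)


url = aggregate_url(
    "sales",
    cut="date:2010",      # restrict the cell to the year 2010
    drilldown="date",     # one row per member of the next date level
    page=10,
    page_size=100,
    format="csv",         # assumed value for CSV output
)
print(url)
```

Keeping URL construction in one small function makes it easy to reuse the same cell while changing only the drill-down or the page.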
Use either a generic visualizer and reporting application such as Cubes Visualizer or Cubes Viewer or create one that suits your reporting needs.
Authorization – Manage access to the cubes or parts of a cube using access rights. A user might have a right only to a limited set of cubes or might have access only to a particular cell in a cube. For example, engineers might not have access to the financial cube and stores might have access to financials only for their own store.
Authentication – Server-side, plug-in based action that, based on the user's credentials or any other relevant information, provides a user identity which is passed to the workspace. There are two built-in authenticators:
pass_parameter – pass identity as a URL parameter (permissive method)
http_basic_proxy – permissive authentication using the HTTP Basic method
Built-in authorizer uses a JSON rights configuration file:
{
    "lidia": {
        "allowed_cubes": ["sales"],
        "cube_restrictions": {"sales": ["store:3"]}
    },
    "martin": {
        "allowed_cubes": ["sales"],
        "cube_restrictions": {"sales": ["store:5"]}
    }
}
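How such a rights file could be interpreted is sketched below in plain Python. This is not the code of the built-in authorizer: the helpers `allowed_cubes` and `restrictions` are hypothetical, and treating an unknown identity as having no access is an assumption made for the example.

```python
import json

# The rights configuration from the poster
rights_json = """
{
  "lidia":  {"allowed_cubes": ["sales"],
             "cube_restrictions": {"sales": ["store:3"]}},
  "martin": {"allowed_cubes": ["sales"],
             "cube_restrictions": {"sales": ["store:5"]}}
}
"""
rights = json.loads(rights_json)


def allowed_cubes(identity, cubes):
    """Keep only the cubes the identity may see (unknown users get none)."""
    allowed = rights.get(identity, {}).get("allowed_cubes", [])
    return [cube for cube in cubes if cube in allowed]


def restrictions(identity, cube):
    """Return (dimension, member) pairs the identity is limited to."""
    user = rights.get(identity, {})
    cuts = user.get("cube_restrictions", {}).get(cube, [])
    return [tuple(cut.split(":", 1)) for cut in cuts]


print(allowed_cubes("lidia", ["sales", "churn"]))
print(restrictions("martin", "sales"))
```

Each restriction reads like a point cut on a dimension, so a user limited to `store:5` only ever sees the part of the cube belonging to that store.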
from cubes import Authorizer, Cell, PointCut

class CustomAuthorizer(Authorizer):
    def authorize(self, identity, cubes):
        # … authorize with a database …
        return authorized_cubes

    def restricted_cell(self, identity, cube, cell):
        # Restriction with 'user identity' dimension
        cut = PointCut("users", [identity])
        restriction = Cell(cube, [cut])
        if cell:
            return cell & restriction
        else:
            return restriction
Custom authorizer:
databrewery.org
http://cubes.databrewery.org
https://github.com/stiivi/cubes
#databrewery at irc.freenode.net
Published for PyCon, April 2014, based on Cubes v1.0
Cubes concepts:

Backend            Cubes / Facts   Dimensions       Model
SQL                table           column (table)   required
MongoDB            collection      key/attribute    required
Mixpanel           event           property         automatic
Google Analytics   metric          dimension        automatic
Slicer             cube            dimension        automatic
cut=date:2010&split=status:1&drilldown=date,region&page=10&page_size=100