Date post: | 22-Dec-2015 |
BI Semantic Model
Albert van Dok
SQL Zaterdag, 12 November 2011
Agenda
• Background
• Life Before BISM
• What is BISM
• BISM Positioning
• Questions
Background
From data towards information
• By nature, the demand for (new) information and insights will always evolve
• Connecting and integrating (new) data sources is an essential part
• Preparing data for use
  - Data cleansing
  - Define relationships
  - Data enrichment
  - Add calculations
  - Versioning
The goal is not always easy to achieve
Applications
• Analytical solutions
• Operational reports
• Dashboards & Scorecards
• Data Mining
Requirements
• Quick delivery
• Integration of data by business user
• Ad hoc reports
• Excellent performance
• Flexible
Issues
• Operational reports from an analytical system
• Wrong use of tools or BI tools not flexible
• (Performance) problems
• Long implementation times
• Highly dependent on IT
BI across the enterprise
Life before BISM
[Diagram. Labels: OLTP, Data Source, DW, Datamart, Data Model, MOLAP, OLAP Browser, Reporting Tool]
Life before BISM
[Diagram. Labels: OLTP, Data Source, DW, Datamart, Data Model, MOLAP, OLAP Browser, Reporting Tool, UDM]
Life before BISM
[Diagram. Labels: OLTP, Data Source, DW, Datamart, Data Model, MOLAP, Analysis Services, OLAP Browser, Reporting Tool]
UDM
• XML/A
• Cache
• Security
• End-user model: Translations, Actions, KPI…
• Calculations
• Basic dim. model: Cube & Dimensions, Storage & Caching policies, Linked Objects
• Datasource view
The UDM in SSAS 2008 R2
[Diagram: clients connecting to the UDM, with the connections labeled MDX]
• Excel 2010
• Reporting Services 2008 R2 & Report Builder 3
• SharePoint 2010: Excel Services, PerformancePoint Services, Visio Services
• 3rd party SSAS clients
Besides its advantages, the UDM:
• Is often too complex for simple reporting purposes
• Has a steep learning curve
• Uses MDX, which is different from SQL…
• Must be implemented by a BI professional
• Needs a small investment just to start
The holy grail: Self Service BI
New paradigm
• “Business intelligence for the masses”
• “Managed self-service business intelligence”
Put simple, powerful BI tools in the hands of “knowledge workers”
• Familiar tools: Excel
• People who own the data
  - Excel spreadsheet, Access database or SharePoint list data
Reality: Office power users
New kid on the block: PowerPivot
PowerPivot for Excel
• Free add-in for Excel
• Runs in 32-bit or 64-bit, and needs lots of RAM…
• Contains the VertiPaq engine (SSAS running in-process with Excel)
PowerPivot for SharePoint
• Comes with SQL Server 2008 R2 x64
• SharePoint 2010 extension
• VertiPaq running on the server side
• For sharing and managing PowerPivot applications
PowerPivot
PowerPivot has its own semantic model, which can be seen as BISM v1. It:
• enables connecting data from various data sources
• adds relations between tables
• adds calculations, in two places:
  - in tables: calculated columns (DAX)
  - over the whole model: calculated measures (DAX)
• works in cached (VertiPaq) mode
Covers the personal and team BI segments
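The difference between the two places for calculations can be sketched outside DAX. The following Python is only an illustration of the idea, not DAX and not PowerPivot's implementation; the table, column names and helper functions are hypothetical.

```python
# Hypothetical sales rows; the column names are illustrative only.
sales = [
    {"region": "North", "quantity": 2, "unit_price": 10.0},
    {"region": "North", "quantity": 1, "unit_price": 25.0},
    {"region": "South", "quantity": 4, "unit_price": 5.0},
]


def add_calculated_column(rows):
    """Calculated column: evaluated once per row and stored with the table."""
    return [dict(row, amount=row["quantity"] * row["unit_price"]) for row in rows]


def total_amount(rows, region=None):
    """Measure: aggregated at query time over the rows the current filter leaves."""
    return sum(
        row["amount"] for row in rows if region is None or row["region"] == region
    )


sales_with_amount = add_calculated_column(sales)
print(total_amount(sales_with_amount))           # total over the whole model
print(total_amount(sales_with_amount, "North"))  # same measure, filtered to one region
```

The calculated column is fixed per row once the table is loaded, while the measure gives a different answer depending on the filter applied when it is queried.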
What is VertiPaq
• In-memory column-based database
• Very high data compression
• Doesn’t require designing and building aggregations or other tuning
• Supports partitioning and paging on large data sizes
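The slides do not say how the compression is achieved. As a rough intuition for why a column-based layout compresses well, here is a minimal Python sketch of two techniques that are common in column stores, dictionary encoding and run-length encoding. It is an assumption-laden illustration, not the VertiPaq engine's actual algorithm; the sample column is hypothetical.

```python
def dictionary_encode(values):
    """Replace each value by a small integer id into a list of distinct values."""
    dictionary = []
    ids_by_value = {}
    encoded = []
    for value in values:
        if value not in ids_by_value:
            ids_by_value[value] = len(dictionary)
            dictionary.append(value)
        encoded.append(ids_by_value[value])
    return dictionary, encoded


def run_length_encode(ids):
    """Collapse consecutive repeats into [id, run_length] pairs."""
    runs = []
    for current in ids:
        if runs and runs[-1][0] == current:
            runs[-1][1] += 1
        else:
            runs.append([current, 1])
    return runs


def decode(dictionary, runs):
    """Expand the runs back into the original column values."""
    return [dictionary[value_id] for value_id, length in runs for _ in range(length)]


# Hypothetical column with few distinct values, as is typical for dimensions.
city = ["Utrecht"] * 4 + ["Amsterdam"] * 3 + ["Utrecht"] * 2
dictionary, ids = dictionary_encode(city)
runs = run_length_encode(ids)
print(dictionary, runs)
assert decode(dictionary, runs) == city
```

Because all values of one column sit together, repeated values end up adjacent far more often than they would in a row-wise layout, which is what makes this kind of encoding pay off.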
Relational Database
[Diagram: rows of the Customers table stored in 8K-byte pages and moved through Memory (DBMS Buffer Pool), L2 Cache and L1 Cache to the CPU in 64-byte units]
Page 1: 1 Bob … $3,000; 2 Sue … $500; 3 Ann … $1,700
Page 2: 4 Jim … $1,500; 5 Liz … $0; 6 Dave … $9,000
Page 3: 7 Sue … $1,010; 8 Bob … $50; 9 Jim … $1,300
Query: Select id, name, BalDue from Customers where BalDue > $500
Query summary:
• 3 pages read from disk
• Up to 9 L1 and L2 cache misses (one per tuple)
Don’t forget that:
• An L2 cache miss can stall the CPU for up to 200 cycles
Columnstore Database
[Diagram: each column of the Customers table stored in its own 8K-byte pages; BalDue values moved through Memory, L2 Cache and L1 Cache to the CPU in 64-byte units]
Id: 1 2 3 4 5 6 7 8 9
Name: Bob Sue Ann Jim Liz Dave Sue Bob Jim
BalDue: 3000 500 1700 1500 0 9000 1010 50 1300
Street: …
Query: Select id, name, BalDue from Customers where BalDue > $500
Takeaways:
• Each cache miss brings only useful data into the cache
• Processor stalls reduced by up to a factor of 8 (if BalDue values are 8 bytes) or 16 (if BalDue values are 4 bytes)
Caveats:
• Not to scale! An 8K-byte page of BalDue values will hold 1000 values (not 5)
• Not showing disk I/Os required to read id and Name columns
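The factors of 8 and 16 follow from how many BalDue values fit in one 64-byte cache line. A minimal Python sketch of that arithmetic, and of evaluating the query column-wise on the nine rows from the slide, is given below. The function names are illustrative, and the model is simplified: it ignores the reads of the id and Name columns, as the slide's caveat also does.

```python
CACHE_LINE_BYTES = 64  # cache line size shown on the slide


def values_per_cache_line(value_bytes):
    """How many column values one cache line holds."""
    return CACHE_LINE_BYTES // value_bytes


def cache_lines_for_column_scan(n_values, value_bytes):
    """Cache lines touched when scanning n_values stored contiguously."""
    per_line = values_per_cache_line(value_bytes)
    return -(-n_values // per_line)  # ceiling division


# The Customers data from the slide, stored as separate columns.
ids = [1, 2, 3, 4, 5, 6, 7, 8, 9]
names = ["Bob", "Sue", "Ann", "Jim", "Liz", "Dave", "Sue", "Bob", "Jim"]
bal_due = [3000, 500, 1700, 1500, 0, 9000, 1010, 50, 1300]

# Scan only the BalDue column, then fetch id and name for matching positions.
matches = [pos for pos, balance in enumerate(bal_due) if balance > 500]
result = [(ids[pos], names[pos], bal_due[pos]) for pos in matches]

print(values_per_cache_line(8), values_per_cache_line(4))
print(cache_lines_for_column_scan(len(bal_due), 8))
print(result)
```

With one cache miss per tuple in the row layout (up to 9 here), against two cache lines for nine 8-byte BalDue values, the column layout brings far less unused data into the cache for the filter step.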
An example
Assume:
• Customer table has 10M rows, 200 bytes/row (2 GB total size)
• Id and BalDue values are each 4 bytes long, Name is 20 bytes
Query:
Select id, Name, BalDue from Customer where BalDue > $1000
Row store execution:
• Scan 10M rows (2 GB) @ 80 MB/sec = 25 sec.
Column store execution:
• Scan 3 columns, each with 10M entries (id 40 MB, Name 200 MB, BalDue 40 MB)
• 280 MB @ 80 MB/sec = 3.5 sec.
About a 7X performance improvement for this query! But we can do even better using compression.
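The figures in this example can be reproduced with a few lines of Python. The numbers come from the slide; the constant and function names are illustrative, and the sketch assumes decimal units (1 MB = 1,000,000 bytes), which is what makes the slide's totals come out exactly.

```python
ROWS = 10_000_000
ROW_BYTES = 200
SCAN_MB_PER_SEC = 80
COLUMN_BYTES = {"id": 4, "Name": 20, "BalDue": 4}  # only the columns the query needs
MB = 1_000_000  # assumption: decimal megabytes


def scan_seconds(total_bytes):
    """Time to scan total_bytes sequentially at the assumed scan rate."""
    return total_bytes / MB / SCAN_MB_PER_SEC


row_store_bytes = ROWS * ROW_BYTES                       # every row is read in full
column_store_bytes = ROWS * sum(COLUMN_BYTES.values())   # only three columns are read

row_seconds = scan_seconds(row_store_bytes)
column_seconds = scan_seconds(column_store_bytes)
speedup = row_seconds / column_seconds

print(row_seconds, column_seconds, round(speedup, 1))
```

The speedup is simply the ratio of bytes scanned, 200 bytes per row against 28 bytes per row, so it grows with the number of columns the query does not need.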
Demo
PowerPivot
We are not there yet
Although PowerPivot for Excel is great, it has certain limitations
• Limited to 2 GB
• No support for partitions
• Queries run against the VertiPaq cache
• Daily scheduled data refresh in SharePoint
• Access to the workbook
PowerPivot and Analysis Services are two different products, hence two models
• PowerPivot targets business users, model managed in Excel
• Analysis Services targets BI professionals and IT, model managed on the server
“Can we have one model which integrates both worlds and seamlessly transitions BI applications from Personal BI to Team BI to Organizational/Professional BI?”
And now there is BISM…
What is coming in Denali
BISM v2
One model for all
• reporting, analysis, dashboards, scorecards
• personal, team, corporate BI
Has a relational and a multidimensional API
Supports both cached (MOLAP & VertiPaq) and pass-through (real-time) mode
• only SQL Server data sources for now
Pass-through
• no additional database
• data stays as is in the original structures
• ideal for real-time analysis
Why does this work
In “Denali” every cube automatically becomes a BI Semantic Model
To create a BI semantic model you create a:
• multidimensional model
• tabular model
• PowerPivot workbook
Under the covers, every model looks like cubes/dimensions/measure groups/data sources/data source views
• They share a common Analysis Services file format
• It is this shared underlying structure that makes the BI semantic model work
BISM Data model
Hybrid model supporting multidimensional and tabular data models
• Developed using a multidimensional or a tabular project
• Choice depends on application needs and skillset
Tabular
• Familiar model, easier to build, faster time to solution
• Some advanced concepts (e.g. many-to-many) are not available natively in the model; calculations are needed to simulate these
• Easy to wrap a model over a raw database or warehouse for reporting & analytics
Multidimensional
• Sophisticated model, higher learning curve
• Advanced concepts baked into the model and optimized (parent-child, many-to-many, attribute relationships, key vs. name, etc.)
• Ideally suited for OLAP type apps (e.g. planning, budgeting, forecasting) that need the power of the multidimensional model
BISM Business Logic & Queries
• Represents the intelligence or semantics in the model
• Defines entities and relations between them
• User-oriented
DAX
• Based on Excel formulas and relational concepts: easy to get started
• Complex solutions require a steeper learning curve: row/filter context, Calculate, etc.
• Calculated columns enable new scenarios, however no named sets or calc members
MDX
• Based on understanding of multidimensional concepts: higher initial learning curve
• Complex solutions require a steeper learning curve: CurrentMember, overwrite semantics, etc.
• Ideally suited for apps that need the power of multidimensional calculations: scopes, assignments, calc members
BISM Data Access
This layer integrates data from multiple sources: relational databases, business applications, flat files, OData feeds, etc.
Two modes: cached and pass-through
• Cached: pulls in data from all the sources and stores it in a compressed data structure
  - MOLAP and VertiPaq
• Pass-through: pushes query processing and business logic down to the data source
  - ROLAP and DirectQuery
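The practical difference between the two modes can be sketched with a small stand-in. The Python below uses an in-memory SQLite database as a hypothetical data source; the class names, table and data are assumptions for illustration and say nothing about how Analysis Services implements MOLAP, VertiPaq, ROLAP or DirectQuery.

```python
import sqlite3


def make_source():
    """Stand-in for the source database (hypothetical table and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("North", 20.0), ("North", 25.0), ("South", 20.0)],
    )
    conn.commit()
    return conn


class CachedModel:
    """Cached mode: copy the data out of the source once, answer from the copy."""

    def __init__(self, source):
        self.rows = source.execute("SELECT region, amount FROM sales").fetchall()

    def total_by_region(self):
        totals = {}
        for region, amount in self.rows:
            totals[region] = totals.get(region, 0.0) + amount
        return totals


class PassThroughModel:
    """Pass-through mode: keep no copy, push each query down to the source as SQL."""

    def __init__(self, source):
        self.source = source

    def total_by_region(self):
        query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
        return dict(self.source.execute(query).fetchall())


source = make_source()
cached = CachedModel(source)
live = PassThroughModel(source)

# New data arrives in the source after the cached model was loaded.
source.execute("INSERT INTO sales VALUES ('South', 5.0)")

print(cached.total_by_region())  # reflects the data as of load time
print(live.total_by_region())    # reflects the current source data
```

The cached model answers without touching the source but goes stale until it is refreshed; the pass-through model is always current but depends on the source for every query.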
Analysis Services ‘Denali’ - UDM
[Diagram: clients connecting to the UDM. Connections are labeled MDX; one is labeled “MDX?”]
• Excel 2010
• Reporting Services “Denali”
• SharePoint 2010: Excel Services, Reporting Services, PerformancePoint Services, Visio Services
• SharePoint 2010: Power View
• 3rd party SSAS clients
Analysis Services ‘Denali’ - BISM
[Diagram: clients connecting to the BISM. Connections are labeled MDX or DAX; one is labeled “DAX?”. Also shown: PowerPivot workbook, BISM, Excel 2010]
• Excel 2010
• Reporting Services “Denali”
• SharePoint 2010: Excel Services, Reporting Services, PerformancePoint Services, Visio Services
• SharePoint 2010: Power View
• 3rd party SSAS clients
Denali’s new features in BISM
BISM in ‘Denali’ includes:
• hierarchies, KPIs, parent-child, drillthrough, perspectives
• additional DAX functions (RankX, DistinctCount, GroupBy, Lookup)
• security (role-based with Active Directory, column/row based)
BISM does not include:
• some of the UDM features
  - scripts, actions, translations, role-playing dimensions
  - object model
  - write-back
• other
  - real-time for non-SQL Server data sources
  - MDX query support for real-time
Demo
BISM and the tabular model
Advantages of BISM
• Relatively simple model
• Fast response
• Flexible
• DAX calculations are similar to Excel formulas
• More understandable and user-friendly to the majority of people
• Same model across all scenarios
• Easily scale from personal BI to corporate BI
• Faster development than in UDM
• Prototyping by end-users
• Easier changes of model
• Reduction of cost in developing the full BI solution
Positioning of BISM
• MOLAP is much more complex than PowerPivot, but it offers greater scalability
• ROLAP is even more limited, but it scales above 50 TB
• PowerPivot models that are shared through SharePoint can grow up to 2 GB, the limit set by SharePoint; otherwise memory is the only limit
• BISM sits in the middle and fills the space between MOLAP and PowerPivot
• For data sizes way above 50 TB there are the new ColumnStore indexes (in the relational engine)
[Chart positioning PowerPivot, BISM, MOLAP, ROLAP and ColumnStore. Axes: Usability and Scalability (marks at 2 GB, 100 GB, 5 TB, 50 TB). Source: Thomas Kejser, SQLCAT]
Current Limitations in “Denali”
• Two projects for building a BI Semantic Model
• The future plan is to integrate these into one model
  - Use VertiPaq as SSAS storage
  - Use MDX scripts in tabular projects
• DAX queries are not supported in multidimensional projects
  - and therefore neither is Power View, which uses DAX to retrieve data from the model
Analysis Services Architecture
Beyond Denali
BI Semantic Model features
• Role playing dimensions
• Translations
• Actions
• MDX Scripts
• Real-time over Oracle, Teradata, DB2…
Programmability
• BISM object model
• MDX query support for real-time
• Write back
Wrapup
• BISM is not a replacement for UDM
• DAX is not a replacement for MDX
• Column store databases offer blazing fast performance
• Every model has its advantages
• BI architects must decide when to apply which model
• BISM v2 is not complete, expect changes!