How to Sharpen Your Investigative Analysis
with the New Excel (a PowerPivot intro)
Carmen Mardiros - navabi GmbH
DA Hub 2015
Core component of the Microsoft
self-serve BI stack
Fast and intelligent data modelling for the Excel pro.
The stack also includes PowerQuery (getting and cleaning data), PowerView (reporting) and PowerBI (online report publishing).
Integrated in Excel
In many ways it feels very familiar (especially if you use pivot tables and charts extensively).
It’s FREE
Well, as long as you have Excel Professional Plus 2010 or 2013.
Highly recommended: get the 2013 64-bit version.
What is Power Pivot?
How PowerPivot helps you to become
a better analyst
Analytics tools can’t replace you. But they can help you become more efficient, unlock your true potential and get the recognition you deserve.
Today is about lots of examples.
Ready to use formulas
There is not enough time to break every formula down or explain how PowerPivot works in detail, but I will explain how and when to use each formula, what to change about it, and what your data must look like for it to work.
Resources to develop your PowerPivot
skills
A few titles to help you build upon what you’ve learned today.
What today is about
After today, pivot tables will never look the same again.
So what’s wrong with Excel anyway?
1. Excel can’t handle lots of data…
… PowerPivot handles many millions easily
2. Regular pivot calculated fields are very basic…
… PowerPivot bends the “normal” pivot rules to its will
3. Must re-create formatting every time you add a metric to a regular pivot…
It takes 8 clicks to set the formatting for Transactions and 2 more to change the title.
Remove it from the pivot and add it again? Start all over.
Every. Single. Time.
… in PowerPivot you change once and formatting stays the same
Flash intro to Power Query
PowerQuery: Getting multiple CSVs into PowerPivot
Connectors for many databases, Facebook, Salesforce, Hadoop, feeds, Excel files, CSV files etc
and very soon Google Analytics
PowerQuery has its own language as well as an intuitive UI.
We use a formula to keep only one header row when combining our folder of CSV files
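let
    // List every file in the folder
    Source = Folder.Files("C:\Users\moo\Desktop\dahub"),
    // Parse each file as CSV (code page 1252) and promote its first row to headers
    Tables = List.Transform(Source[Content], each Table.PromoteHeaders(Csv.Document(_, null, null, null, 1252))),
    // Append all the tables into one
    SingleTable = Table.Combine(Tables)
in
    SingleTable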
CSV files are combined on the fly into a single table
NOTE: *All* CSV files must have the same structure
Load to PowerPivot
Flash intro to PowerPivot
Enable the add-in and it’s all systems go
Google “enable powerpivot addin”
PowerPivot window is where the magic happens
Has calculated columns like Excel but that’s where similarity ends
NOTE: avoid using calculated columns unless you absolutely have to.
They are very costly in terms of performance as they are stored in memory.
Portable “measures” are the unit of work for PowerPivot
# Sessions:=SUM('dahub_sessions'[sessions])
Special equals operator: :=
Measure name: keep this explicit and eye-friendly
Full column reference that is being summarised: 'dahub_sessions'[sessions]
Every measure is simply a building block
% Conversion Rate:= [# Transactions]/[# Sessions]
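The [# Transactions] measure used above is not defined on the slides. A minimal sketch of how it might look, assuming the table has a numeric transactions column (the column name is an assumption), plus a variant of the ratio written with DIVIDE, which returns a blank instead of an error when [# Sessions] is zero:

```dax
# Transactions:=
    -- assumption: 'dahub_sessions' has a numeric [transactions] column
    SUM('dahub_sessions'[transactions])

% Conversion Rate:=
    -- same ratio as above; DIVIDE returns BLANK when [# Sessions] is 0
    DIVIDE([# Transactions], [# Sessions])
```

Each measure is entered separately in the PowerPivot window.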
Allows you to build sophisticated
formulas
Each measure is made up of other measures. PowerPivot resolves all the dependencies and calculates them in the right order.
One change trickles through your entire reporting
If a field that used to be called ‘visits’ is now called ‘sessions’, you make one change and all your measures update like magic.
Calculated on the fly, not stored in
memory
Until you actually use them in a pivot, they add no performance overhead. Maintainability heaven at no extra cost.
Why measures are so amazing
Just drag to the Pivot and voila
Real-world examples
DISTINCTCOUNT function magic!
# Unique Campaigns:= DISTINCTCOUNT('dahub_sessions'[campaign_id])
# Sessions per Campaign:= DIVIDE([# Sessions], [# Unique Campaigns])
Which channels have a wide portfolio of active campaigns or a very active narrow one?
This is impossible to answer with regular pivot tables
Has traffic gone up or down today because we have fewer campaigns bringing in traffic?
How many campaigns are bringing in a minimum of 1000 sessions each day?
Mind blowing
# Unique Campaigns min 1000 sessions:= CALCULATE( [# Unique Campaigns], FILTER( VALUES('dahub_sessions'[campaign_id]), [# Sessions] >= 1000 ) )
This is the formula… but don’t try to take it in yet
First, PowerPivot sets the pivot coordinates and calculates the “base” measure # Sessions
Then, *before* calculating [# Unique Campaigns], it adds an additional filter that keeps only
campaigns that fit the criteria.
# Unique Campaigns min 1000 sessions:= CALCULATE( [# Unique Campaigns], FILTER( VALUES('dahub_sessions'[campaign_id]), [# Sessions] >= 1000 ) )
Let’s break the formula down…
1. Pivot coordinates are set and underlying data filtered accordingly.
2. Additional FILTER is applied.
3. And only *afterwards* [# Unique Campaigns] is calculated.
What % of all campaigns
are bringing in a minimum of 100
sessions each day?
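One way the percentage question above could be answered is to divide the thresholded count by the plain campaign count. This is a sketch only: both measure names are made up for illustration, and it reuses the [# Unique Campaigns] and [# Sessions] measures defined earlier.

```dax
# Unique Campaigns min 100 sessions:=
    CALCULATE(
        [# Unique Campaigns],
        FILTER(
            VALUES('dahub_sessions'[campaign_id]),
            [# Sessions] >= 100
        )
    )

% Campaigns min 100 sessions:=
    -- hypothetical measure name: share of campaigns in the current pivot cell that reach the threshold
    DIVIDE([# Unique Campaigns min 100 sessions], [# Unique Campaigns])
```

With the date on pivot rows, each row then shows the share of that day's campaigns that brought in at least 100 sessions.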
Variations
Campaigns / ad groups / keywords / landing pages
[# Sessions] >= 50
Size of your effectively active SEO/PPC portfolio and how that changes over time.
Use Cost per Conversion instead
If you have cost in your data, create a [£ Cost per Conversion] measure and swap [# Sessions] for it.
Monitor the number of ad groups / keywords exceeding the maximum budget.
Combine multiple conditions in the
FILTER
FILTER( VALUES('dahub_sessions'[keyword_id]), [£ Cost per Conversion] >= 50 && [# Clicks] >= 10 )
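The cost variation above relies on measures the slides do not define. A possible sketch, assuming the dataset has numeric cost and clicks columns (both column names and the final measure name are assumptions) and counting a transaction as a conversion:

```dax
£ Cost:=
    -- assumption: 'dahub_sessions' has a numeric [cost] column
    SUM('dahub_sessions'[cost])

# Clicks:=
    -- assumption: 'dahub_sessions' has a numeric [clicks] column
    SUM('dahub_sessions'[clicks])

£ Cost per Conversion:=
    -- treats a transaction as the conversion; swap in your own conversion measure
    DIVIDE([£ Cost], [# Transactions])

# Unique Keywords over budget:=
    -- hypothetical measure name; wraps the FILTER shown above
    CALCULATE(
        DISTINCTCOUNT('dahub_sessions'[keyword_id]),
        FILTER(
            VALUES('dahub_sessions'[keyword_id]),
            [£ Cost per Conversion] >= 50 && [# Clicks] >= 10
        )
    )
```

The [# Clicks] condition keeps keywords with very little traffic from being flagged on the strength of one expensive click.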
More variations… Campaigns and channels bringing most of the high spenders
Which channels or campaigns bring the highest number of transactions over a certain Revenue threshold?
(Requires a dataset with source_medium, campaign and transaction_id, and a [# Unique Transactions] measure built on transaction_id.)
Campaigns and channels bringing
*predominantly* high spenders
# Unique Transactions min £500:= CALCULATE( [# Unique Transactions], FILTER( VALUES('dahub_sessions'[transaction_id]), [£ Transaction Revenue] >= 500 ) )
Let’s break the formula down…
NOTE: In the pivot you need source_medium and / or campaign on rows, and you need transaction_id in your data.
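To see which channels bring *predominantly* high spenders rather than simply the most of them, the thresholded count can be turned into a share. A sketch under assumptions: the revenue column name and the percentage measure name are hypothetical, and the two base measures are written the way the formula above implies.

```dax
# Unique Transactions:=
    DISTINCTCOUNT('dahub_sessions'[transaction_id])

£ Transaction Revenue:=
    -- assumption: revenue is stored in a [transaction_revenue] column
    SUM('dahub_sessions'[transaction_revenue])

% Transactions min £500:=
    -- hypothetical measure name: share of transactions worth £500 or more
    DIVIDE([# Unique Transactions min £500], [# Unique Transactions])
```

DISTINCTCOUNT treats a blank as one distinct value, so if sessions without a purchase have an empty transaction_id, filter those rows out before relying on the count.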
Banding
The problem: too many unique values to analyse.
The solution: create dynamic groups to “cluster” very granular data into a small number of groups
The nested IF way…
=IF( [keyword_id] contains "<your brand name>", "brand",
    IF( [keyword_id] contains "not provided", "not provided",
        IF( [keyword_id] contains "not set", "not set",
            "generic" ) ) )
The same logic in PowerPivot using the SWITCH function
=SWITCH(
    TRUE(),
    IFERROR(SEARCH("<your brand name>", 'dahub_sessions'[keyword_id]), -1) <> -1, "brand",
    IFERROR(SEARCH("not provided", 'dahub_sessions'[keyword_id]), -1) <> -1, "not provided",
    IFERROR(SEARCH("not set", 'dahub_sessions'[keyword_id]), -1) <> -1, "not set",
    "generic"
)
Nested IFs forever gone
Which landing pages attract predominantly branded / generic traffic?
CALCULATE is a super SUMIF
The single most powerful feature in PowerPivot
# Sessions Branded:= CALCULATE( [# Sessions], 'dahub_sessions'[brand_group] = "brand" )
# Sessions Non Branded:= CALCULATE( [# Sessions], 'dahub_sessions'[brand_group] = "non brand" )
This is a CALCULATE filter that gets added *before* [# Sessions] is calculated
CALCULATE allows segmentation you could never do before
If a CALCULATE filter is on a column that’s already in the pivot, the pivot’s own filter on that column gets overridden.
Remove the column from pivot and calculation still works!
CALCULATE filters have countless uses
Determine hidden biases in AB testing
See if your variations had a comparable % of branded / non branded traffic, since a mismatch might skew the results.
Also works with device, mobile traffic and any other dimension you might have in your data.
Works best when you create custom
“clusters” using SWITCH
The formulas work with any dimension in your dataset, but to unlock CALCULATE’s full filtering potential it pays to create custom calculated columns using the SWITCH formula.
Create horizontal conversion funnels
CALCULATE is the essential building block for taking conversion funnel analysis to the next step
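For the AB testing check mentioned above, a share measure makes the comparison between variations easier to read than raw counts. A minimal sketch, with a hypothetical measure name, reusing [# Sessions Branded] and [# Sessions]:

```dax
% Sessions Branded:=
    -- hypothetical measure name: branded share of the sessions in the current pivot cell
    DIVIDE([# Sessions Branded], [# Sessions])
```

Put your test variation column on pivot rows (assuming your data has one) and compare the percentage across variations. A large gap suggests the traffic mix differs between them.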
Step 1. The right data for Horizontal Funnels
To get a Funnel Step column you need to create a segment for each step in your web analytics tool and export each one as a CSV file. Then import them into PowerPivot using Power Query and the multiple CSV import method.
You need these segments:
All Sessions (unsegmented)
Category Pages
Products
Add to Basket
Basket
Secure Login
Address
Confirm Order
Payment
Step 2. Use CALCULATE on each funnel step
# Sessions All:= CALCULATE( [# Sessions Funnel], 'dahub_funnel'[funnel_step] = "All" )
…
# Sessions Payment:= CALCULATE( [# Sessions Funnel], 'dahub_funnel'[funnel_step] = "Payment" )
Create a new measure for each step in the funnel
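The step measures above all build on [# Sessions Funnel], which the slides do not define. It is presumably a plain sum over the funnel table. A sketch, assuming a numeric sessions column whose name is a guess:

```dax
# Sessions Funnel:=
    -- assumption: 'dahub_funnel' has a numeric [sessions] column,
    -- with rows from every exported segment tagged by [funnel_step]
    SUM('dahub_funnel'[sessions])
```

On its own this measure adds sessions across all segments, so it is only meaningful once a CALCULATE filter pins it to a single funnel step.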
This allows you to create custom “goals” on the fly out of *ANY* segment
Step 3. Create ratios for each funnel step
Use All Sessions as a base for division:
% Sessions Address:= DIVIDE([# Sessions Address], [# Sessions All])
Use the previous funnel step as a base for division:
% Sessions Address progress:= DIVIDE([# Sessions Address], [# Sessions Secure Login])
Step 4. Add measures to Pivot and analyse
You can use *ANY* dimension you have available in your dataset on rows. Here, it’s Landing Page but you can use date, channel dimensions, device etc.
You can EVEN add an additional segmentation level like user type (newly acquired, loyal etc)
Resources
Best book for PowerPivot novices, with a gradual learning curve.
By the end it gets pretty advanced. You learn about relationships, how to model multiple tables, time intelligence functions and much more.
Resources
All-in-one reference with formulas for almost any scenario, all explained and broken down.
You need a good understanding of PowerPivot to begin with so don’t get this first.
Questions?
Bonus - Lifecycle metrics
Essential for comparing business entities (users, customers) as well as assets (content, landing pages, promos etc)
Step 1. Find First Date for each landing page
First Date Landing Page:= CALCULATE( MIN('dahub_sessions'[session_date]), ALL('dahub_sessions'[session_date]), VALUES('dahub_sessions'[landing_page_id]) )
Step 2. Find [# Sessions] on first day
# Sessions in first day:= CALCULATE( [# Sessions], FILTER( ALL('dahub_sessions'[session_date]), 'dahub_sessions'[session_date] = [First Date Landing Page] ), VALUES('dahub_sessions'[landing_page_id]) )
Step 3. Find [# Sessions] in first 7 days
# Sessions in first 7 days:= CALCULATE( [# Sessions], FILTER( ALL('dahub_sessions'[session_date]), 'dahub_sessions'[session_date] >= [First Date Landing Page] && 'dahub_sessions'[session_date] < [First Date Landing Page] + 7 ), VALUES('dahub_sessions'[landing_page_id]) )