How to Sharpen Your Investigative Analysis with PowerPivot

Post on 21-Mar-2017

How to Sharpen Your Investigative Analysis with the New Excel (a PowerPivot intro)

Carmen Mardiros - navabi GmbH

DA Hub 2015

What is PowerPivot?

Core component of the Microsoft self-serve BI stack

Fast and intelligent data modelling for the Excel pro. The stack also includes PowerQuery (getting and cleaning data), PowerView (reporting) and PowerBI (online report publishing).

Integrated in Excel

In many ways it feels very familiar (especially if you use pivot tables and charts extensively).

It’s FREE

Well, as long as you have Excel Professional Plus 2010 or 2013. Highly recommended: get the 2013 64-bit version.

What today is about

How PowerPivot helps you to become a better analyst

Analytics tools can’t replace you. But they can help you to become more efficient, unlock your true potential and get the recognition you deserve. Today is about lots of examples.

Ready-to-use formulas

There is not enough time to break all the formulas down or explain how PowerPivot works in detail, but I will explain how and when to use them, what to change about them, and what your data must look like for them to work.

Resources to develop your PowerPivot skills

A few titles to help you build upon what you’ve learned today.

After today, pivot tables will never look the same again.

So what’s wrong with Excel anyway?

1. Excel can’t handle lots of data…

… PowerPivot handles many millions easily

2. Regular pivot calculated fields are very basic…

… PowerPivot bends the “normal” pivot rules to its will

3. Must re-create formatting every time you add a metric to a regular pivot…

Takes 8 clicks to set the formatting for Transactions and 2 more to change the title. Remove it from the pivot and add it again? Start all over. Every. Single. Time.

… in PowerPivot you change once and formatting stays the same

Flash intro to Power Query

PowerQuery: Getting multiple CSVs into PowerPivot

Connectors for many databases, Facebook, Salesforce, Hadoop, feeds, Excel files, CSV files etc., and very soon Google Analytics

PowerQuery has its own language as well as an intuitive UI.

We use a formula to keep only one header row from our folder of CSV files

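let
    // Every file in the folder, one row per file
    Source = Folder.Files("C:\Users\moo\Desktop\dahub"),
    // Parse each file as CSV (code page 1252) and promote its first row to headers
    Tables = List.Transform(Source[Content], each Table.PromoteHeaders(Csv.Document(_, null, null, null, 1252))),
    // Stack the per-file tables into a single table
    SingleTable = Table.Combine(Tables)
in
    SingleTable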

CSV files are combined on the fly into a single table.

NOTE: *All* CSV files must have the same structure.
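For readers who think more easily in a scripting language, here is a minimal Python sketch of what the combine step does: read every CSV in a folder, let each file's header define the column names, and stack the data rows. It is not how PowerQuery works internally. The function name `combine_csvs`, the temporary folder and the two demo files are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path


def combine_csvs(folder):
    """Stack every *.csv in `folder` into one list of row dicts.

    Assumes all files share the same header row, as the note above requires.
    """
    combined = []
    for path in sorted(Path(folder).glob("*.csv")):
        # cp1252 mirrors the 1252 code page passed to Csv.Document.
        with open(path, newline="", encoding="cp1252") as handle:
            # DictReader consumes each file's header, so it never becomes a data row.
            combined.extend(csv.DictReader(handle))
    return combined


# Hypothetical demo data: two small exports with identical columns.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "day1.csv").write_text("campaign_id,sessions\nc1,10\n", encoding="cp1252")
(demo_dir / "day2.csv").write_text("campaign_id,sessions\nc2,25\n", encoding="cp1252")

rows = combine_csvs(demo_dir)
print(len(rows))  # 2 data rows; neither header row is counted
```

If one file had a different column layout, its rows would carry different keys, which is the same reason the PowerQuery version needs identical structures.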

Load to PowerPivot

Flash intro to PowerPivot

Enable the add-in and it’s all systems go

Google “enable powerpivot addin”

PowerPivot window is where the magic happens

It has calculated columns like Excel, but that’s where the similarity ends

NOTE: avoid using calculated columns unless you absolutely have to. They are very costly in terms of performance as they are stored in memory.

Portable “measures” are the unit of work for PowerPivot

# Sessions:=SUM('dahub_sessions'[sessions])

Measure name (# Sessions): keep this explicit and eye-friendly

:= is the special equals operator

'dahub_sessions'[sessions] is the full column reference that is being summarised

Every measure is simply a building block

% Conversion Rate:= [# Transactions]/[# Sessions]

Why measures are so amazing

Allows you to build sophisticated formulas

Each measure can be made up of other measures. PowerPivot resolves all the dependencies and calculates them in the right order.

One change trickles through your entire reporting

If the name is ‘visits’ and your field is now ‘sessions’, you make 1 change and all your measures update like magic.

Calculated on the fly, not stored in memory

Until you actually use them in a pivot, they add no performance overhead. Maintainability heaven at no extra cost.

Just drag to the Pivot and voila

Real-world examples

DISTINCTCOUNT function magic!

# Unique Campaigns:= DISTINCTCOUNT('dahub_sessions'[campaign_id])

# Sessions per Campaign:= DIVIDE([# Sessions], [# Unique Campaigns])
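To make the two measures concrete, here is a small Python sketch of the arithmetic behind them for a pivot with channel on rows. The sample rows and the `channel` column are invented for illustration; only `campaign_id` and `sessions` come from the formulas above.

```python
from collections import defaultdict

# Hypothetical rows standing in for 'dahub_sessions'.
rows = [
    {"channel": "ppc", "campaign_id": "c1", "sessions": 120},
    {"channel": "ppc", "campaign_id": "c1", "sessions": 80},
    {"channel": "ppc", "campaign_id": "c2", "sessions": 100},
    {"channel": "email", "campaign_id": "c3", "sessions": 90},
]


def sessions_per_campaign(rows):
    """Per channel: (total sessions, unique campaigns, sessions per campaign)."""
    sessions = defaultdict(int)
    campaigns = defaultdict(set)
    for row in rows:
        sessions[row["channel"]] += row["sessions"]
        campaigns[row["channel"]].add(row["campaign_id"])
    result = {}
    for channel in sessions:
        unique = len(campaigns[channel])
        # Like DIVIDE, return None (blank) rather than fail on a zero denominator.
        ratio = sessions[channel] / unique if unique else None
        result[channel] = (sessions[channel], unique, ratio)
    return result


summary = sessions_per_campaign(rows)
print(summary["ppc"])    # (300, 2, 150.0)
print(summary["email"])  # (90, 1, 90.0)
```

Campaign c1 appears on two rows but is counted once, which is exactly what the distinct count adds over a plain row count.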

Which channels have a wide portfolio of active campaigns or a very active narrow one?

This is impossible to answer with regular pivot tables

Has traffic gone up or down today because we have fewer campaigns bringing in traffic?

How many campaigns are bringing in a minimum of 1000 sessions each day?

Mind blowing

# Unique Campaigns min 1000 sessions:=
CALCULATE(
    [# Unique Campaigns],
    FILTER(
        VALUES('dahub_sessions'[campaign_id]),
        [# Sessions] >= 1000
    )
)

This is the formula… but don’t try to take it in yet

First, PowerPivot sets the pivot coordinates and calculates the “base” measure # Sessions

Then, *before* calculating [# Unique Campaigns], it adds an additional filter that keeps only campaigns that fit the criteria.

# Unique Campaigns min 1000 sessions:=
CALCULATE(
    [# Unique Campaigns],
    FILTER(
        VALUES('dahub_sessions'[campaign_id]),
        [# Sessions] >= 1000
    )
)

Let’s break the formula down….

1. Pivot coordinates are set and underlying data filtered accordingly.

2. Additional FILTER is applied.

3. And only *afterwards* [# Unique Campaigns] is calculated.
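The three steps above can be followed by hand in a few lines of Python. This is only a sketch of the evaluation order, not of how the engine is implemented. The sample rows, the `date` column and the function name are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical rows standing in for 'dahub_sessions'.
rows = [
    {"date": "2015-06-01", "campaign_id": "c1", "sessions": 700},
    {"date": "2015-06-01", "campaign_id": "c1", "sessions": 400},
    {"date": "2015-06-01", "campaign_id": "c2", "sessions": 950},
    {"date": "2015-06-02", "campaign_id": "c1", "sessions": 300},
    {"date": "2015-06-02", "campaign_id": "c2", "sessions": 1000},
    {"date": "2015-06-02", "campaign_id": "c3", "sessions": 2500},
]


def unique_campaigns_min_sessions(rows, cell, threshold=1000):
    """Count campaigns whose sessions, within one pivot cell, reach the threshold."""
    # 1. Pivot coordinates: keep only rows that match the current cell.
    in_cell = [r for r in rows if all(r[col] == val for col, val in cell.items())]

    # 2. Additional FILTER: evaluate sessions per campaign and keep the qualifying ones.
    totals = defaultdict(int)
    for r in in_cell:
        totals[r["campaign_id"]] += r["sessions"]
    kept = {campaign for campaign, total in totals.items() if total >= threshold}

    # 3. Only afterwards: distinct count of campaigns over the rows that remain.
    return len({r["campaign_id"] for r in in_cell if r["campaign_id"] in kept})


print(unique_campaigns_min_sessions(rows, {"date": "2015-06-01"}))  # 1
print(unique_campaigns_min_sessions(rows, {"date": "2015-06-02"}))  # 2
print(unique_campaigns_min_sessions(rows, {}))                      # 3
```

The last call plays the role of a grand total: it is evaluated against all rows at once, so it is not the sum of the daily values.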

What % of all campaigns are bringing in a minimum of 100 sessions each day?

Variations

Campaigns / ad groups / keywords / landing pages

[# Sessions] >= 50

Size of your effectively active SEO/PPC portfolio and how that changes over time.

Use Cost per Conversion instead

If you have cost in your data, create a [£ Cost per Conversion] measure and swap [# Sessions] with it. Monitor the number of ad groups / keywords exceeding the maximum budget.

Combine multiple conditions in the FILTER

FILTER(
    VALUES('dahub_sessions'[keyword_id]),
    [£ Cost per Conversion] >= 50 && [# Clicks] >= 10
)

More variations… Campaigns and channels bringing most of the high spenders

Which channels or campaigns bring the highest number of transactions over a certain revenue threshold? (Requires a dataset with source_medium, campaign and transaction_id, plus a measure # Unique Transactions built on transaction_id.)

Campaigns and channels bringing *predominantly* high spenders

# Unique Transactions min £500:=
CALCULATE(
    [# Unique Transactions],
    FILTER(
        VALUES('dahub_sessions'[transaction_id]),
        [£ Transaction Revenue] >= 500
    )
)

Let’s break the formula down….

NOTE: In pivot you need source_medium and / or campaign on rows and you need transaction_id in your data

Banding

The problem: too many unique values to analyse.

The solution: creating dynamic groups to “cluster” very granular data into a small number of groups

The nested IF way….

=IF(ISNUMBER(SEARCH("<your brand name>", [@keyword_id])), "brand",
    IF(ISNUMBER(SEARCH("not provided", [@keyword_id])), "not provided",
        IF(ISNUMBER(SEARCH("not set", [@keyword_id])), "not set",
            "generic"
        )
    )
)

The nested IF way in PowerPivot, using the SWITCH function

=SWITCH(
    TRUE(),
    IFERROR(SEARCH("<your brand name>", 'dahub_sessions'[keyword_id]), -1) <> -1, "brand",
    IFERROR(SEARCH("not provided", 'dahub_sessions'[keyword_id]), -1) <> -1, "not provided",
    IFERROR(SEARCH("not set", 'dahub_sessions'[keyword_id]), -1) <> -1, "not set",
    "generic"
)
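The banding logic reads as "first matching rule wins, otherwise generic". Here is a minimal Python sketch of that logic, in case it helps when testing the rules before putting them in a calculated column. The function name `band_keyword` and the brand name "acme" are placeholders; matching ignores case because SEARCH does too.

```python
def band_keyword(keyword, brand_name):
    """Return the band for a keyword: the first rule that matches wins."""
    text = keyword.lower()
    rules = [
        (brand_name.lower(), "brand"),
        ("not provided", "not provided"),
        ("not set", "not set"),
    ]
    for needle, label in rules:
        # Substring test, the equivalent of SEARCH finding a position.
        if needle in text:
            return label
    # Fallback, like the final argument of SWITCH.
    return "generic"


print(band_keyword("Acme running shoes", "acme"))  # brand
print(band_keyword("(not provided)", "acme"))      # not provided
print(band_keyword("red running shoes", "acme"))   # generic
```

Rule order matters: a keyword that matches two rules gets the label of whichever rule is listed first.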

Nested IFs forever gone

Which landing pages attract predominantly branded / generic traffic?

CALCULATE is a super SUMIF

The single most powerful feature in PowerPivot

# Sessions Branded:=
CALCULATE(
    [# Sessions],
    'dahub_sessions'[brand_group] = "brand"
)

# Sessions Non Branded:=
CALCULATE(
    [# Sessions],
    'dahub_sessions'[brand_group] = "non brand"
)

This is a CALCULATE filter that gets added *before* [# Sessions] is calculated

CALCULATE allows segmentation you could never do before

If a CALCULATE filter is on a column that’s already in the pivot, the pivot’s own filter on that column gets overridden.

Remove the column from pivot and calculation still works!
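One way to picture the override is to treat the pivot's filters as a dictionary of column = value pairs, with CALCULATE replacing the entry for any column it filters itself. The Python sketch below uses that simplification. The sample rows and the helpers `sessions` and `calculate` are invented for illustration, and real filter contexts can hold more than one value per column.

```python
# Hypothetical rows standing in for 'dahub_sessions'.
rows = [
    {"landing_page": "/home", "brand_group": "brand", "sessions": 50},
    {"landing_page": "/home", "brand_group": "generic", "sessions": 30},
    {"landing_page": "/sale", "brand_group": "brand", "sessions": 20},
    {"landing_page": "/sale", "brand_group": "generic", "sessions": 70},
]


def sessions(rows, context):
    """The base measure: sum sessions over rows that match every filter in the context."""
    return sum(
        r["sessions"]
        for r in rows
        if all(r[col] == val for col, val in context.items())
    )


def calculate(measure, rows, context, **overrides):
    """Apply the extra filters first (replacing any on the same column), then evaluate."""
    new_context = {**context, **overrides}
    return measure(rows, new_context)


# Pivot cell with only landing_page on rows.
print(calculate(sessions, rows, {"landing_page": "/home"}, brand_group="brand"))  # 50

# Pivot cell that is already filtered to "generic": the CALCULATE filter wins.
cell = {"landing_page": "/home", "brand_group": "generic"}
print(calculate(sessions, rows, cell, brand_group="brand"))  # 50

# No pivot filters at all: the calculation still works.
print(calculate(sessions, rows, {}, brand_group="brand"))  # 70
```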

CALCULATE filters have countless uses

Determine hidden biases in AB testing

See if your variations had a comparable % of branded / non branded traffic which might skew the results. Also works with device, mobile traffic and any other dimension you might have in your data.

Works best when you create custom “clusters” using SWITCH

The formulas work with any dimension in your dataset but if you really want to unlock CALCULATE’s filtering potential, it really pays to create custom calculated columns using the SWITCH formula.

Create horizontal conversion funnels

CALCULATE is the essential building block for taking conversion funnel analysis to the next step

Step 1. The right data for Horizontal Funnels

To get a Funnel Step column you need to create a segment for each step in your web analytics tool and export each one as a CSV file. Then import them into PowerPivot using Power Query and the multiple CSV import method.

You need these segments:

All Sessions (unsegmented)
Category Pages
Products
Add to Basket
Basket
Secure Login
Address
Confirm Order
Payment

Step 2. Use CALCULATE on each funnel step

# Sessions All:=
CALCULATE(
    [# Sessions Funnel],
    'dahub_funnel'[funnel_step] = "All"
)

…

# Sessions Payment:=
CALCULATE(
    [# Sessions Funnel],
    'dahub_funnel'[funnel_step] = "Payment"
)

Create a new measure for each step in the funnel

This allows you to create custom “goals” on the fly out of *ANY* segment

Step 3. Create ratios for each funnel step

Use All Sessions as a base for division:

% Sessions Address:= DIVIDE([# Sessions Address], [# Sessions All])

Use previous funnel step as a base for division:

% Sessions Address progress:= DIVIDE([# Sessions Address], [# Sessions Secure Login])
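The two kinds of ratio are easy to confuse, so here is a small Python sketch that computes both for every step of a shortened funnel. The session counts are made up, and `divide` only imitates the blank-on-zero behaviour of DIVIDE.

```python
# Hypothetical rows standing in for 'dahub_funnel' (one landing page only).
funnel_rows = [
    {"funnel_step": "All", "sessions": 1000},
    {"funnel_step": "Secure Login", "sessions": 200},
    {"funnel_step": "Address", "sessions": 150},
]


def divide(numerator, denominator):
    """Mimic DIVIDE: None (blank) instead of an error on a zero denominator."""
    return numerator / denominator if denominator else None


def sessions_for_step(rows, step):
    """Total sessions for one funnel step, like CALCULATE with funnel_step = step."""
    return sum(r["sessions"] for r in rows if r["funnel_step"] == step)


def funnel_ratios(rows, steps):
    """For each step: (share of the first step, share of the previous step)."""
    totals = [sessions_for_step(rows, step) for step in steps]
    ratios = {}
    for index, step in enumerate(steps):
        # The first step has no previous step, so its progress ratio stays blank.
        previous = totals[index - 1] if index > 0 else 0
        ratios[step] = (divide(totals[index], totals[0]), divide(totals[index], previous))
    return ratios


ratios = funnel_ratios(funnel_rows, ["All", "Secure Login", "Address"])
print(ratios["Address"])  # (0.15, 0.75)
```

Here 15% of all sessions reach Address, while 75% of the sessions that reached Secure Login carry on to Address.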

Step 4. Add measures to Pivot and analyse

You can use *ANY* dimension you have available in your dataset on rows. Here, it’s Landing Page but you can use date, channel dimensions, device etc.

You can EVEN add an additional segmentation level like user type (newly acquired, loyal etc.)

Resources

Best book for PowerPivot novices with a gradual learning curve.

By the end it gets pretty advanced. You learn about relationships, how to model multiple tables, time intelligence functions and much more.

Resources

All-in-one reference for formulas for almost any scenario. All explained and broken down.

You need a good understanding of PowerPivot to begin with so don’t get this first.

Questions?

Bonus - Lifecycle metrics

Essential for comparing business entities (users, customers) as well as assets (content, landing pages, promos etc)

Step 1. Find First Date for each landing page

First Date Landing Page:=
CALCULATE(
    MIN('dahub_sessions'[session_date]),
    ALL('dahub_sessions'[session_date]),
    VALUES('dahub_sessions'[landing_page_id])
)

Step 2. Find [# Sessions] on first day

# Sessions in first day:=
CALCULATE(
    [# Sessions],
    FILTER(
        ALL('dahub_sessions'[session_date]),
        'dahub_sessions'[session_date] = [First Date Landing Page]
    ),
    VALUES('dahub_sessions'[landing_page_id])
)

Step 3. Find [# Sessions] in first 7 days

# Sessions in first 7 days:=
CALCULATE(
    [# Sessions],
    FILTER(
        ALL('dahub_sessions'[session_date]),
        'dahub_sessions'[session_date] >= [First Date Landing Page]
            && 'dahub_sessions'[session_date] < [First Date Landing Page] + 7
    ),
    VALUES('dahub_sessions'[landing_page_id])
)
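As a cross-check on the lifecycle measures, here is a Python sketch of the same idea: find each landing page's first date, then sum sessions inside a window that starts on that date. It assumes "first 7 days" means the first date plus the six days after it. The sample rows and function names are invented for illustration.

```python
from datetime import date, timedelta

# Hypothetical rows standing in for 'dahub_sessions'.
rows = [
    {"landing_page_id": "lp1", "session_date": date(2015, 6, 1), "sessions": 10},
    {"landing_page_id": "lp1", "session_date": date(2015, 6, 7), "sessions": 5},
    {"landing_page_id": "lp1", "session_date": date(2015, 6, 8), "sessions": 7},
    {"landing_page_id": "lp2", "session_date": date(2015, 6, 5), "sessions": 3},
    {"landing_page_id": "lp2", "session_date": date(2015, 6, 20), "sessions": 9},
]


def first_date(rows, page):
    """Earliest session date for a landing page, ignoring any date filter."""
    return min(r["session_date"] for r in rows if r["landing_page_id"] == page)


def sessions_in_first_days(rows, page, days):
    """Sessions for a page from its first date up to, but excluding, first date + days."""
    start = first_date(rows, page)
    end = start + timedelta(days=days)
    return sum(
        r["sessions"]
        for r in rows
        if r["landing_page_id"] == page and start <= r["session_date"] < end
    )


print(sessions_in_first_days(rows, "lp1", 1))  # 10
print(sessions_in_first_days(rows, "lp1", 7))  # 15
print(sessions_in_first_days(rows, "lp2", 7))  # 3
```

Because every page is measured from its own first date, pages launched weeks apart can be compared on the same footing.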