Date post: | 27-Jan-2015 |
Category: |
Technology |
Upload: | johann-de-boer |
Web Analytics with R
Johann de Boer
Purpose of this presentation
1. Encourage web data analysts to move to R and away from Excel!
2. Help those wanting to learn R to get started.
3. Build interest amongst the R community in developing packages for web analytics.
What is web analytics?
Measurement and analysis of (aggregated) internet data for purposes of optimising website usage.
Web analytics data
Dimensions and metrics
Scope | Dimensions | Metrics
User | Custom dimensions, e.g. existing customer flag | Unique users, avg revenue per user
Device | Browser and OS, isTablet, isMobile | Unique devices, total visits
Session | Traffic source, date and time of visit, landing page | Time on site, pageviews per visit, goal completions
Hit | Page URL, page title, event type, product name | Time on page, page loading time, transaction amount
Metrics
Unique Visitors, New Visits, % New Visits, Visits, Bounces, Bounce Rate, Bounce Rate, Visit Duration, Avg. Visit Duration, Organic Searches, Impressions, Clicks, Cost, CPM, CPC, CTR, Cost per Transaction, Cost per Goal Conversion, Cost per Conversion, RPC, ROI, Margin, Goal 1 Starts, Goal Starts, Goal 1 Completions, Goal Completions, Goal 1 Value, Goal Value, Per Visit Goal Value, Goal 1 Conversion Rate, Goal Conversion Rate, Goal 1 Abandoned Funnels, Abandoned Funnels, Goal 1 Abandonment Rate, Total Abandonment Rate, Data Hub Activities, Page Value, Entrances, Entrances / Pageviews, Pageviews, Pages / Visit, Unique Pageviews, Time on Page, Avg. Time on Page, Exits, % Exit, Results Pageviews, Total Unique Searches, Results Pageviews / Search, Visits with Search, % Visits with Search, Search Depth, Search Depth, Search Refinements, % Search Refinements, Time after Search, Time after Search, Search Exits, % Search Exits, Goal 1 Conversion Rate, Goal Conversion Rate, Per Search Goal Value, Page Load Time (ms), Page Load Sample, Avg. Page Load Time (sec), Domain Lookup Time (ms), Avg. Domain Lookup Time (sec), Page Download Time (ms), Avg. Page Download Time (sec), Redirection Time (ms), Avg. Redirection Time (sec), Server Connection Time (ms), Avg. Server Connection Time (sec), Server Response Time (ms), Avg. Server Response Time (sec), Speed Metrics Sample, Document Interactive Time (ms), Avg. Document Interactive Time (sec), Document Content Loaded Time (ms), Avg. Document Content Loaded Time (sec), DOM Latency Metrics Sample, Screen Views, Screen Views, Unique Screen Views, Unique Screen Views, Screens / Session, Screens / Session, Time on Screen, Avg. Time on Screen, Total Events, Unique Events, Event Value, Avg. Value, Visits with Event, Events / Visit, Transactions, Ecommerce Conversion Rate, Revenue, Average Value, Per Visit Value, Shipping, Tax, Total Value, Quantity, Unique Purchases, Average Price, Product Revenue, Average QTY, Local Revenue, Local Shipping, Local Tax, Local Product Revenue, Social Actions, Unique Social Actions, Actions Per Social Visit, User Timing (ms), User Timing Sample, Avg. User Timing (sec), Exceptions, Exceptions / Screen, Crashes, Crashes / Screen, Custom Metric Value
Dimensions
Visitor Type, Count of Visits, Days Since Last Visit, User Defined Value, Visit Duration, Referral Path, Full Referrer, Campaign, Source, Medium, Source / Medium, Keyword, Ad Content, Social Network, Social Source Referral, Ad Group, Ad Slot, Ad Slot Position, Ad Distribution Network, Query Match Type, Matched Search Query, Placement Domain, Placement URL, Ad Format, Targeting Type, Placement Type, Display URL, Destination URL, AdWords Customer ID, AdWords Campaign ID, AdWords Ad Group ID, AdWords Creative ID, AdWords Criteria ID, Goal Completion Location, Goal Previous Step - 1, Goal Previous Step - 2, Goal Previous Step - 3, Browser, Browser Version, Operating System, Operating System Version, Mobile (Including Tablet), Tablet, Mobile Device Branding, Mobile Device Model, Mobile Input Selector, Mobile Device Info, Mobile Device Marketing Name, Device Category, Continent, Sub Continent Region, Country / Territory, Region, Metro, City, Latitude, Longitude, Network Domain, Service Provider, Flash Version, Java Support, Language, Screen Colors, Screen Resolution, Endorsing URL, Display Name, Social Activity Post, Social Activity Timestamp, Social User Handle, User Photo URL, User Profile URL, Shared URL, Social Tags Summary, Originating Social Action, Social Network and Action, Hostname, Page, Page path level 1, Page path level 2, Page path level 3, Page path level 4, Page Title, Landing Page, Second Page, Exit Page, Previous Page Path, Next Page Path, Page Depth, Site Search Status, Search Term, Refined Keyword, Site Search Category, Start Page, Destination Page, App Installer ID, App Version, App Name, App ID, Screen Name, Screen Depth, Landing Screen, Exit Screen, Event Category, Event Action, Event Label, Transaction, Affiliation, Visits to Transaction, Days to Transaction, Product SKU, Product, Product Category, Currency Code, Social Source, Social Action, Social Source and Action, Social Entity, Social Type, Timing Category, Timing Label, Timing Variable, Exception Description, Experiment ID, Variation, Custom Dimension, Custom Variable (Key 1), Custom Variable (Value 01), Date, Year, Month of the year, Week of the year, Day of the month, Hour, Month, Week, Day, Day of week, Day of week name, Hour of Day, Month of Year, Week of Year, ISO week of the year
265 dimensions and metrics in Google Analytics and growing!
Google Analytics
Source: Charles Farina, e-nor.com blog, Published 9 July 2012
The Web Analytics market
Google Analytics - now Universal
Google Analytics APIs
● Data collection
  ○ Collection APIs and SDKs
● Data extraction
  ○ Core Reporting API
    ■ Metadata API
  ○ Real-time Reporting API
  ○ Multi-Channel Funnels Reporting API
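To make the data extraction side concrete, here is a minimal base R sketch of what a Core Reporting API (v3) request looks like as a URL. The helper name `build_ga_url`, the view ID and the dates are placeholders invented for illustration; a real request also needs an OAuth 2.0 access token, which is left out here.

```r
# Sketch only: assemble a Core Reporting API (v3) request URL without sending it.
# build_ga_url is a hypothetical helper, not part of any package.
build_ga_url <- function(params,
                         base = "https://www.googleapis.com/analytics/v3/data/ga") {
  # Percent-encode each value so characters such as ':' and ',' are URL-safe
  encoded <- vapply(params, function(v) URLencode(v, reserved = TRUE), character(1))
  paste0(base, "?", paste0(names(params), "=", encoded, collapse = "&"))
}

query <- c(
  ids          = "ga:12345678",               # placeholder view (profile) ID
  `start-date` = "2014-01-01",                # placeholder date range
  `end-date`   = "2014-01-31",
  metrics      = "ga:visits,ga:pageviews",
  dimensions   = "ga:date,ga:deviceCategory"
)

url <- build_ga_url(query)
print(url)
```

Packages such as rga and ganalytics take care of this URL building and the authentication step, which is the main reason to use them rather than hand-rolled requests.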
Why use R for web analytics?
R is free, open source and popular!
Spreadsheets are rigid and fragile
R is agile and robust
7 reasons to use R instead of Excel
1. Captures each step in your analysis
2. Makes it easier to review and fix your mistakes
3. Easy to reproduce and reuse analyses
4. Join datasets from multiple sources
5. Powerful ways to analyse and visualise your data
6. Automate retrieval of your data via the Core Reporting API
7. Saves time!
Learning and using R
In the beginning...
Recommended tools and packages
● plyr
● ggplot2
● lubridate
● reshape2
● devtools
● httr
● roxygen2
● markdown
● git (version control)
Google Analytics packages for R
● r-google-analytics
  ○ By Google, but it stopped working for a long time
● rga
  ○ By Bror Skardhamar, popular and active
● ganalytics
  ○ Written by me to create an abstraction from the Core Reporting API protocol
ganalytics
Automate extraction of Google Analytics data
Make querying GA data from R an easy and interactive experience
● Queries are manipulated on the fly
● Defining filter and segmentation expressions is easy
● Checks queries for errors before sending
  ○ corrects them automatically in some cases too!
● Creates a level of abstraction from the Core Reporting API - easier to extend functionality
Query expressions
ga:keyword=@buy
(search traffic keywords containing “buy”)
A single expression comprises:
● a variable - a dimension or metric
● an operator - e.g. equals, contains, regular expression, greater than, does not equal, ...
● an operand - a number (metric) or a character string (dimension)
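The three parts of an expression can be sketched as a small base R function that pastes them into the string the Core Reporting API expects. The function name `ga_expr` and the operator labels (`contains`, `matches`, ...) are made up for this illustration and are not the ganalytics interface; the operator symbols themselves (`=@` for contains, `=~` for regular expression, and so on) are the API's own.

```r
# Sketch only: build one filter expression from variable, operator and operand.
# ga_expr and the operator labels are hypothetical names for illustration.
ga_expr <- function(variable, operator, operand) {
  ops <- c(
    equals = "==", not_equals = "!=",
    greater_than = ">", less_than = "<",
    at_least = ">=", at_most = "<=",
    contains = "=@", not_contains = "!@",
    matches = "=~", not_matches = "!~"
  )
  stopifnot(operator %in% names(ops))
  # Commas and semicolons are reserved (OR / AND), so escape them in the operand
  operand <- gsub("([,;])", "\\\\\\1", as.character(operand))
  paste0("ga:", variable, ops[[operator]], operand)
}

print(ga_expr("keyword", "contains", "buy"))
print(ga_expr("pageviews", "greater_than", 10))
```

Keeping the operator table in one place is what lets a package check an expression before it is sent, rather than waiting for the API to reject it.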
Combining expressions
● Expressions can be joined using OR and AND.
● In the filter string, a comma (,) means OR and a semicolon (;) means AND.
● OR always takes precedence over AND, and expressions cannot be grouped.
ga:keyword=@buy;ga:city=~^(Sydney|Melbourne)$,ga:isTablet==Yes
(search traffic keywords containing “buy” AND [city is [Sydney OR Melbourne] OR via a tablet])
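To see what that precedence means in practice, the sketch below applies the same logic to a few toy rows in base R and then assembles the filter string. The data frame values are invented for illustration. Note that R's own precedence is the other way round (`&` binds tighter than `|`), so the OR part needs explicit parentheses.

```r
# Toy rows standing in for report data; the values are made up.
visits <- data.frame(
  keyword  = c("buy shoes", "buy shoes", "shoe reviews", "buy boots"),
  city     = c("Sydney", "Perth", "Melbourne", "Perth"),
  isTablet = c("No", "No", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# keyword contains "buy" AND (city is Sydney or Melbourne OR visit is via a tablet)
keep <- grepl("buy", visits$keyword, fixed = TRUE) &
  (grepl("^(Sydney|Melbourne)$", visits$city) | visits$isTablet == "Yes")
print(visits[keep, ])

# The same condition as a filter string: join the OR terms with ",",
# then join the AND terms with ";"
or_part <- paste(c("ga:city=~^(Sydney|Melbourne)$", "ga:isTablet==Yes"), collapse = ",")
filter  <- paste(c("ga:keyword=@buy", or_part), collapse = ";")
print(filter)
```

Because grouping is not available, a condition such as "(A AND B) OR C" has to be rewritten as "(A OR C) AND (B OR C)" before it can be expressed as a filter.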
Writing expressions with ganalytics
● Filter to pass to the Core Reporting API
    ga:keyword=@buy;ga:city=~^(Sydney|Melbourne)$,ga:isTablet==Yes
● Using ganalytics to write this
    GaAnd(
      GaExpr('keyword', '@', 'buy'),
      GaOr(
        GaExpr('city', '~', '^(Sydney|Melbourne)$'),
        GaExpr('isTablet', '=', 'Yes')
      )
    )
ganalytics Demo
gist.github.com/jdeboer/6569077
How does traffic from desktop, mobile and tablet users change
throughout the day and over the week?
Average number of visits per hour and day - split by desktop, mobile and tablet
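The aggregation behind a chart like this can be sketched in base R. The demo itself uses plyr and ggplot2; `aggregate()` is swapped in here only to keep the example free of dependencies. The data frame is toy data in the shape a query on date, hour and device category might return, and the visit counts are invented.

```r
# Toy data: one row per date, hour and device category; counts are made up.
ga_data <- data.frame(
  date = as.Date(c("2014-01-06", "2014-01-13", "2014-01-06",
                   "2014-01-13", "2014-01-07", "2014-01-07")),
  hour = c(9, 9, 9, 9, 20, 20),
  deviceCategory = c("desktop", "desktop", "mobile",
                     "mobile", "desktop", "tablet"),
  visits = c(120, 100, 30, 50, 60, 40),
  stringsAsFactors = FALSE
)

# Day of week as a number (0 = Sunday, 1 = Monday, ...), independent of locale
ga_data$wday <- as.POSIXlt(ga_data$date)$wday

# Average visits for each weekday, hour and device category across all dates
avg_visits <- aggregate(visits ~ wday + hour + deviceCategory,
                        data = ga_data, FUN = mean)
print(avg_visits)
```

The resulting table has one row per weekday, hour and device combination, which is the shape a faceted line or heat-map plot needs.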
R + ggplot2 + plyr + ganalytics =
Get involved!
Open source R package development is fun!
Package development
● Use RStudio with Git version control
  ○ Open a free GitHub account
  ○ Use Roxygen2 for generating your documentation and NAMESPACE file
  ○ RStudio integrates with Git, Roxygen2 and Rtools to make the package build process easy
● Hadley Wickham is a great help
  ○ devtools package - great for installing straight from a GitHub repository
  ○ read his online book “Advanced R Programming” - easy to follow package development steps
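As a small illustration of the Roxygen2 workflow mentioned above, here is a sketch of a documented function as it might appear in a package's R/ directory. The function `bounce_rate` is a made-up example, not part of ganalytics; the `#'` comment tags are standard roxygen2 tags.

```r
#' Bounce rate
#'
#' Share of visits in which only a single page was viewed.
#'
#' @param bounces Numeric vector of bounce counts.
#' @param visits Numeric vector of visit counts, same length as `bounces`.
#' @return Numeric vector of rates between 0 and 1; `NA` where `visits` is 0.
#' @export
#' @examples
#' bounce_rate(25, 100)
bounce_rate <- function(bounces, visits) {
  stopifnot(length(bounces) == length(visits))
  # Avoid dividing by zero when a row has no visits
  ifelse(visits > 0, bounces / visits, NA_real_)
}

print(bounce_rate(c(25, 0), c(100, 0)))
```

Running `devtools::document()` in the package directory turns these comments into the help page and the matching NAMESPACE export, so the documentation lives next to the code it describes.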
Learn more...
● Google Analytics: #ganalytics
  ○ Video lessons: google.com.au/analytics/iq.html
  ○ Reference: developers.google.com/analytics
● Learn R: #rstats
  ○ Presciient: presciient.com/courses
  ○ Code School: tryr.codeschool.com
  ○ Coursera: coursera.org/course/compdata
  ○ Intro to R videos by Google: t.co/FQ8DvZEdRW
● Package development: adv-r.had.co.nz
● ganalytics: github.com/jdeboer/ganalytics
● Follow me on Twitter: @johannux