LOG FILE ANALYSIS
The most powerful tool in your SEO toolkit
Tom Bennet
Consultant, Builtvisible
@tomcbennet
Getting Started
What is a log file? A record of every hit a server has received – from humans and robots alike.
Anatomy of a URL, e.g. http://www.brightonseo.com/about/
1. Protocol (http://)
2. Host name (www.brightonseo.com)
3. File name (/about/)
Host name resolved to IP address via DNS -> connection to server ->
HTTP GET request for the file via the protocol -> HTML returned to browser
They’re not pretty…
…but they’re very powerful.
188.65.114.122 - - [30/Sep/2013:08:07:05 -0400] "GET
/resources/whitepapers/retail-whitepaper/ HTTP/1.1" 200 "-"
"Mozilla/5.0 (compatible; Googlebot/2.1; +
http://www.google.com/bot.html)"
Client IP (the remote host making the request – here, Googlebot's)
Timestamp (date & time)
Method (GET / POST)
Request URI
HTTP status code
User-agent
Log Files & SEO
What is Crawl Budget?
Crawl Budget = The number of URLs crawled on each visit to your site.
Higher Authority = Higher Crawl Budget
Crawl Budget Utilisation
http://example.com/thin-product-page-1
http://example.com/category/thin-product-page-1
http://example.com/category/subcategory/thin-product-page-1
http://example.com/category/subcategory/thin-product-page-1?colour=blue
Etc…
Conservation of crawl budget is key.
Working With Logs
Preparing Your Data
Extraction: Varies by server; see the accompanying guide.
Filter: By Googlebot user-agent (a sketch follows below), then validate the IP range. https://support.google.com/webmasters/answer/80553?hl=en
Tools: Gamut and Splunk are great, but you can’t beat Excel.
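A minimal sketch of the user-agent filter, assuming the user-agent string landed in column F after import (the column letter is an assumption – adjust to your own layout):

=IF(ISNUMBER(SEARCH("Googlebot", F2)), "Googlebot", "Other")

Fill this helper column down and filter on it. User-agents can be spoofed, which is why Google's guide above recommends confirming the IP with a reverse DNS lookup.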
Working in Excel
1. Convert .log to .csv (cool tip: just change the file extension)
2. Sample size (60–120k Googlebot requests / rows is a good size)
3. Text-to-columns (a space will usually be a suitable delimiter)
4. Create a table (label your columns, sort by timestamp – see the sketch below)
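Sorting by timestamp is easier once the raw value is converted to a real Excel date. A minimal sketch, assuming text-to-columns left the timestamp in column B as [30/Sep/2013:08:07:05 (date parsing is locale-dependent, so treat this as a starting point):

=DATEVALUE(SUBSTITUTE(MID(B2,2,11),"/","-")) + TIMEVALUE(MID(B2,14,8))

MID(B2,2,11) extracts 30/Sep/2013, SUBSTITUTE turns it into a date string Excel recognises, and MID(B2,14,8) extracts 08:07:05; format the result cell as date/time and sort on it.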
Investigate
Most vs Least Crawled
Formula: Use COUNTIF on Request URL.
Tip: Extract the top-level category for crawl distribution by site section, e.g. http://www.brightonseo.com/speakers/person-name/ -> speakers (sketch below).
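A minimal sketch of both formulas, assuming the request URI sits in column D of a 100,000-row sample (column and range are assumptions):

=COUNTIF($D$2:$D$100000, D2)
=MID(D2, 2, FIND("/", D2, 2) - 2)

The first counts how often each URI was requested; the second returns the top-level folder (speakers for /speakers/person-name/), which you can then COUNTIF or pivot on for crawl distribution by section. Wrap the second in IFERROR to handle single-segment URIs such as /.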
Crawl Frequency Over Time
Formula: Pivot date against count of requests.
Tip: Segment by site section or by user-agent (Googlebot Mobile, Images, Video, etc.) – a formula-based sketch follows below.
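If you'd rather stay in formulas than a pivot table, a COUNTIFS sketch does the same job (assuming the converted date/time in column B, the user-agent in column F, and a list of dates in column H – all assumptions):

=COUNTIFS($B$2:$B$100000, ">="&H2, $B$2:$B$100000, "<"&H2+1, $F$2:$F$100000, "*Googlebot*")

This counts the requests falling on each date for a given crawler; change the wildcard to *Googlebot-Image* or similar to segment by user-agent.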
HTTP Response Codes
Formula: Total up HTTP Response Codes.
Tip: Find the most common 302s or 404s – filter by code and sort by URL occurrence (sketch below).
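A minimal sketch of the totalling, assuming the status code landed in column E:

=COUNTIF($E$2:$E$100000, "404")

Repeat for 200, 301, 302 and so on; COUNTIF will match the criterion whether text-to-columns left the codes stored as text or as numbers.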
Level Up
Robots.txt – Crawl all logged URLs with Screaming Frog to determine whether they are blocked in robots.txt. Investigate the most frequently crawled.
Faceted Nav Issues – Dedupe a list of unique resources, then sort by times requested.
Sitemap – Add your sitemap URLs into an Excel table and VLOOKUP against your logs (see the sketch after this list). Which mapped URLs are crawl-deficient?
CSS / JS – These resources should be crawlable, but are files unnecessary for rendering absorbing an inordinate amount of crawl budget?
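A minimal sketch of the sitemap check, assuming your sitemap URLs sit in column A and your unique log URLs with their request counts (from the COUNTIF step earlier) sit in columns D:E – the layout is an assumption:

=IFERROR(VLOOKUP(A2, $D$2:$E$100000, 2, FALSE), 0)

Any mapped URL returning 0, or a very low count, is crawl-deficient and worth investigating. Normalise both lists to the same form first (full URL vs. path, trailing slashes, protocol), or every lookup will miss.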
Top Level Crawl Waste
Formula: Use IF statements to check for every cause of waste (sketch below).
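A sketch of one such check, assuming the request URI in column D and a numeric status code in column E – extend the nesting with one test per cause you care about:

=IF(E2>=400, "Error response", IF(ISNUMBER(FIND("?", D2)), "Parameter URL", "OK"))

Each row is flagged with its first matching cause of waste; a COUNTIF or pivot on this column then gives the top-level picture. (FIND is used rather than SEARCH because SEARCH treats ? as a wildcard.)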
Crime = Solved
All Brighton SEO attendees will receive the guide via email.
THANKS FOR LISTENING
Get in touch e: [email protected] t: @tomcbennet