Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Date post: 30-Jul-2015
Upload: alteryx
Page 1: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

#inspire15

Building On-Demand Business Location Datasets

Or… How I Stopped Worrying about Bad Business Location Data and Learned to Love the Download Tool

Tuesday, May 19, 2015

John Hollingsworth, GIS Manager, Clear Channel Outdoor

Page 2: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Business Problem

Bad Data = Unhappy Clients

Page 3: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Business Problem

• We create maps and analyses that contain locations of our clients, their competitors, and other points of interest.

• The data need to be current and accurate.

• The data are constantly changing and therefore require a real-time source.

• Existing solutions all have downsides.

Page 4: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Existing Solutions

Comprehensive Business Dataset (Dun & Bradstreet, DatabaseUSA)

• Expensive

• Often outdated

• Often poor spatial accuracy

• Duplicates in some cases (Walmart has pharmacy, tire store, etc.)

Page 5: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Existing Solutions

Aggregators (AggData, Factual)

• Not comprehensive

• On-demand requests cost money and time

• Periodically refreshed

Page 6: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Existing Solutions

Data from client (spreadsheet of addresses)

• Requires geocoding/data quality checks

• Requires continual requests to ensure current data

• Not available in most cases

Page 7: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Alteryx-based Solution

Use the Alteryx Download tool to ‘scrape’ data from a website’s location tool.

Page 8: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Quick Demonstration

Page 9: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Yikes!!!

Is this legal?

Cuz it doesn’t feel legal.

Page 10: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Yes. This Is Legal.

• The US Supreme Court has ruled that an author who claims infringement must prove “the existence of ... intellectual production, of thought, and conception,” and, in reference to phone number listings, that “these bits of information are uncopyrightable facts” (Feist v. Rural, 1991).

• Terms of Service agreements on websites do not protect factual information.

• A company could theoretically bring a case for damages if the download process is so intense that it disrupts service on their servers. You may need to throttle your collection so that it does not become this type of intrusive attack.

• All that said, caveat metentis: consult your in-house legal staff for additional clarification.
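
The throttling advice can be sketched outside Alteryx as a small Python helper that enforces a minimum delay between requests. This is only an illustration: the `Throttle` class and the interval you pass to it are assumptions, not part of the presentation or of the Download tool.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock      # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Block until min_interval_s has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each download keeps the request rate below one per `min_interval_s` seconds.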

Page 11: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Overview Of How To Do This

• Analyze Web Page and Location App Web Traffic

• Determine Collection Method to Use Based on Website Architecture

• Configure Download Tool

• Parse Results

• Error Correct

• Troubleshoot

Page 12: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Analyze Web Traffic

Page 13: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Analyze Web Traffic: Best Practices

• Use web traffic debugging software such as Fiddler (http://www.telerik.com/download/fiddler)

• Set output to Raw in both windows

• Turn on cookies

• Determine whether you need an iterative tool: sometimes all of the locations are listed on one page.

• Be rigorous: often there is an obvious, hard way and also a subtle, easy way.

Page 14: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Analyze Web Traffic: Best Practices

• Experiment by trial and error: copy request data from the Inspectors window and run it in the Composer window.

Page 15: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Collection Methods

Single Request

• Single request returns all addresses and latitude/longitude data

• The response may be JSON, XML, or the main web page

• Hint: Look for a single Google Map with all points
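
As a rough Python sketch of the single-request case, the snippet below parses one JSON body into rows of address and coordinates. The payload shape and the field names (`stores`, `address`, `lat`, `lng`) are hypothetical; every site uses its own.

```python
import json

# Hypothetical response body; real sites use their own field names.
SAMPLE_BODY = """
{"stores": [
  {"id": "101", "address": "1 Main St", "lat": "29.42", "lng": "-98.49"},
  {"id": "102", "address": "9 Oak Ave", "lat": "32.78", "lng": "-96.80"}
]}
"""


def parse_locations(body):
    """Turn one JSON response into (id, address, lat, lon) tuples."""
    payload = json.loads(body)
    rows = []
    for store in payload["stores"]:
        rows.append((store["id"], store["address"],
                     float(store["lat"]), float(store["lng"])))
    return rows
```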

Page 16: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Collection Methods

Multi-Step

• List of store URLs on main page -> pull each page

• List of states -> list of stores -> pull file or each page

• List of states -> list of cities -> list of stores -> pull file or each page
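
One step of a multi-step collection (for example, turning a page of state links into a list of URLs to pull next) might look like the following Python sketch. The `/stores/` path prefix and the sample markup are assumptions for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect <a href> values whose path starts with a given prefix."""

    def __init__(self, base_url, prefix):
        super().__init__()
        self.base_url = base_url
        self.prefix = prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith(self.prefix):
            # Resolve relative links against the page they came from.
            self.links.append(urljoin(self.base_url, href))


def collect_links(html, base_url, prefix):
    """Return absolute URLs for every matching link in the page."""
    parser = LinkCollector(base_url, prefix)
    parser.feed(html)
    return parser.links
```

Running the same function on each state page, then on each city page, gives the states -> cities -> stores chain described above.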

Page 17: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Collection Methods

Sequential IDs

• Store URLs end in a numeric ID, e.g. http://www.store.com/3829

• Iterate through a set range of integers for store IDs

• Can be tricky because there are sometimes huge gaps in the IDs
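
A minimal Python sketch of the sequential-ID approach follows. The `fetch` callable and the `max_consecutive_misses` cutoff are assumptions: `fetch` stands in for whatever downloads a page such as http://www.store.com/3829, and the cutoff should be generous because of the gaps mentioned above.

```python
def scan_store_ids(fetch, start, stop, max_consecutive_misses=500):
    """Probe store IDs in [start, stop) and return the pages that exist.

    fetch(store_id) is supplied by the caller and returns the page content,
    or None when the ID does not exist (for example, on an HTTP 404).
    """
    found = {}
    misses = 0
    for store_id in range(start, stop):
        page = fetch(store_id)
        if page is None:
            misses += 1
            if misses >= max_consecutive_misses:
                break  # assume we have run past the last real ID
            continue
        misses = 0
        found[store_id] = page
    return found
```

Setting the cutoff too low silently drops stores that sit beyond a large gap, so it is worth comparing the result against a known store count.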

Page 18: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Collection Methods

Spatial

• Use zip codes for search criteria instead of city/state

• Grid centroids based on search radius

• Grid MBR (minimum bounding rectangle) values based on search radius

• Tip: Experiment with enlarging the search radius. If there is no limit, you can get all locations in one request.

Page 19: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Common Spatial Searches

Grid centroids as Lat/Long input values with 100 mile radius
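
One way to generate such grid centroids is sketched below in Python. It is an approximation under stated assumptions: roughly 69 miles per degree of latitude, a flat-earth treatment within each cell, and a cell side of radius × √2 so that each square cell fits inside the search circle around its centroid. The function name and bounding-box arguments are illustrative.

```python
import math

MILES_PER_DEG_LAT = 69.0  # rough average; an approximation


def grid_centroids(min_lat, min_lon, max_lat, max_lon, radius_miles):
    """Return (lat, lon) centroids of square cells covering a bounding box."""
    side_miles = radius_miles * math.sqrt(2)
    lat_step = side_miles / MILES_PER_DEG_LAT
    centroids = []
    lat = min_lat + lat_step / 2
    while lat - lat_step / 2 < max_lat:
        # A degree of longitude shrinks with latitude, so size each row separately.
        lon_step = side_miles / (MILES_PER_DEG_LAT * math.cos(math.radians(lat)))
        lon = min_lon + lon_step / 2
        while lon - lon_step / 2 < max_lon:
            centroids.append((round(lat, 4), round(lon, 4)))
            lon += lon_step
        lat += lat_step
    return centroids
```

Because neighboring search circles overlap, the same store will come back from more than one centroid, which is why deduplication is needed later.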

Page 20: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Common Spatial Searches

Zip codes nearest to grid centroids as input values with 100 mile radius
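
When a locator only accepts zip codes, each grid centroid can be swapped for its nearest zip code. The Python sketch below uses the haversine great-circle distance; the zip code coordinates in the usage note are approximate and purely illustrative.

```python
import math


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_zip(centroid, zip_points):
    """zip_points maps zip code -> (lat, lon); return the closest zip code."""
    lat, lon = centroid
    return min(zip_points, key=lambda z: haversine_miles(lat, lon, *zip_points[z]))
```

For example, `nearest_zip((29.5, -98.4), {"78205": (29.42, -98.49), "75201": (32.79, -96.80)})` picks `"78205"`.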

Page 21: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Configure Download Tool

Page 22: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Configure Download Tool

• Determine whether the request uses the GET or POST method

• Watch out for the Encode URL Text option

• Copy the headers from the captured request

• Experiment using Fiddler Composer to see which headers are necessary

• Try without the cookie header, as cookies can expire and break your workflow.
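
The same decisions (GET or POST, which headers, how parameters are encoded) can be illustrated in Python with the standard library. This sketch only builds the request and does not send it; the `/locator` path and the parameter names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, params, method="GET", headers=None):
    """Build (but do not send) a GET or POST request with encoded parameters."""
    encoded = urlencode(params)  # percent-encodes values exactly once
    if method == "GET":
        req = Request(url + "?" + encoded, method="GET")
    else:
        req = Request(url, data=encoded.encode("utf-8"), method="POST")
        req.add_header("Content-Type", "application/x-www-form-urlencoded")
    for name, value in (headers or {}).items():
        req.add_header(name, value)
    return req
```

Note that a value that is already percent-encoded gets encoded a second time (`%20` becomes `%2520`), which is the kind of surprise the URL-encoding warning above is about.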

Page 23: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Parse Results

Page 24: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Parse Results: Best Practices

• Sample: if you are iterating, run just a few iterations to test the parse logic.

• Look for meta property tags if you are on a store’s page

• Add a RecordID if iterating, as the JSON numbering will restart with each response

• Use the JSON/XML parsing tools in Alteryx
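
The RecordID point can be illustrated in Python: each response's JSON array starts again at index 0, so a running ID is added while the responses are combined. The response shape (a JSON array of objects) and the field names are assumptions.

```python
import json


def flatten_responses(responses):
    """Combine per-iteration JSON arrays into rows with a global record_id.

    responses is a list of (request_key, json_text) pairs, one per iteration.
    """
    rows = []
    record_id = 1
    for request_key, text in responses:
        for index, item in enumerate(json.loads(text)):
            rows.append({"record_id": record_id,
                         "request": request_key,
                         "index_in_response": index,  # restarts at 0 each time
                         **item})
            record_id += 1
    return rows
```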

Page 25: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Parse Results: Best Practices

• Use Multi-Row Formula tool to parse HTML
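
Row-relative parsing of this kind is loosely analogous to the following Python sketch: split the HTML into rows, find a row containing a cue, and read the value a fixed number of rows later. The cue string and sample markup are hypothetical, and this is not a description of how the Multi-Row Formula tool is implemented.

```python
import re


def values_after_cue(html, cue, offset=1):
    """Return the text found `offset` rows after each row containing `cue`."""
    lines = [ln.strip() for ln in html.splitlines() if ln.strip()]
    values = []
    for i, line in enumerate(lines):
        if cue in line and i + offset < len(lines):
            # Strip tags from the target row, keeping only its text.
            values.append(re.sub(r"<[^>]+>", "", lines[i + offset]).strip())
    return values
```

If some pages insert an extra row (a shopping center name, for instance), a fixed offset will pick up the wrong line, so spot-check the output.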

Page 26: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Error Correct

Page 27: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Error Correct

• Deduplicate when the radius collection method is used: use the Unique tool

• Bad geocodes: you are at the mercy of the geocoder that created the data

• Verify counts using Wikipedia or the company's annual report
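
Outside Alteryx, the deduplication step could be sketched in Python as below. The choice of key (rounded coordinates plus a normalized address) and the field names are assumptions; a stable store ID, when the site provides one, is a better key.

```python
def unique_locations(rows, precision=5):
    """Keep the first row for each (lat, lon, normalized address) key."""
    seen = set()
    unique = []
    for row in rows:
        key = (round(row["lat"], precision),
               round(row["lon"], precision),
               " ".join(row["address"].lower().split()))  # case/whitespace-insensitive
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Comparing `len(unique_locations(rows))` with a published store count is a quick sanity check on the whole collection.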

Page 28: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Troubleshoot

Page 29: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Troubleshoot

• Lat/Lon values in the Google geocode string that are not real: sll=latitude,longitude is where the search originated, not the actual point

• IP timeouts: may need to throttle to solve

• Parse cues not in all pages, or extra lines that cause skips, e.g. address data includes a shopping center name

• Multiple pages in search results

• Some sites include closed stores
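
To avoid mistaking the search origin for a store location, it helps to pull the coordinate parameters out of the map URL separately and label them. The Python sketch below assumes a URL that carries `sll` and possibly `ll` query parameters; which parameter, if any, holds the store's real coordinates varies by site and has to be checked by hand.

```python
from urllib.parse import urlparse, parse_qs


def split_map_coords(url):
    """Return lat/lon pairs found in a map URL, keyed by parameter name."""
    query = parse_qs(urlparse(url).query)
    coords = {}
    for name in ("ll", "sll"):  # sll is the search origin, not the store
        if name in query:
            lat, lon = query[name][0].split(",")
            coords[name] = (float(lat), float(lon))
    return coords
```

If every store in a result set shares the same `sll` value, that confirms it is the search origin rather than a location.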

Page 30: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Q & A

Page 31: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

Free Stuff!!

Go to http://tinyurl.com/WebScrapingTools to download a zip file containing useful macros and a sample workflow.

Page 32: Inspire 2015 - Clear Channel Outdoor: Building On-Demand Business Location Datasets

THANK YOU!
