Date posted: 20-Aug-2015 | Category: Business | Uploaded by: kelvin-newman
The Risks and Rewards of Data Scraping for SEO
SiteVisibility: Search & Digital Marketing Experts
We think beyond the click™
SiteVisibility, the Search Marketing division of AI Digital
• SiteVisibility is a top-20, rapidly growing, award-winning digital and search engine marketing agency
• Founded in 2002, 20 employees and £1.1m revenue in 2008
• Renowned for our search marketing expertise:
   • the number 1 marketing podcast on iTunes
   • the leader in SEO performance models
   • one of the top 20 marketing blogs in the UK
   • pioneering ISO standards for search marketing
• Investing 2% of our revenue in search marketing R&D
Maximise volume & minimise the cost of leads
So, why is SEO important?
Lexis Nexis - EiB | Nov-08 | Sep-09
Natural search traffic | 108 | 615
Keywords sending natural traffic | 40 | 395
Inbound links | 142 | 397
Search engine entry pages | 31 | 201
What does SEO rely on? 8 SEO Basics
1. Findability: keywords need to appear in meta titles, headings, content and URLs, AND in the anchor text of hyperlinks pointing BACK to a particular URL
2. Indexability: e.g. duplicate content, URL suffixes and sitemaps
3. The other 6 are Accessibility, Usability, Sharability, Linkability, Convertability and Trackability
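To make the Findability point concrete, here is a minimal sketch that checks whether a keyword appears in a page's title, headings, text and URL path. It uses only Python's standard `html.parser`; the function names and the "hyphenated slug" convention for URLs are assumptions for illustration, and inbound-link anchor text is left out because it lives on other people's pages.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class OnPageParser(HTMLParser):
    """Collects the <title>, h1-h6 headings and all visible text of a page."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.text = []
        self._in_title = False
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.HEADINGS:
            self._in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        self.text.append(data)
        if self._in_title:
            self.title += data
        elif self._in_heading:
            self.headings[-1] += data


def findability_report(html, url, keyword):
    """Return which on-page locations contain the keyword (case-insensitive).

    Assumes URLs use hyphenated slugs, e.g. /cheap-flights-rome.
    """
    parser = OnPageParser()
    parser.feed(html)
    kw = keyword.lower()
    return {
        "title": kw in parser.title.lower(),
        "headings": any(kw in h.lower() for h in parser.headings),
        "content": kw in " ".join(parser.text).lower(),
        "url": kw.replace(" ", "-") in urlparse(url).path.lower(),
    }
```

A report with any `False` value points at a location where the keyword could be added.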
Google favours original, high-quality, keyword-rich content from high-authority sites...
Scraping - legal or illegal?
Data scraping from public data repositories is very common and in most cases legal.
However, if your purpose is to take site Y's content, put it on your own site and benefit from it, then that is classed as copyright infringement.
Scraping on this basis is illegal and:
– violates the Digital Millennium Copyright Act (US)
– can often hurt the search engine rankings of websites (bad for search engine optimisation, SEO)
What Action Can I Take?
Option 1 – Report them to Google and their ISP and/or take legal action
- Could cost you time and money
Option 2 – Deal with it
- There are some technical mitigations
Option 3 – Think ahead
- Set up your website to take advantage of these scrapers and gain some SEO benefit
Why is Data Scraping a risk for SEO?
For the “Scraper”
• Duplicate content, but at least it’s content
• Less authority as a producer of original content
For the “Scrapee”
• Google does not like duplicate content, so you could be penalised for:
   • Affecting query data
   • Falsifying the number of “real” impressions for advertisers
• Your authority as the original content source is called into question
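One way a scrapee might spot copies of its pages is to compare texts for near-duplicates. The sketch below uses word shingles and Jaccard similarity, a standard near-duplicate technique; it is not how Google measures duplication, and the 5-word shingle size and 0.5 threshold are assumed values you would tune.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping `size`-word sequences (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two shingle sets, 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_scraped(original, suspect, size=5, threshold=0.5):
    """Flag `suspect` as a likely copy of `original` (threshold is an assumption)."""
    return jaccard(shingles(original, size), shingles(suspect, size)) >= threshold
```

Shingles capture word order, so a page that reuses the same vocabulary in different sentences scores low, while a lightly edited copy still scores high.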
Dealing with it – some technical mitigation
Ultimately, if data and content are accessible online, any person or automated script can copy them and create a new database. Although this practice would be illegal in the UK, it is a known risk for all data publishers.
• Monitor your web analytics for scraping
• IP lockout: restrict each IP to a maximum number of requests per hour before blocking it or requiring a “captcha”
• Use “captcha” forms instead of publishing extractable email addresses
• Block the IP addresses of all of your known competitors
• Scraping is generally done by matching patterns in the page markup. If we use random page generators, scraping becomes difficult.
• Use a Flash layer to display the final data so that it cannot be scraped, while making sure you still provide for SEO in the design
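The IP lockout and competitor blocking ideas can be sketched as a sliding-window rate limiter kept in memory. This is a minimal sketch, not a production design: the class name, the limit of 1,000 requests per hour and the "allow" / "challenge" / "block" return values are assumptions, and a real site would usually enforce this at the web server, load balancer or CDN.

```python
import time
from collections import defaultdict, deque


class IpRateLimiter:
    """Allow each IP at most `max_requests` within `window` seconds.

    Returns "block" for IPs on the blocklist (e.g. known competitors),
    "challenge" once an IP exceeds its limit (the caller may then block
    the request or show a captcha), and "allow" otherwise.
    """

    def __init__(self, max_requests=1000, window=3600.0,
                 clock=time.monotonic, blocked=()):
        self.max_requests = max_requests   # assumed hourly limit
        self.window = window               # window length in seconds
        self.clock = clock                 # injectable for testing
        self.blocked = set(blocked)
        self._hits = defaultdict(deque)    # ip -> timestamps of recent requests

    def check(self, ip):
        if ip in self.blocked:
            return "block"
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the sliding window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return "challenge"
        hits.append(now)
        return "allow"
```

Challenged requests are not recorded, so an IP regains access as soon as its older requests age out of the window rather than being locked out indefinitely.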
Thinking ahead – make it work for SEO
• Use absolute URLs in your links
• Use internal linking strategically
• Make sure each content headline is a link
• Add a copyright notice and a link to your site in the RSS feed
• Get some extra link juice from Technorati
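Because scrapers often republish RSS feeds verbatim, absolute URLs and a copyright footer mean the copy still links back to you. A minimal sketch using the standard library; the function names and footer wording are hypothetical, and the regex only handles double-quoted `href` attributes, so a real feed generator would be safer with a proper HTML parser.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

HREF = re.compile(r'href="([^"]*)"')


def absolutise_links(html, page_url):
    """Rewrite double-quoted href values to absolute URLs (simplified regex)."""
    return HREF.sub(lambda m: 'href="%s"' % urljoin(page_url, m.group(1)), html)


def rss_item(title, page_url, body_html, site_name):
    """Build an RSS <item> whose link points at the original page and whose
    description ends with a copyright notice linking back to it."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = page_url
    footer = ('<p>Copyright %s. Originally published at <a href="%s">%s</a>.</p>'
              % (site_name, page_url, page_url))
    # ElementTree escapes the HTML, which is how RSS descriptions carry markup.
    ET.SubElement(item, "description").text = (
        absolutise_links(body_html, page_url) + footer)
    return item
```

Usage: `ET.tostring(rss_item("My post", "https://example.com/blog/post", body, "Example Ltd"))` yields an `<item>` element ready to append to a feed's `<channel>`.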
And our advice...
1. Recognise that there is an SEO risk / opportunity
2. Decide on your approach:
   1. Go legal, OR
   2. Make it difficult, OR
   3. Make it work for you, OR
   4. All of the above
3. Constantly monitor the situation and develop / refine your approach as part of your online strategy
Some legal & helpful uses of scraping
• Market research and business intelligence
• Data mining your competitors' websites to compare prices, products offered, business partners acquired and other critical data
• Reputation management! What if you were alerted to every good or bad comment made about your company or product on a blog, forum or website, and could respond with corrections or enhancements before misinformation spread around the Internet?
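As an illustration of the price-comparison use, the sketch below extracts product names and prices from a competitor page that has already been fetched, then compares them with your own prices. The markup shape (`class="name"` and `class="price"` spans) is hypothetical; every real site differs, and you should check its terms of use before scraping it.

```python
import re
from decimal import Decimal
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect (product, price) pairs from hypothetical markup such as
    <span class="name">Widget</span><span class="price">£9.99</span>."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None   # "name" or "price" while inside such an element
        self._name = None    # last product name seen, awaiting its price

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "name" in classes:
            self._field = "name"
        elif "price" in classes:
            self._field = "price"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "name":
            self._name = text
        elif self._name is not None:
            # Strip currency symbols and separators, keep digits and the point.
            self.products.append((self._name, Decimal(re.sub(r"[^\d.]", "", text))))
            self._name = None


def compare_prices(ours, competitor_html):
    """Return our price minus theirs for every product both sites list."""
    parser = PriceParser()
    parser.feed(competitor_html)
    return {name: ours[name] - price
            for name, price in parser.products if name in ours}
```

A positive difference means the competitor is cheaper for that product.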