SEO for the Semantic Web

Post on 12-Jan-2015

description

A brief history of SEO from WWW to RDF, Microformats and SPARQL. First presented at GeekMeet #2 in Cluj-Napoca on March 1st, 2008.

transcript

“How do the machines know what Tasty Wheat tasted like?”

Mouse – The Matrix

Short SEO History

• Web1.0

• Web2.0

• Web3.0

Genesis

• A story of the Internet, by

• Solving the most important problems

• Greatly influenced by one man…

Tim Berners-Lee

“the World Wide Web is Berners-Lee's alone. He designed it. He loosed it on the world. And he more than anyone else has fought to keep it open, nonproprietary and free.”

Time Magazine, 1999

The Problem

• Where can I find the information?

“Our ineptitude in getting at the record is largely caused by the artificiality of the systems of indexing.”

The Atlantic Monthly, 1945

Archie, 1990

• Indexed file names and

• Returned results based on pattern matching
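To make "pattern matching on file names" concrete, here is a minimal Python sketch. The file index and the `search` helper are hypothetical, and shell-style wildcards are an assumption; this is not how Archie was actually implemented.

```python
from fnmatch import fnmatchcase

# Hypothetical index of file paths, standing in for crawled FTP listings.
FILE_INDEX = [
    "pub/gnu/emacs-18.55.tar.Z",
    "pub/docs/rfc1034.txt",
    "pub/games/nethack-3.0.tar.Z",
    "pub/docs/www-proposal.txt",
]


def search(pattern: str) -> list[str]:
    """Return indexed paths whose file name matches a shell-style pattern."""
    return [
        path for path in FILE_INDEX
        # Only the last path segment (the file name) is matched.
        if fnmatchcase(path.rsplit("/", 1)[-1], pattern)
    ]


print(search("*.txt"))
```

Note that only names are matched: nothing in this kind of index knows what the files contain.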

Web1.0

• Means HTML

• Is born in 1991, with the help of

• Tim Berners-Lee (TBL), who also founded

• WWW Consortium (W3C) at MIT, and also

• Created WWW Virtual Library – the 1st catalog

Yahoo Directory, 1994

• Vertical = categories... is like

• “Show me all the stuff and I’ll handle it”

• Manually indexed stuff, which was

• OK for starters, but…

• Websites quickly grew in number and

• Y! started charging money for one listing

• Increasingly more money...

WebCrawler, 1994

• First SE to fully search text

• Bought by AOL, then

• Sold to Excite, which

• Excite went bankrupt and

• WebCrawler ends up bought by InfoSpace

Other “Search Engines”

• 1994, reaches 60 million pages in ’96

• 1995, bought by Overture, bought by Y!

• 1996, meta search, bought by Lycos

• 1997, bought by IAC/InterActiveCorp

• 1999, bought by Overture, meaning Y!

Shopping fun, right?

DMOZ, 1998

• Open Directory Project

• Each listing is checked and certified by a volunteer

• The main source for Google Directory

Current State of Search Industry

Web1.0 Problems

• SE couldn’t understand text, so

• They said “why don’t you implement some meta tags (description & keywords) so we can get a glimpse of what you’re saying”

• The relevancy of a page with respect to a keyword was determined by a few factors, so

• It was very easy to abuse and spam, therefore

• Search Results had poor quality
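A rough Python sketch of why this was so easy to abuse. The `MetaTagReader` class and the `naive_score` function are made up for illustration; no real search engine is claimed to have scored pages exactly this way.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collects the content of the description and keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("description", "keywords"):
            self.meta[name] = attributes.get("content") or ""


def naive_score(html: str, keyword: str) -> int:
    """Hypothetical relevancy: how often the keyword is listed in the meta tag."""
    reader = MetaTagReader()
    reader.feed(html)
    terms = [t.strip().lower() for t in reader.meta.get("keywords", "").split(",")]
    return terms.count(keyword.lower())


honest = '<head><meta name="keywords" content="pizza, pasta"></head>'
spammy = '<head><meta name="keywords" content="pizza, pizza, pizza, pizza"></head>'

# The page that repeats itself wins, whatever its real content is.
print(naive_score(honest, "pizza"), naive_score(spammy, "pizza"))
```

When the page author controls every input to the score, stuffing the tag is all it takes to rank higher.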

Web2.0

• Is coined by... Tim O’Reilly, yet

• TBL later said that “web2.0” is a stupid, meaningless term and that he thought of it first in ’96 anyway

Web2.0 means

• Google, which stood apart because of

• PageRank (1998) invented by

• Larry & Sergey, who adapted the algo from

• An MIT professor who had developed

• A nasty mathematical formula for positioning keywords in a 3d space model based on the relevancy that one kw holds … whatever

PageRank actually means

• That a link is a vote and

• Not all links are created equal, so

• It matters who links to you

• Just like in our real life society
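The "link is a vote, but votes are not equal" idea can be sketched with the simplified textbook PageRank iteration. This is an illustration only, not Google's production algorithm; the damping factor of 0.85, the iteration count and the tiny `web` graph are all assumptions.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank by repeated redistribution of rank along links."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline, independent of its links.
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its vote over all pages.
            targets = outlinks or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: "x" and "y" each have exactly one inbound link,
# but "x" is linked from a page that many others link to.
web = {
    "popular": ["x"],
    "fan1": ["popular"],
    "fan2": ["popular"],
    "fan3": ["popular"],
    "obscure": ["y"],
    "x": [],
    "y": [],
}

ranks = pagerank(web)
print(ranks["x"] > ranks["y"])
```

Both pages get a single vote, yet "x" ends up ranked higher because its vote comes from a well-linked page.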

• Read the content of pages really well, just that

• Pages were crappy:

– Non-standard coding

– Ugly tech (like applets)

– Senseless IA

• So Google said: “don’t do evil and try to nicely format the info, according to W3C standards” (remember TBL)

Enter the SEO

SEO

• Is a multitude of practices aimed at facilitating the indexing of pages by search engines

• Evolves as the ranking algorithm changes, and

• Of course, the algorithm is kept secret.

SEO actually means

Courtesy of Kelly Ishikawa

SEO actually means

• An on-going battle between bots & SEO guys

• Now 100+ factors influence ranking

• And I’d like to take the time to talk about each one of them in the following…

Just kidding

My SEO Cheat Sheet

• Consider:

1. Page Titles

2. URLs (mod_rewrite)

3. Anchor Text

4. Website Architecture (IA)

5. Link Title & Alt Images

6. Relevant content (text)

7. Sitemap.xml

8. Hosting

9. Freshness
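For the Sitemap.xml item, a minimal Python sketch of generating such a file with the standard library. The `build_sitemap` helper and the example.com URLs are hypothetical; the `urlset`, `url`, `loc` and `lastmod` elements and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> str:
    """Build a sitemap document from (URL, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # lastmod is how a site can signal freshness to a crawler.
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages of an example site.
pages = [
    ("http://www.example.com/", "2008-03-01"),
    ("http://www.example.com/seo-cheat-sheet", "2008-02-15"),
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The clean path in the second URL is the kind of address a mod_rewrite rule would typically expose instead of a query string.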

Resources

Matt Cutts Blog

Mihai’s SEO Cheat Sheet :D

Web2.0 Problems

• © for pictures, articles, books, etc

• PPC fraud

• Privacy

• Search Engine SPAM

• Link bombing

• Paid links

• But more important...

Web2.0 Problems

• SE still don’t understand what the $#%@ you’re talking about

• Crawling a website’s interface to extract info is almost insane

Web3.0

• Means semantic web

• Attention migrates from syntax/formatting to semantics and

• Meta Data (data about the data) becomes...

Web3.0

Resource Description Framework & Microformats

Resource Description Framework

• Usually written as XML (RDF/XML)

• RDF = Subject + Predicate + Object

• S + P + O creates a Triple which

• Can describe almost anything in the universe

• Triples are connectable (eg: FOAF)

• RDFa = XHTML + RDF (W3C compliant)
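A small Python sketch of the triple idea, using plain tuples rather than a real RDF library. The example.org people and the `match` helper are hypothetical; the FOAF namespace and the `name` and `knows` properties are the real FOAF vocabulary.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Each triple is (subject, predicate, object). The people are made up.
triples = {
    ("http://example.org/mihai", FOAF + "name", "Mihai"),
    ("http://example.org/mihai", FOAF + "knows", "http://example.org/ana"),
    ("http://example.org/ana", FOAF + "name", "Ana"),
}


def match(subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if subject in (None, t[0])
        and predicate in (None, t[1])
        and obj in (None, t[2])
    )


# Triples connect: the object of one becomes the subject of another.
friend = match("http://example.org/mihai", FOAF + "knows")[0][2]
friend_name = match(friend, FOAF + "name")[0][2]
print(friend_name)
```

Because subjects and objects are URIs, triples published by different sites can be chained the same way.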

Microformats

• hCalendar

• hCard

• rel-tag

• VoteLinks

• XFN

• Geo

• hResume

• hReview

• etc
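To show what a machine gets out of a microformat, here is a minimal Python sketch that reads a few hCard properties from their class names. The `HCardReader` class and the sample markup are hypothetical and cover only a tiny part of hCard; `vcard`, `fn`, `org`, `adr` and `locality` are real hCard class names.

```python
from html.parser import HTMLParser


class HCardReader(HTMLParser):
    """Collects the text of elements marked with a few hCard class names."""

    PROPERTIES = ("fn", "org", "locality")

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # hCard property of the element being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self._current = next((p for p in self.PROPERTIES if p in classes), None)

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.card[self._current] = data.strip()


# Hypothetical markup: ordinary HTML, with meaning carried by class names.
SAMPLE = """
<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Pizza</span>,
  <span class="adr"><span class="locality">Cluj-Napoca</span></span>
</div>
"""

reader = HCardReader()
reader.feed(SAMPLE)
print(reader.card)
```

The page looks the same to a human visitor; the class names are the only addition, and they are enough for a bot to extract structured contact data.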

Case Study

SPARQL

• SPARQL Protocol and RDF Query Language

• Standardized on 15th Jan 08 (1 month ago) and

• Endorsed by?... TBL

“Trying to use the Semantic Web without SPARQL is like trying to use a relational database without SQL”

TBL

Potential

• With SPARQL you skip the presentation layer

• You can query ad-hoc any API, so

• You don’t need to crawl in advance, therefore

• Information will be as fresh as it gets

And possibilities

• Query: “I can has pizza?”

• Returns:

– A friend of yours (XFN - Facebook)

– has a colleague (FOAF - LinkedIn) who

– said that they make good pizza (hReview - Yelp) at

– a restaurant nearby (geo – Gmaps)

– Tip: U2 in concert today (hCalendar ‐ upcoming)
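The chain in that answer is a join across data from several sources. A toy Python sketch of the join follows; it is plain Python, not SPARQL, and every name, predicate and triple in it is made up for illustration.

```python
# Hypothetical triples; comments name the kind of source each stands for.
TRIPLES = [
    ("me", "friend", "ana"),                          # XFN
    ("ana", "colleague", "dan"),                      # FOAF
    ("dan", "reviewed_good_pizza", "Pizza Roma"),     # hReview
    ("Pizza Roma", "near", "me"),                     # geo
    ("dan", "reviewed_good_pizza", "Far Away Pizza"), # hReview, not nearby
]


def objects(subject, predicate):
    """All objects linked from a subject through a given predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def can_has_pizza(person):
    """Nearby places praised by a colleague of one of the person's friends."""
    return [
        (friend, colleague, place)
        for friend in objects(person, "friend")
        for colleague in objects(friend, "colleague")
        for place in objects(colleague, "reviewed_good_pizza")
        if person in objects(place, "near")
    ]


answers = can_has_pizza("me")
print(answers)
```

A SPARQL graph pattern would express the same chain declaratively and run it against live endpoints instead of a local list.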

Perhaps now we can see

• Why Social Networking Communities are worth so much, even though most of them don’t have a revenue model

– Facebook

– LinkedIn

– Meebo

– Bebo

– Pipu...

• They/We are the databases of the future

Thanks!

“Most of the right choices in SEO come from asking: What’s the best thing for the user?”

Matt Cutts

Mihai Gheza

Creative Commons Attribution‐Noncommercial‐Share Alike 3.0 Unported License.