
Post on 25-May-2020


RDFa: Embedding RDF Knowledge in HTML

Some content from a presentation by Ivan Herman of the W3C, Introduction to RDFa, given at the 2011 Semantic Technologies Conference.

• Serialization of RDF embedded in HTML or XML
• Provides a set of attributes (the "a" in RDFa) to use with existing tags to carry RDF metadata
• 2004: work on developing standards began
• 2008: RDFa 1.0 became a recommendation (but only in XHTML, which failed to launch)
• 2012-15: RDFa 1.1 recommendation (works in HTML4, HTML5)
• See http://rdfa.info/

What is RDFa?

• RDF content is specified in XML attributes of tags rather than in elements
• The XML/HTML tree structure is used as context, when appropriate
• Some new attributes are introduced and some existing ones (@href, @rel) are reused
• When possible, HTML text content is used for literal values
→ The same file is used by the browser & the RDF extractor

Principles of RDFa

Web page viewed by a person

http://www.w3.org/ns/entailment/data/RDFS.html

The source

<p about="http://www.w3.org/ns/entailment/RDFS"
   property="http://purl.org/dc/terms/description">Unique identifier for <em>RDFS Entailment</em>.</p>

Source and generated RDF…

<p about="http://www.w3.org/ns/entailment/RDFS"
   property="http://purl.org/dc/terms/description">Unique identifier for <em>RDFS Entailment</em>.</p>

<http://www.w3.org/ns/entailment/RDFS> <http://purl.org/dc/terms/description> "Unique identifier for RDFS Entailment." .

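The extraction step shown above can be sketched in a few lines of Python. This is only an illustration built on the standard library's html.parser, not a conforming RDFa processor: the class name OneTripleExtractor is hypothetical, it only understands full URIs in @about and @property, and it assumes every opened tag is closed. The literal is the element's text with inline markup such as <em> stripped.

```python
from html.parser import HTMLParser


class OneTripleExtractor(HTMLParser):
    """Hypothetical helper: @about is the subject, @property the predicate,
    and the enclosed text (markup stripped) the literal object."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subject = None
        self._predicate = None
        self._text = []
        self._depth = 0  # nesting depth inside the element carrying @property

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._predicate is None and "property" in attrs:
            self._subject = attrs.get("about")
            self._predicate = attrs["property"]
            self._text = []
            self._depth = 1
        elif self._predicate is not None:
            self._depth += 1  # child markup such as <em>

    def handle_data(self, data):
        if self._predicate is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._predicate is None:
            return
        self._depth -= 1
        if self._depth == 0:  # the element carrying @property has closed
            literal = "".join(self._text)
            self.triples.append((self._subject, self._predicate, literal))
            self._predicate = None


html = ('<p about="http://www.w3.org/ns/entailment/RDFS" '
        'property="http://purl.org/dc/terms/description">'
        'Unique identifier for <em>RDFS Entailment</em>.</p>')
parser = OneTripleExtractor()
parser.feed(html)
parser.close()
for s, p, o in parser.triples:
    print(f'<{s}> <{p}> "{o}" .')
```

A real extractor also has to deal with CURIEs, inherited subjects, @content, and @rel/@href, which later slides introduce.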
The Web page viewed by a person

The source

<a about="http://www.w3.org/ns/entailment/RDFS"
   rel="http://www.w3.org/2000/01/rdf-schema#seeAlso"
   href="http://www.w3.org/TR/2004/REC-rdf-mt-20040210/">RDF Semantics.</a>

Source and generated RDF…

<a about="http://www.w3.org/ns/entailment/RDFS"
   rel="http://www.w3.org/2000/01/rdf-schema#seeAlso"
   href="http://www.w3.org/TR/2004/REC-rdf-mt-20040210/">RDF Semantics.</a>

<http://www.w3.org/ns/entailment/RDFS> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <http://www.w3.org/TR/2004/REC-rdf-mt-20040210/> .

N-Triples in HTML

• Maybe we can do better, instead of this:

<http://www.w3.org/ns/entailment/RDFS> <http://purl.org/dc/terms/description> "Unique identifier for RDFS Entailment." .
<http://www.w3.org/ns/entailment/RDFS> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <http://www.w3.org/TR/2004/REC-rdf-mt-20040210/> .

• Allow URI prefixes and a shared subject, like this:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://www.w3.org/ns/entailment/RDFS>
    rdfs:seeAlso <http://www.w3.org/TR/2004/REC-rdf-mt-20040210/> ;
    dcterms:description "Unique identifier for RDFS Entailment." .
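The move from flat N-Triples to the prefixed, shared-subject form can be illustrated with a small Python sketch. The function names compact_predicate and to_turtle are hypothetical, only predicates are abbreviated, and literals are assumed to arrive already quoted; it is meant to show the two ideas (prefixes and grouping by subject), not to be a general Turtle serializer.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dcterms": "http://purl.org/dc/terms/",
}


def compact_predicate(uri, prefixes):
    """Abbreviate a predicate URI as prefix:local when a prefix matches."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return f"<{uri}>"


def term(value):
    """Render a subject or object: quoted literals pass through, URIs get <>."""
    return value if value.startswith('"') else f"<{value}>"


def to_turtle(triples, prefixes):
    """Group triples by subject so that each subject is written only once."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in prefixes.items()]
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    for s, pairs in by_subject.items():
        body = " ;\n    ".join(
            f"{compact_predicate(p, prefixes)} {term(o)}" for p, o in pairs
        )
        lines.append(f"{term(s)}\n    {body} .")
    return "\n".join(lines)


SUBJECT = "http://www.w3.org/ns/entailment/RDFS"
triples = [
    (SUBJECT, "http://purl.org/dc/terms/description",
     '"Unique identifier for RDFS Entailment."'),
    (SUBJECT, "http://www.w3.org/2000/01/rdf-schema#seeAlso",
     "http://www.w3.org/TR/2004/REC-rdf-mt-20040210/"),
]
turtle = to_turtle(triples, PREFIXES)
print(turtle)
```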

• Turtle supports several simplifying ideas
• Use compact URIs (CURIEs) when possible
  – URI with a prefix defined elsewhere, e.g., foaf:mbox
• Making use of the natural structure for
  – shared subjects
  – shared predicates
  – creating blank nodes
  – etc.

Turtlizing RDFa

CURIE definition and usage

<html>
…
<p about="http://www.w3.org/ns/entailment/RDFS"
   property="http://purl.org/dc/terms/description">Unique identifier for <em>RDFS Entailment</em>.</p>
…
</html>

• can be replaced by:

<html prefix="dcterms: http://purl.org/dc/terms/">
…
<p about="http://www.w3.org/ns/entailment/RDFS"
   property="dcterms:description">Unique identifier for <em>RDFS Entailment</em>.</p>
…
</html>

• @prefix can appear anywhere in the HTML tree and is valid for the entire sub-tree
  – i.e., the html element is not the only place to have it
• The same @prefix attribute can hold several definitions:
  – prefix="dcterms: http://purl.org… foaf: http://…"
• CURIEs and "real" URIs can usually be mixed
• CURIEs cannot be used on @href

Details on @prefix in RDFa
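As a rough illustration of how a @prefix value with several definitions might be read and applied, here is a minimal Python sketch. The helper names parse_prefix_attr and expand are hypothetical, and the FOAF namespace is supplied for the example since the slide elides it. Terms whose prefix is not declared are returned unchanged, which is a simplified way of letting CURIEs and full URIs be mixed.

```python
def parse_prefix_attr(value):
    """Parse an @prefix value of the form 'name: URI name: URI ...'."""
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError("expected alternating 'name:' and URI tokens")
    mapping = {}
    for name, uri in zip(tokens[0::2], tokens[1::2]):
        if not name.endswith(":"):
            raise ValueError(f"prefix name must end with ':': {name!r}")
        mapping[name[:-1]] = uri
    return mapping


def expand(term, mapping):
    """Expand a CURIE with a declared prefix; leave anything else untouched."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in mapping:
        return mapping[prefix] + local
    return term


# Assumed example value; the FOAF namespace is not spelled out on the slide.
prefixes = parse_prefix_attr(
    "dcterms: http://purl.org/dc/terms/ foaf: http://xmlns.com/foaf/0.1/"
)
print(expand("dcterms:description", prefixes))
print(expand("foaf:mbox", prefixes))
print(expand("http://purl.org/dc/terms/description", prefixes))
```

In this sketch a full URI falls through because its scheme ("http") is not a declared prefix. The rule that CURIEs are not allowed on @href would be enforced by simply never calling expand on that attribute.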

Sharing subjects

<html prefix="dcterms: http://purl.org/dc/terms/ rdfs: http://www.w3.org/2000/01/rdf-schema#">
…
<body about="http://www.w3.org/ns/entailment/RDFS">
…
<p property="dcterms:description">Unique identifier for <em>RDFS Entailment</em>.</p>
<p>…<a rel="rdfs:seeAlso" href="http://www.w3.org/TR/2004/REC-rdf-mt-20040210/">RDFS Semantics</a>…</p>

Basic principle: @about is inherited by child nodes, so there is no reason to repeat it

…yielding

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://www.w3.org/ns/entailment/RDFS>
    rdfs:seeAlso <http://www.w3.org/TR/2004/REC-rdf-mt-20040210/> ;
    dcterms:description "Unique identifier for RDFS Entailment." .
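One way to picture the inheritance of @about is a stack of subjects that follows the open elements. The Python sketch below (the class name SubjectInheritance is hypothetical) records which subject each @property or @rel ends up attached to. It assumes well-formed markup in which every opened tag is closed, and it ignores the other ways RDFa 1.1 can set a subject.

```python
from html.parser import HTMLParser


class SubjectInheritance(HTMLParser):
    """Hypothetical sketch: children inherit the nearest ancestor's @about."""

    def __init__(self):
        super().__init__()
        self.subjects = [None]  # stack: subject in force for each open element
        self.pairs = []         # (subject, predicate) for each @property/@rel

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Use the element's own @about if present, else the inherited subject.
        subject = attrs.get("about", self.subjects[-1])
        self.subjects.append(subject)
        for name in ("property", "rel"):
            if name in attrs:
                self.pairs.append((subject, attrs[name]))

    def handle_endtag(self, tag):
        if len(self.subjects) > 1:
            self.subjects.pop()


page = (
    '<body about="http://www.w3.org/ns/entailment/RDFS">'
    '<p property="dcterms:description">Unique identifier for '
    '<em>RDFS Entailment</em>.</p>'
    '<p><a rel="rdfs:seeAlso" '
    'href="http://www.w3.org/TR/2004/REC-rdf-mt-20040210/">RDFS Semantics</a></p>'
    '</body>'
)
walker = SubjectInheritance()
walker.feed(page)
walker.close()
for subject, predicate in walker.pairs:
    print(subject, predicate)
```

Both predicates are reported with the subject declared once on the body element, which matches the shared-subject Turtle form.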

On reusing literals

• Reusing literals is a plus, but you don't always want to do it
• The basic rule says: the (RDF) Literal is the enclosed text from the HTML content
• This is fine in 80% of the cases, but…
• …it may not be natural in many cases!

Example: dates

<body about=".." prefix="dcterms: http://… xsd: http://…">
<address>
<p property="dcterms:date" datatype="xsd:date">2010-07-05</p>
</address>
</body>

• This leads to:

@prefix dcterms: <http://…> .
@prefix xsd: <http://…> .
<..> dcterms:date "2010-07-05"^^xsd:date .

• 2010-07-05 is the official ISO format (for xsd:date), but "July 5, 2010" is preferred by people

Usage of @content

<body about=".." prefix="dcterms: http://… xsd: http://…">
<address>
<p property="dcterms:date" datatype="xsd:date"
   content="2010-07-05">July 5, 2010</p>
</address>
</body>

• Also leads to:

@prefix dcterms: <http://…> .
@prefix xsd: <http://…> .
<..> dcterms:date "2010-07-05"^^xsd:date .
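The choice between the enclosed text and @content comes down to a simple precedence rule, sketched here in Python. The function name literal_for is hypothetical, attributes are assumed to be available as a plain dict, and the sketch does not escape quotes inside the value.

```python
def literal_for(attrs, text):
    """Build the literal for a @property element.

    @content, when present, wins over the text that people see on the page;
    @datatype, when present, is attached as a typed-literal suffix.
    """
    value = attrs.get("content", text)
    datatype = attrs.get("datatype")
    if datatype:
        return f'"{value}"^^{datatype}'
    return f'"{value}"'


# Machine-readable date in the text itself.
plain = literal_for(
    {"property": "dcterms:date", "datatype": "xsd:date"}, "2010-07-05"
)

# Human-friendly text, machine-readable value carried by @content.
friendly = literal_for(
    {"property": "dcterms:date", "datatype": "xsd:date", "content": "2010-07-05"},
    "July 5, 2010",
)

print(plain)
print(friendly)
```

Both calls produce the same typed literal, so the page can show "July 5, 2010" without changing the extracted data.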

• Here is our rule so far
  – @about sets the subject
  – @href sets the object
• But that is not always good enough
  – We may not want to introduce an active link (i.e., an "a" element) on the web page
  – What about other links in HTML?

On subjects and objects

We may not always want links…

<span about="http://www.ivan-herman.net/foaf#me">
  <span rel="rdfs:seeAlso"
        resource="http://www.w3.org/People/Ivan/">Activity Lead</span>
</span>

• The RDFa @resource attribute is equivalent to @href
• It sets the object, just like @href, but is ignored by browsers, as in the example above
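A small Python sketch of how an extractor might pick the object of a @rel statement. The function names object_for and link_triple are hypothetical, and the precedence used here (@resource first, then @href) reflects my reading of the RDFa 1.1 Core processing rules, so check it against the specification before relying on it.

```python
def object_for(attrs):
    """Return the object resource of an element, or None if it has none.

    Assumption: @resource is consulted before @href.
    """
    for name in ("resource", "href"):
        if name in attrs:
            return attrs[name]
    return None


def link_triple(subject, attrs):
    """Build (subject, predicate, object) for an element carrying @rel."""
    obj = object_for(attrs)
    if "rel" in attrs and obj is not None:
        return (subject, attrs["rel"], obj)
    return None


me = "http://www.ivan-herman.net/foaf#me"

# A <span> with @resource: no clickable link appears on the page.
from_span = link_triple(
    me, {"rel": "rdfs:seeAlso", "resource": "http://www.w3.org/People/Ivan/"}
)

# An <a> with @href: same statement, but rendered as an active link.
from_anchor = link_triple(
    me, {"rel": "rdfs:seeAlso", "href": "http://www.w3.org/People/Ivan/"}
)

print(from_span)
print(from_anchor)
```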

More features

• RDFa 1.1 has more features that make it easier to represent knowledge compactly in HTML
• These take advantage of the HTML tree context
• We'll skip the details, which you can find in
  – RDFa 1.1 Primer
  – RDFa 1.1 Core

• Some tools already have RDFa facilities:
  – e.g., it is possible to add the right DTD to Dreamweaver, Amaya has it at its core, etc.
• There are plugins for, e.g., WordPress, to generate RDFa markup
• CMS systems (like Drupal 7) may have RDFa built into their publication system
  – users generate RDFa whether they know about it or not…

Authoring RDFa

• Major search engines (Google, Yahoo) process RDFa for vocabularies they understand and can use
• There are libraries, distillers, etc., to extract RDFa information
  – may be part of RDF development environments like Redland, RDFLib
  – see, for further references, http://rdfa.info/wiki/Consume
• Facebook's "social graph" is based on RDFa

Consuming RDFa

A page from BestBuy: RDFa for Facebook markup, JSON-LD for search engines

FB's Open Graph Protocol

• An RDFa+HTML file can just be on a server
  – the client extracts the RDF content
• Content negotiation can be set up on the server side
  – the client gets the format it asks for
  – the RDF content can either be generated on the fly or stored on the server statically

Publishing RDFa
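From the client's point of view, content negotiation just means stating the preferred format in the HTTP Accept header. The Python sketch below uses only the standard library. The helper names build_request and fetch are hypothetical, the URL is the one used in the examples, and whether a given server actually honours the header is an assumption that has to be checked per site. The sketch only builds the request and does not go to the network unless fetch is called.

```python
from urllib.request import Request, urlopen


def build_request(url, media_type="text/turtle"):
    """Prepare a GET request that asks the server for a specific RDF format."""
    return Request(url, headers={"Accept": media_type})


def fetch(url, media_type="text/turtle"):
    """Send the request; return the served Content-Type and the raw body."""
    with urlopen(build_request(url, media_type)) as response:
        return response.headers.get("Content-Type"), response.read()


# Building the request is purely local; nothing is downloaded here.
request = build_request("http://www.w3.org/ns/entailment/RDFS")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

A client that prefers the HTML+RDFa page would pass media_type="text/html" instead and run its own extractor on the result.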

Embedded metadata (microdata or RDFa) is used to improve the search result page
  – at the moment only a few vocabularies are recognized, but that is evolving continually

Google's rich snippets

A number of popular sites publish RDFa as part of their normal pages:
  – Tesco, BestBuy, Slideshare, The London Gazette, Newsweek, MSNBC, O'Reilly Catalog, the White House…
  – Creative Commons snippets are in RDFa (e.g., on Flickr)

Effects of, e.g., Google or Facebook

Courtesy of Jay Myers, BestBuy, SemTech 2010 Presentation

BestBuy example of RDFa use


• Reported in a BestBuy blog:
  – GoodRelations + RDFa improved Google rank tremendously
  – 30% increase in traffic on BestBuy store pages
  – Yahoo observes a 15% increase in click-through rate
• Today, BestBuy uses RDFa for much more than just snippets
  – E.g., to locate shops that have certain products in stock…

Effects on BestBuy

Library of Congress RDFa use

Overstock.com example


Drupal content management system

• RDF support in Drupal v.7
• Major CMS system
• Has RDFa at its core, so pages contain RDFa
• In one step, millions of pages of additional RDF data!

The Examiner.com

Extracting the data

rdfa> python getdata.py "http://www.w3.org/ns/entailment/data/RDFS.html"
@prefix dc: <http://purl.org/dc/terms/> .
@prefix ent: <http://www.w3.org/ns/entailment/> .
…
ent:RDFS a ent:Entailment ;
    dc:creator <http://www.ivan-herman.net/foaf#me> ;
    dc:date "2010-05-03"^^xsd:date ;
    dc:description "Unique identifier for RDFS Entailment" ;
    rdfs:comment "The specification for the RDFS entailment is … Semantics W3C Recommendation." ;
    rdfs:isDefinedBy <http://www.w3.org/TR/2004/REC-rdf-mt-20040210/#rdfs_entailment> ;
    rdfs:seeAlso <http://www.w3.org/TR/2004/REC-rdf-mt-20040210/> .

<http://www.w3.org/ns/entailment/data/RDFS.html> dc:title "Information Resource RDFS Entailment" ;
    xhv:stylesheet <http://www.w3.org/StyleSheets/TR/base> .

<http://www.ivan-herman.net/foaf#me> a foaf:Person ;
    rdfs:seeAlso <http://www.ivan-herman.net/foaf> ;
    foaf:mbox <mailto:ivan@w3.org> ;
    foaf:name "Ivan Herman" ;
    foaf:title "Semantic Web Activity Lead" ;
    foaf:workplaceHomepage <http://www.w3.org> .

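getdata.py is very simple

import sys

import rdflib

# Expect a URL and, optionally, the input format to parse.
if not (1 < len(sys.argv) < 4):
    print("usage: python getdata.py url ['json-ld' | rdfa | rdfa1.1 | microdata | html]")
    print('eg: python getdata.py "http://www.w3.org/ns/entailment/data/RDFS.html"')
    sys.exit(0)

url = sys.argv[1]
# Default to RDFa 1.1; which parser formats exist depends on the installed rdflib version.
fmt = sys.argv[2] if len(sys.argv) == 3 else 'rdfa1.1'
g = rdflib.Graph()
g.parse(url, format=fmt)
# Write the extracted graph in N3/Turtle syntax.
print(g.serialize(format='n3'))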

OpenLink Structured Data Sniffer*

* http://osds.openlinksw.com/


• Web developers want content providers to add structured data to HTML pages
• Content providers are incentivized to do so because their content will be better understood, ranked higher, more useful, etc.
• RDFa is the most powerful & flexible knowledge markup standard understood by search engines
• RDFa is also an alternative serialization of full RDF

Conclusions