Semantic Web SEO: Using Linked Data and schema.org to improve Library Reach and Digital Repository Access
Kenning Arlitsch & Patrick OBrien
DLF Fall – Denver, Colorado
November 5, 2012
Today’s Objectives
- Basic understanding of
  - Semantic Web SEO for digital repositories
  - How to get started incorporating Schema.org and linked data into a digital repository
- Implement baseline metrics to support pre/post funding decisions of digital repositories
  - Simplify setup and administration of Google Analytics and Google Webmaster Tools for an organization and its stakeholders
  - Implement a Digital Repository SEO Google Analytics dashboard
Agenda
- Why SEO & the Semantic Web Matter
  - Performance & Accountability
  - The semantics of what really matters today
- How to Get Started
  - SEO Administration at an Institutional Scale
  - Enhance Your Data
  - Clean up Your Data
You cannot evaluate what you do not measure
"We cannot call a digital-library or electronic-publishing system a success if we cannot measure and interpret its use"
- Ann Peterson Bishop, “Logins and Bailouts: Measuring Access, Use, and Success in Digital Libraries”
The Journal of Electronic Publishing Volume 4, Issue 2, December, 1998
Funding providers want more accountability and demonstrated value*
- “IMLS is focusing on areas where it can best effect change and measure its results.”**
- The IMLS assessment model will “identify effective museum and library services through performance monitoring” among other things.**
* ACRL Research Planning and Review Committee, “2010 top ten trends in academic libraries,” June 2010
** Institute of Museum and Library Services. 2011. “Creating a Nation of Learners; IMLS Five-Year Strategic Plan 2012–2016”
Accountability extends beyond granting agencies
- State Legislatures
  - Local taxpayers
- University administration
- Library administration
- Donors
- Association of Research Libraries statistics
Accountability at the Institutional level
- Enable all your Stakeholders
  - Collection Managers
  - IT Personnel
  - Administrators
- Avoid the free-for-all of silos
- Establish an institutional master account
  - Administer rights
  - Everyone uses the same baseline metrics and tools
2010: began looking at proxy metrics for digital collection public accessibility and use
- 12+ Billion
  - Number of search queries submitted to Google each month by Americans*
- 12%
  - Percentage of our digital collection content in Google index
- 0.5%
  - Percentage of our USpace IR scholarly papers accessible to researchers using Google Scholar
* http://www.comscore.com/Press_Events/Press_Releases/2012/1/comScore_Releases_December_2011_U.S._Search_Engine_Rankings
Basic SEO has improved collection accessibility in Google across the board…
[Chart: Google Index Ratio - All Collections*. Series: High** and Average. Dates: 07/05/10, 04/04/11, 11/30/11. Axis: 0% to 100%. Values as listed on the slide: 100%, 79%, 87%, 51%, 37%, 12%]
* Google Index Ratio = URLs indexed by Google / URLs submitted, for about 150 collections containing ~170,000 URLs
** Highest index ratio achieved for collections with over 500 URLs submitted to Google
…almost 100% of USpace IR content is accessible to patrons using Google.
[Chart: Google Index Ratio for Board of Regents, UScholar Works, ETD 2, and ETD 1. Dates: 07/05/10, 11/19/10, 10/16/11. Axis: 0% to 100%. Values as listed on the slide: 97%, 98%, 98%, 97%, 47%, 51%, 68%, 69%, 4%, 23%, 0%, 12%]
*October 16, 2011 Weighted Average Google Index Ratio = 97.82% (10,306/10,536).
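The weighted average in the footnote above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the ratio is indexed URLs divided by submitted URLs (which is what 10,306/10,536 = 97.82% implies); the `Collection` class and function names are illustrative, not part of any Google tool.

```python
from dataclasses import dataclass


@dataclass
class Collection:
    name: str
    submitted: int  # URLs submitted to Google (e.g., via a sitemap)
    indexed: int    # URLs Google reports as indexed


def index_ratio(indexed: int, submitted: int) -> float:
    """Share of submitted URLs that Google has indexed (0.0 if none were submitted)."""
    return indexed / submitted if submitted else 0.0


def weighted_index_ratio(collections: list[Collection]) -> float:
    """Pool the counts before dividing, so large collections weigh more."""
    total_indexed = sum(c.indexed for c in collections)
    total_submitted = sum(c.submitted for c in collections)
    return index_ratio(total_indexed, total_submitted)


# Totals from the slide footnote: 10,306 of 10,536 URLs indexed.
print(f"{index_ratio(10_306, 10_536):.2%}")  # 97.82%
```

Pooling the counts, rather than averaging per-collection percentages, keeps a tiny collection from skewing the institutional figure.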
…resulting in more referrals and visitors
12-week comparison, 2010 vs. 2012
Today’s Key Premise, Concepts & Focus
- SEO Goals are to increase access, visibility and use by patrons who value our content
- Semantic Web is a framework of standards and technologies to share, integrate and represent data as concepts across different content, information and system boundaries.
- Semantic Search incorporates the Semantic Web to understand the context and intent of users seeking information and the concepts contained within a document
Why semantic search is useful
- Perfect application for research & discovery of concepts
  - Apple Siri
  - IBM Watson
  - Google Knowledge Graph
- Making content Search Engine Readable & semantically Understandable can increase
  - click-through rates (CTR) by 15%*
  - organic traffic by 30%*
* http://searchengineland.com/how-to-get-a-30-increase-in-ctr-with-structured-markup-105830
Semantic implies “meaning” or “understanding”
- Why would I search for “historic landmarks in Denver”?
- Anticipates what information I want?
The 4 major SEs have committed to Schema.org as their semantic model
- SE Understandable
  - Schema.org is a mechanism (i.e., ontology) to communicate the meaning of your data
- SE Readable
  - Microdata and RDFa are the preferred ways for SEs to read your data
- The US submits 19 billion queries per month to 3 of these SEs*
- We have not found any tools within reach of typical library budgets or skill sets that are easy to implement
* http://www.comscore.com/Press_Events/Press_Releases/2012/1/comScore_Releases_December_2011_U.S._Search_Engine_Rankings
Created an SEO Scorecard designed to support pre/post funding decisions
- Assembled Team of
  - Collection Managers
  - Business School Group Project
  - 2nd Year MBA Team
- Focused on the 10 Google Analytics features that support
  - IMLS & NEH strategic plan
  - SEO Collection Manager Goals
Workshop Process
- Diagrams and process of what we did at Utah
- Live demo using Montana State (MSU)
- Information that would be helpful today
  - Access to your organization’s admin accounts (i.e., user ID & password)
    - Google Analytics
    - Google Webmaster Tools
  - An internal list server for your organization’s managers responsible for making pre/post funding digital repository decisions
Diagram of problem domain
Steps for setting up Measurement & Evaluation for your Institution and Staff
1. Associate a Google Account with your Institution
2. Staff create their own Google Account using their Institution email address
3. Activate Google Services using your Institution Google Account
  - Google Analytics
  - Google Webmaster Tools
4. Add Staff to Google Services using their Institution email addresses
[Diagram of what it all looks like: steps 1 through 4]
Step 1: Associate a Google Account* (Master) with your Institution
- Use an internal list server, e.g., [email protected]
- Include managers who are responsible for administration
  - Google Analytics
  - Google Webmaster Tools
* https://accounts.google.com/NewAccount
Step 2: Staff create their own Google Account* (Master) using Institution email
* https://accounts.google.com/NewAccount
Step 3: Activate Google Services using your Institution Google Account (Master)
Step 4: Add Staff to Google Services using their Institution email addresses
Step 3 & 4: Successful Google Analytics
Step 3 & 4: Successful Google Webmaster Tools
Next steps are to test scalable tools and a repeatable process
- Found issues with most Analytics configurations
- We need study participants to evaluate and test the accuracy of additional analytics tools being developed under the IMLS Grant program
What type of web analytics software does your IR use?
A. Analytics Service
B. Log Files
C. Don't Know
D. None
[Diagram: IR HTML pages measured by (A) an analytics service using JavaScript page tagging and (B) log files]
Both types have potential accuracy issues for IRs
A. Analytics Services
  - Undercount non-HTML (e.g., PDF) file downloads
B. Log Files
  - Overcount visits & downloads due to spiders, etc.
  - Undercount page views due to web caching, by up to 30%
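One way to reduce the spider overcount in log files is to drop requests whose user agent looks like a crawler before counting downloads. The following is a minimal sketch, assuming logs in the common "combined" format; the substring list in `BOT_MARKERS` is a crude illustrative heuristic, and the sample lines are made up.

```python
import re
from collections import Counter

# Combined Log Format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

# Assumption: a simple substring heuristic; real crawler detection needs a maintained list.
BOT_MARKERS = ("bot", "spider", "crawl", "slurp")


def is_bot(user_agent: str) -> bool:
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def count_human_downloads(lines, extension=".pdf"):
    """Count successful GET requests for files ending in `extension`, skipping crawlers."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # ignore lines that are not in the expected format
        if match["method"] != "GET" or match["status"] != "200":
            continue
        if is_bot(match["agent"]):
            continue
        path = match["path"].split("?", 1)[0]  # drop any query string
        if path.lower().endswith(extension):
            counts[path] += 1
    return counts


# Hypothetical sample lines: one human PDF request, one crawler request, one HTML page.
sample = [
    '203.0.113.5 - - [05/Nov/2012:10:00:00 -0700] "GET /papers/713.pdf HTTP/1.1" 200 52311 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '198.51.100.7 - - [05/Nov/2012:10:00:02 -0700] "GET /papers/713.pdf HTTP/1.1" 200 52311 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '203.0.113.9 - - [05/Nov/2012:10:00:05 -0700] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Macintosh)"',
]
print(dict(count_human_downloads(sample)))
```

This only addresses the overcount from spiders; it does nothing for page views that never reach the server because of web caching.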
Analytics Services do not track non-HTML downloads out of the box
[Diagram: analytics service page tagging (A) covers HTML pages; non-HTML files need special configuration]
Analytics Services do not track non-HTML file downloads via direct external links
[Diagram: HTML pages tagged by the analytics service (A); non-HTML files linked directly from external sites]
Traditional SEO is still very important, but not today’s focus.
- Descriptive page titles, anchor text, descriptions, etc.
- Easy & intuitive site navigation
- Submit sitemaps/configure robots.txt file
- Monitor/address errors
- Inform staff & assign ownership
- Clean metadata
- Upgrade repository software
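For the sitemap item in the list above, a sitemap can be generated from a list of repository URLs with Python's standard library. This is a minimal sketch using the sitemaps.org 0.9 namespace; the `build_sitemap` helper is illustrative and the `lastmod` date in the example is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML with one <url><loc> entry per (url, lastmod) pair.

    `lastmod` may be None, in which case the element is omitted.
    """
    ET.register_namespace("", SITEMAP_NS)  # serialize with a default xmlns
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# The URL comes from the workshop example; the lastmod date is a placeholder.
print(build_sitemap([
    ("http://content.lib.utah.edu/cdm/ref/collection/uspace/id/1976", "2012-11-05"),
]))
```

A robots.txt file can then point crawlers at the generated file with a `Sitemap:` line giving its full URL.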
Recommended Background Information
- Ronallo, Jason. "HTML5 Microdata and Schema.org." Code4Lib Journal (2012). http://journal.code4lib.org/articles/6400
- Arlitsch, Kenning, and Patrick OBrien. "Invisible Institutional Repositories: Addressing the Low Indexing Ratios of IRs in Google Scholar." Library Hi Tech 30, no. 1 (2012): 60-81. http://www.emeraldinsight.com/journals.htm?articleid=17020806
- Arlitsch, Kenning, and Patrick OBrien. "Search Engine Optimization (SEO) for Institutional Repositories." In Technical Advances for Innovation in Cultural Heritage Institutions (TAI CHI) Webinar Series; 2012 Mar 16; pp. 1-48. OCLC Research, Online Computer Library Center, Inc. (OCLC), 2012. http://www.oclc.org/resources/research/events/20120316seo.pdf
- Arlitsch, Kenning, and Patrick OBrien. "Search engine optimization (SEO) for digital repositories." In Coalition for Networked Information (CNI) Spring 2011 Membership Meeting; 2011 Apr 4-5; San Diego, California, USA; pp. 1-25. J. Willard Marriott Library, University Libraries, University of Utah, 2011. http://content.lib.utah.edu/utils/getfile/collection/uspace/id/1976/filename/713.pdf
Challenge is presenting structured data SEs can identify, parse and digest
Wolfinger, N. H., & McKeever, M. (2006, July). Thanks for nothing: changes in income and labor force participation for never-married mothers since 1982. In 101st American Sociological Association (ASA) Annual Meeting; 2006 Aug 11-14; Montreal, Canada (No. 2006-07-04, pp. 1-42). Institute of Public & International Affairs (IPIA), University of Utah.
Human Readable
Machine Understandable
Google Scholar can read and understand!
Google Scholar
However, Google cannot read or understand any of our “structured data”
No Schema.org = Not Understandable
No Microdata or RDFa = Not Readable
Workshop Exercise
# - Meta Tag | Working Paper
1 - citation_author | Arlitsch, Kenning; OBrien, Patrick
2 - citation_date | 2011-04-05
3 - citation_title | Search engine optimization (SEO) for digital repositories
6 - citation_volume |
7 - citation_issue |
8 - citation_firstpage | 1
9 - citation_lastpage | 25
10 - citation_doi |
13 - citation_keywords | SEO Tips, Special Collections, Digital Collection, Institutional Repository, Digital Repository
16 - citation_technical_report_institution | University of Utah
17 - citation_technical_report_number |
18 - citation_language | en
19 - citation_conference_title | Coalition for Networked Information (CNI) Spring 2011 Membership Meeting; 2011 Apr 4-5; San Diego, California, USA
21 - citation_pdf_url | http://content.lib.utah.edu/utils/getfile/collection/uspace/id/1976/filename/713.pdf
22 - citation_abstract_html_url | http://content.lib.utah.edu/cdm/ref/collection/uspace/id/1976
23 - University | University of Utah
24 - College | University Libraries
25 - Department | J. Willard Marriott Library
26 - subject.LCSH | Web search engines; Web sites--Registration with search engines; Digital libraries--Collection development
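The citation_* rows in the exercise map naturally onto HTML `<meta>` tags in a record page's head. Below is a minimal sketch of that rendering, assuming one tag per author (which is how Google Scholar's inclusion guidelines describe multiple authors) and skipping blank fields; the `scholar_meta_tags` helper is illustrative, not part of any repository platform.

```python
from html import escape


def scholar_meta_tags(record):
    """Render one <meta> tag per value; list values (e.g., several authors) repeat the tag."""
    lines = []
    for name, value in record.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            if item:  # skip blank fields rather than emit empty tags
                lines.append(
                    f'<meta name="{escape(name)}" content="{escape(str(item))}">'
                )
    return "\n".join(lines)


# Values taken from the workshop exercise; citation_doi is blank there.
record = {
    "citation_title": "Search engine optimization (SEO) for digital repositories",
    "citation_author": ["Arlitsch, Kenning", "OBrien, Patrick"],
    "citation_date": "2011-04-05",
    "citation_firstpage": "1",
    "citation_lastpage": "25",
    "citation_doi": "",
    "citation_pdf_url": "http://content.lib.utah.edu/utils/getfile/collection/uspace/id/1976/filename/713.pdf",
}
print(scholar_meta_tags(record))
```

Escaping both names and values keeps titles containing quotes or ampersands from breaking the page markup.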
Describe concepts using Schema.org to help SEs understand your repository
- Answer Questions
  - What type of WebPage?
  - What content / data does the page contain?
  - Who was involved?
    - Organizations?
    - People?
  - Where is it?
- Look at the properties to see if the concept applies
WebPage concepts relevant to digital repositories
- CreativeWork > WebPage*
- WebPage Classes
  - SearchResultsPage
  - CollectionPage
    - ImageGallery
    - VideoGallery
  - ItemPage
- Important Properties
  - description
  - breadcrumb
  - isPartOf
  - significantLink
  - significantLinks
* http://schema.org/WebPage
Typical Digital Repository Content
- CreativeWork Classes
  - Article > ScholarlyArticle
  - Book
  - Map
  - Painting
  - Photograph
  - MediaObject
    - AudioObject
    - ImageObject
    - MusicVideoObject
    - VideoObject
- Important Properties
  - publisher
  - sourceOrganization
  - contentLocation
  - copyrightHolder
  - author
* http://schema.org/ScholarlyArticle
Organizations might be relevant
- Organization*
  - EducationalOrganization
    - CollegeOrUniversity
  - LocalBusiness
    - Library**
- Important Properties
  - member
  - employee
  - contactPoint
* http://schema.org/Organization
** http://schema.org/Library
What People might be relevant
- Person*
- Important Properties
  - memberOf
  - worksFor
  - jobTitle
  - email
  - affiliation
  - alumniOf
* http://schema.org/Person
What locations might be relevant?
- Place*
  - LandmarksOrHistoricalBuildings
- Intangible > StructuredValue
  - GeoCoordinates
- Important Properties
  - geo
  - photo
  - address
  - containedIn
* http://schema.org/Place
Check your work using Google Rich Snippet Tool
<head>
  <title>Search engine optimization (SEO) for digital repositories</title>
</head>
<body itemscope itemtype="http://schema.org/WebPage">
  <!-- Breadcrumb trail for the page; the separators are escaped as &gt; -->
  <div itemprop="breadcrumb">
    <a href="category/ir.html">USpace Institutional Repository</a> &gt;
    <a href="category/CollegeofSocialBehavioralScience.html">University Libraries</a> &gt;
    <a href="category/books-literature.html">J. Willard Marriott Library</a> &gt;
  </div>
  <!-- The repository item described by this page -->
  <div itemscope itemtype="http://schema.org/ScholarlyArticle">
    <span itemprop="name">Search engine optimization (SEO) for digital repositories</span>
    <!-- itemprop="author" ties the Person to the article; without it the Person is an unrelated item -->
    <div itemprop="author" itemscope itemtype="http://schema.org/Person">
      <span itemprop="name">Patrick OBrien</span>
      <a href="http://www.linkedin.com/in/obrienpatricks" itemprop="url">Patrick OBrien Resume</a>
      <span itemprop="jobTitle">Semantic Web Research Director</span>
      <div itemprop="affiliation" itemscope itemtype="http://schema.org/CollegeOrUniversity">
        <span itemprop="name">Montana State University Library</span>
      </div>
      <div itemprop="affiliation" itemscope itemtype="http://schema.org/Organization">
        <!-- On a link, itemprop takes its value from href, so the name goes on an inner span -->
        <a href="http://www.RevXcorp.com" itemprop="url"><span itemprop="name">RevX Corporation</span></a>
      </div>
    </div>
  </div>
</body>
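Before pasting markup into a testing tool, a quick local check can list which item types and properties a page actually exposes. This is a rough sketch using Python's standard `html.parser`, not a full microdata processor: the `MicrodataLister` class is illustrative, it ignores nesting, and the sample markup (including `itemprop="author"`) is an assumption made for the example.

```python
from html.parser import HTMLParser


class MicrodataLister(HTMLParser):
    """Collect itemtype URLs and (itemprop, value) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.item_types = []
        self.properties = []
        self._pending = None  # itemprop still waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.item_types.append(attrs["itemtype"])
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if "itemscope" in attrs:
            # The value is a nested item; record its type instead of text.
            self.properties.append((prop, attrs.get("itemtype", "")))
        elif tag == "a" and "href" in attrs:
            # For links, microdata takes the value from href, not the link text.
            self.properties.append((prop, attrs["href"]))
        else:
            self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.properties.append((self._pending, data.strip()))
            self._pending = None

    def handle_endtag(self, tag):
        self._pending = None  # an element with no text yields no value


sample = """
<div itemscope itemtype="http://schema.org/ScholarlyArticle">
  <span itemprop="name">Search engine optimization (SEO) for digital repositories</span>
  <div itemprop="author" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Patrick OBrien</span>
    <a href="http://www.linkedin.com/in/obrienpatricks" itemprop="url">Patrick OBrien Resume</a>
  </div>
</div>
"""

lister = MicrodataLister()
lister.feed(sample)
print(lister.item_types)
print(lister.properties)
```

An empty `item_types` list is the local equivalent of the "No Schema.org = Not Understandable" case: the page has no types for a search engine to work with.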
Questions & Study Participation?
Kenning Arlitsch, Dean of the Library at Montana State University, [email protected]
Patrick OBrien, Semantic Web Research Director, [email protected]