How many PURLs would an URL Checker check…
Millennium URL Checker in the Real World
Mary M. Strouse
Catholic University of America
ILUG 2004
M. Strouse ILUG 2004 2
Evolution
• 1994 Field 856 |u designated for URLs
• 1998 MARCxGen debuts
• 2000 URLVerify (telnet version, Rel. 2000)
• 2002 |u added to other MARC fields
• 2003 Millennium URL Checker (Rel. 2002 ph. 2)
URLVerify Web Report
http://[catalog]/screens/urlverify.html
Millennium URL Checker Report
Summary of error types (uncheck to hide)
Integrated with MilCat
Sort by column headers
Resize columns (no truncation)
Highlight a row and click “Edit” to open record
Click edit to get MARC record
Access public view
Clicking “GO” opens URL in a browser window (for rechecking)
Locating Missing Links
Automatic Substitution of New URL
Check boxes to select, then click preview tab
Uncheck any errors, click “process”
Summary screen
Correcting URL Directly in Report
1) Type in “New URL”
2) Check replace box
3) Preview & process
Copying Old URL to Edit Window
1) Check replace box (must do first)
2) Select Old URL - New URL
3) Edit in new URL window
4) Preview & Process
Find and Replace (New URL)
Interactive Reports
Create new interactive report
Toggle between most recent Automatic and Interactive reports
Interactive report can run against entire database, a review file, an index range, or a keyword search
Monday Morning Recheck
Can’t minimize the window or work on the desktop while a report is running
Error Types
htp://app.comm.uscourts.gov
Malformed URL (-2)
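The slides don't say what rules URL Checker uses to call a URL malformed. As a hedged sketch, a pre-check that catches the `htp://` typo above before any request is sent could look like this; the http/https whitelist and the `precheck` name are assumptions, and only the -2 value comes from the slide.

```python
from typing import Optional
from urllib.parse import urlsplit

MALFORMED_URL = -2  # code shown on the slide; the rules below are assumed

def precheck(url: str) -> Optional[int]:
    """Return MALFORMED_URL if the URL can't be requested, else None."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:  # e.g. an unbalanced "[" in an IPv6 host
        return MALFORMED_URL
    # Assumption: only http/https URLs that name a host are checkable.
    if parts.scheme.lower() not in ("http", "https") or not parts.netloc:
        return MALFORMED_URL
    return None
```

With these rules, `precheck("htp://app.comm.uscourts.gov")` returns -2 because the scheme is `htp`.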
Network is unreachable (-7)
New error type in Phase 3 (Millennium report only)
http://public.afca.scott.af.mil/public….
PURLs and Other Redirects
Every server redirection reported as an error
Redirection can be a sign a resource has moved, and maintenance is warranted.
Missing slash after directory name reported as permanent redirect (301)
Edit to eliminate from future reports
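A checker, or a cleanup script run over the report, could tell this harmless case apart from a real move by comparing the Location header with the original URL. This is a hedged sketch, not URL Checker's logic; the function name and the status/Location inputs are assumptions about what a report row would contain.

```python
from urllib.parse import urljoin

def is_trailing_slash_redirect(url: str, status: int, location: str) -> bool:
    """True if a 301 response only appends the missing "/" to a directory URL."""
    if status != 301 or not location:
        return False
    # Location may be relative, so resolve it against the requested URL first.
    return urljoin(url, location) == url + "/"
```

When it returns True, editing |u to the same URL with the trailing slash is the fix the slide describes.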
Server-side redirect to add timestamp
http://library.nps/navy.mil/uhtbin.cgisirsi/Sun+Apr+20+22:28:15+PDT+2003/0/520/nss.pdf
All PURLs are identified as redirects, not checked further
True also of 3rd-party link checkers (except Xenu)
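Following the chain to its end, as Xenu does, is a short loop. In this hedged sketch the `fetch` callable is a stand-in for an HTTP request that returns the status code and Location header; it is injected so the example runs offline, and the limit of 10 hops is an arbitrary assumption.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical fetch signature: URL -> (status code, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]
REDIRECT_CODES = {301, 302, 303, 307, 308}

def resolve(url: str, fetch: Fetch, max_hops: int = 10) -> Tuple[str, int, List[str]]:
    """Follow redirects; return (final URL, final status, URLs passed through)."""
    hops: List[str] = []
    current = url
    for _ in range(max_hops):
        status, location = fetch(current)
        if status not in REDIRECT_CODES or not location:
            return current, status, hops  # not a redirect: this is the destination
        hops.append(current)
        current = urljoin(current, location)  # Location may be relative
        if current in hops:
            raise RuntimeError(f"redirect loop at {current}")
    raise RuntimeError(f"more than {max_hops} redirects from {url}")
```

With a fake `fetch` that maps a (hypothetical) PURL to a 302 and its target to a 200, `resolve` returns the target URL, status 200 and the PURL as the only hop. The final URL is what a PURL-replacing workflow would substitute into |u; a workflow that keeps the PURL would look only at the final status.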
I-Hate-PURLs Workflow
Use automatic substitution to replace PURL with (current) underlying URL
Replace box can’t be batch-selected.
Beware the “Leaving GPO” Message
URL Checker reports entire frwebgate “wrapper” as the new URL
http://frwebgate.access.gpo.gov/cgi-bin/leaving.cgi?from=exitpurl.html&to=http%3A//www.uscourts.gov/ttb/index.html
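The real destination is carried, percent-encoded, in the wrapper's `to` query parameter. A hedged sketch for pulling it back out before substituting, assuming the wrapper always lives at frwebgate.access.gpo.gov/cgi-bin/leaving.cgi as in this example (`unwrap_leaving_gpo` is a hypothetical helper):

```python
from urllib.parse import parse_qs, urlsplit

def unwrap_leaving_gpo(url: str) -> str:
    """Return the destination inside a "Leaving GPO" wrapper, else the URL unchanged."""
    parts = urlsplit(url)
    if parts.netloc.lower() == "frwebgate.access.gpo.gov" and parts.path.endswith("/leaving.cgi"):
        # parse_qs percent-decodes values, turning "http%3A//" back into "http://".
        targets = parse_qs(parts.query).get("to")
        if targets:
            return targets[0]
    return url
```

Applied to the wrapper above, it yields http://www.uscourts.gov/ttb/index.html.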
Library-editable URLBlock File
Not a substitute for honoring “no robots” conventions!
Block can be a full URL, domain name or text string
PURL.ACCESS.GPO.GOV
III-specified blocks for major aggregators
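How URL Checker compares URLs to block entries isn't documented here. Because an entry may be a full URL, a domain name or a text string, and the sample entry is upper-case, a case-insensitive substring test is one plausible reading; this hedged sketch (with a hypothetical `blocked_by` helper) assumes exactly that.

```python
from typing import Iterable, Optional

def blocked_by(url: str, block_entries: Iterable[str]) -> Optional[str]:
    """Return the first block entry matching url, or None.

    Assumption: case-insensitive substring match, which treats full URLs,
    domain names and plain text strings the same way.
    """
    haystack = url.lower()
    for entry in block_entries:
        needle = entry.strip().lower()
        if needle and needle in haystack:
            return entry
    return None
```

One consequence of substring matching is that a short entry such as a bare domain also blocks any longer host name that contains it, so entries should be as specific as the block intends.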
Trust-the-Government Workflow
1. Unblock GPO PURLs and run interactive report monthly (e.g., after Marcive load)
2. Exclude working redirects, troubleshoot others
Must load entire report before excluding redirects (slow)
WAIS Database searches reported as timeout errors (-6)
WAM Proxy Rewrite URLs Not Checked
Host Unreachable (-5)
3rd-party link checkers report all proxy-rewrite URLs OK even if nonexistent.
Fool-the-System Workflow
856 41 |u http://heinonline.org/HeinOnline/CollectionIndex.pl?journal=cjtl |z <A href="http://0-heinonline.org.columbo.law.cua.edu/HeinOnline/CollectionIndex.pl?journal=cjtl"> View via Hein Online </A>
Underlying URL in |u, PURL or proxy-rewrite URL within anchor tag in |z.
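Deriving the underlying URL for |u from an existing proxy-rewrite URL can be scripted. This hedged sketch assumes the pattern in the example above (a `0-` prefix on the original host, followed by the library's proxy domain); `unwrap_wam` is a hypothetical helper, and other WAM prefixes or ports are not handled.

```python
from urllib.parse import urlsplit, urlunsplit

def unwrap_wam(url: str, proxy_suffix: str) -> str:
    """Turn http://0-host.<proxy_suffix>/path back into http://host/path."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is already lower-cased
    suffix = "." + proxy_suffix.lower()
    if host.startswith("0-") and host.endswith(suffix):
        real_host = host[2:-len(suffix)]  # drop "0-" and the proxy domain
        return urlunsplit((parts.scheme, real_host, parts.path, parts.query, parts.fragment))
    return url  # not a rewrite in the assumed form; leave it alone
```

For the HeinOnline example, `unwrap_wam(..., "columbo.law.cua.edu")` gives back the heinonline.org URL that belongs in |u.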
“Multi-threading” Rate
• The number of simultaneous “calls” sent to servers at a given time
URL Checker: > 100; 3rd-party link checkers: 20-30 (often user-configurable)
• At issue when many resources concentrated on a few servers
• URL Checker activity may be perceived as an “attack”
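A user-configurable rate can be sketched with a thread pool for the overall cap plus a semaphore per host, so a report full of one aggregator's URLs doesn't hit that server with every thread at once. The per-host cap, the default numbers and the injected `check` callable are assumptions for illustration, not URL Checker settings.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List
from urllib.parse import urlsplit

def check_all(urls: List[str], check: Callable[[str], int],
              total_threads: int = 20, per_host: int = 2) -> Dict[str, int]:
    """Check URLs concurrently, capping total calls and calls per server."""
    host_limits = defaultdict(lambda: threading.Semaphore(per_host))
    lock = threading.Lock()  # guards creation of per-host semaphores

    def limited(url: str) -> int:
        host = urlsplit(url).hostname or ""
        with lock:
            sem = host_limits[host]
        with sem:  # at most per_host simultaneous calls to this server
            return check(url)

    with ThreadPoolExecutor(max_workers=total_threads) as pool:
        return dict(zip(urls, pool.map(limited, urls)))
```

`total_threads` plays the role of the overall multi-threading rate; `per_host` addresses the case where many resources sit on a few servers.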
Summary: What URL Checker Checks
• URLs in subfield |u of 856 fields in bibliographic records (but not URLs in other subfields)
• URLs in 956 fields in electronic reserves (Millennium Media) records
And What It Doesn’t…
• URLs or domains in the URLBlock file (aggregators, etc.)
• PURLs and other redirects
• Proxy-rewrite URLs in WAM
• Electronic journal issue URLs in checkin boxes
• URLs in bibliographic record notes
Suggestions for Further Development – Reports & Editing
• Pre-configure large interactive reports (faster loading)
• Allow minimization during report prep
• Bypass summary of attached items
• Improve copy & paste, batch select & replace
• Interactive checking of “New URL” column
Suggestions for Further Development – Functionality
• Follow redirects to final destination
• Honor page-level and server-level robot exclusions, and report with a unique status code
• Customize multi-threading rate
• Output report in CSV (comma-delimited) format
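Two of these suggestions are easy to prototype with Python's standard library: `urllib.robotparser` for server-level robots.txt exclusions and `csv` for comma-delimited output. In this hedged sketch the robots.txt text is passed in rather than fetched, page-level (meta tag) exclusions are not covered, the agent name is made up, and `ROBOTS_EXCLUDED` is a hypothetical status code chosen only to differ from the negative codes on the error slides.

```python
import csv
import io
from typing import Callable, List, Tuple
from urllib.robotparser import RobotFileParser

ROBOTS_EXCLUDED = -100  # hypothetical code, not one URL Checker uses

def allowed(url: str, robots_txt: str, agent: str = "URLChecker") -> bool:
    """Apply an already-downloaded robots.txt to one URL."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)

def status_for(url: str, robots_txt: str, check: Callable[[str], int]) -> int:
    # Skip the request entirely and flag it, instead of reporting an error.
    return check(url) if allowed(url, robots_txt) else ROBOTS_EXCLUDED

def report_csv(rows: List[Tuple[str, str, int]]) -> str:
    """Render (record number, URL, status) rows as comma-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # quotes any field that contains a comma
    writer.writerow(["record", "url", "status"])
    writer.writerows(rows)
    return buf.getvalue()
```

Using the `csv` module rather than joining strings matters here, because URLs may themselves contain commas.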
URL Checker Documentation
Millennium Manual (Rls. 2003)
Permissions (#105370)
Reports (#105371)
Edit/Replace capability (#105372)
URLBlock (#105373)
URLVerify Documentation
Innopac manual, pages 102151-102153
Maintaining Hyperlinks in the WebPac: Tools and Tradeoffs (IUG 8, May 2000) http://www.du.edu/~ttyler/iug2000/ctw/index.html
Tom Tyler’s freeware http://www.du.edu/~ttyler/freeware/
URL Display WWWOptions
• DISPLAY_856 – Defines the order and placement of subfields that form the hypertext link in an OPAC display (default is |z then |u)
Multiple subfields (including access and usage notes) display as a single underlined link.
Enhancement request: separate WWWOptions to control display of link and notes.
URL Display WWWOptions
• LINK856TEXT – Defines the phrase that appears above the hypertext link in a full display (Default is “Click here to:”)
• ICON_856LINK – controls display of 856 link in a brief display
(Manual #102168)
Contact:
Mary M. Strouse
DuFour Law Library, Catholic University of America
Thank you!