Critical value-added services for e-journals on classics
Proposal of a Microformat to encode Canonical Texts References
Matteo Romanello, Univ. "Ca' Foscari" di Venezia
ELPUB 2008, Toronto, June 26th 2008
M. Romanello, A semantic linking framework to provide critical value-added services for E-journals on classics, ELPUB 2008, Toronto, June 26th 2008
Preliminary definitions
• Reference Linking:
– In electronic publications, the capability of transforming textual references into links to the resource referred to.
• Primary and secondary sources in the field of classics:
– PRIMARY: witnesses, texts of ancient authors
– SECONDARY: every commentary, monograph, journal article written about a primary source
• Canonical Text References:
Rationale
• In the Field of Classics:
– E-publications still need to be bootstrapped
– Scholars need (and deserve) more effective research tools
– It is necessary to provide more (and more useful) value-added services
• Switching from content holding to service providing: (Armbruster 2007)
– Favors the Open Access to research findings
– Value Added Services could make the OA economically sustainable
Critical Value Added Services
• What kind of services are most important for philologists and scholars of Classics?
– Every knowledge domain has some significant entry points to information
– Canonical Texts references are the meaningful ones to access publications in the domain of Classics
• In chemistry: name (and structure) of chemical compounds
• What services?
– Reference Linking
– Reference Indexing: accessing journal articles and monographs on the basis of the canonical texts that are referred to within them
But... why a new linking framework?
Shortcomings of current scenarios: search
1. Google search
2. L'Année Philologique search
• String-based search algorithms
• No semantic understanding
• No multilingual search
• High recall, low precision
Shortcomings of current scenarios: reference linking
• Tightly coupled approach
• Hard-linking (1 to 1 mapping)
• Linking system
– Peculiar to a given project
– Language-dependent
– Closed system
<!-- Plut. Sol. 19.1 Canonical Text Reference -->
<a class="citation" target="_blank" href="http://www.perseus.tufts.edu/cgi-bin/ptext?lookup=Plut.+Sol.+19.1">Plut. <em>Sol.</em> 19.1</a>
Desired Scenario (building an E-scholium)
The first attempt... ...an e-scholium on the Web scale
Venetus A: Marcianus Graecus Z. 454, <http://chs75.harvard.edu/manuscripts/image-viewer>
Map of the Web (Jan 15 2005), <http://www.opte.org/maps/>
Proposal: loosely coupled approach
• Desired linking system:
– Semantic
– Open-ended
– Language-neutral
• Layers separation:
1) Metadata contained in canonical text references
2) Protocols and Programming Interfaces (API)
3) Services
• Glue:
– Client side application
• Implementation:
– Microformats
– CTS (Canonical Texts Services) URNs
Microformats (http://microformats.org)
– from Blogs and Web 2.0
– Development: patterns and design principles set by the Microformats community
– Microformats: semantic compounds of Plain Old Semantic HTML (POSH) tags
– Aimed at embedding semantic data in HTML elements
– Community interested in semantic encoding of citation formats: hBib draft, Microformat for bibliographic references to modern publications
– Examples of MFs:
• geo -> Geographical data
• hCard -> personal profile
• tag -> tags
• hCalendar -> events
• (*) COinS -> embedding OpenURLs within an HTML element
CTS URNs
– Based on the FRBR (Functional Requirements for Bibliographic Records) model
– Provide URNs (Uniform Resource Names) for Canonical Texts References
• e.g. isbn:xxxxxxx
– URNs = unambiguous identifiers for
• Authors
– Homer: urn:cts:greekLit:tlg0012
• Works
– Iliad: urn:cts:greekLit:tlg0012.tlg001
• Text passages
– Homer's Iliad book 1, line 1: urn:cts:greekLit:tlg0012.tlg001:1.1
• Work Editions
– Venetus A: 1.1 Holy Cross / Furman Fellows edd.: urn:cts:greekLit:tlg0012.tlg001.greekLit:msA-tei:1.1
• Work exemplars
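The hierarchy above (author, work, edition, exemplar, passage) can be read directly off the URN string. The following is a minimal sketch, assuming a simplified grammar of the form urn:cts:<namespace>:<textgroup>[.<work>[.<version>[.<exemplar>]]][:<passage>]; the class and function names are hypothetical and not part of CTS itself, and the real URN syntax may differ in detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CtsUrn:
    """Components of a CTS URN under the simplified grammar assumed here."""
    namespace: str                   # e.g. "greekLit"
    textgroup: str                   # author-level identifier, e.g. "tlg0012"
    work: Optional[str] = None       # e.g. "tlg001"
    version: Optional[str] = None    # edition or translation
    exemplar: Optional[str] = None   # a specific exemplar of a version
    passage: Optional[str] = None    # e.g. "1.1"


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN into its hierarchical components."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    passage = parts[4] if len(parts) == 5 else None
    levels = parts[3].split(".")
    if not 1 <= len(levels) <= 4 or not all(levels):
        raise ValueError(f"malformed work component in {urn!r}")
    # Pad the missing lower levels with None so that the author-level URN,
    # the work-level URN and so on all parse into the same structure.
    padded = levels + [None] * (4 - len(levels))
    return CtsUrn(parts[2], *padded, passage=passage)


iliad_1_1 = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001:1.1")
print(iliad_1_1.textgroup, iliad_1_1.work, iliad_1_1.passage)
```

Because every level is optional below the text group, the same parser handles a reference to an author, to a work, or to a single line.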
Microformatted reference
<a class="citation" target="_blank" href="http://www.perseus.tufts.edu/cgi-bin/ptext?lookup=Plut.+Sol.+19.1">
  <cite class="ctref">
    <abbr class="ctauthor" title="urn:cts:greekLit:tlg0007">Plut.</abbr>
    <em>
      <abbr class="ctwork" title="urn:cts:greekLit:tlg0007.tlg007">Sol.</abbr>
    </em>
    <abbr class="range" title="19.1">XIX 1</abbr>
    <!-- empty element: the edition is implicit and carried only by the title attribute -->
    <abbr class="edition" title="Bernadotte Perrin"></abbr>
  </cite>
</a>
• URNs and implicit information (e.g. the edition statement) are hidden by using Cascading Style Sheets (CSS) -> separation of content and presentation
Plut. Sol. XIX 1
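A client that wants to use such a reference first has to read the hidden metadata back out of the markup. The sketch below shows one way to do this with Python's standard html.parser module, assuming the class names used on this slide (ctref, ctauthor, ctwork, range, edition); the parser class and helper function are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class CtRefParser(HTMLParser):
    """Collect class -> title pairs from <abbr> elements inside <cite class="ctref">."""

    def __init__(self):
        super().__init__()
        self.refs = []        # one dict per reference found
        self._current = None  # dict being filled while inside a ctref

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "cite" and "ctref" in classes:
            self._current = {}
        elif tag == "abbr" and self._current is not None and classes:
            # The machine-readable value lives in the title attribute.
            self._current[classes[0]] = attrs.get("title") or ""

    def handle_endtag(self, tag):
        if tag == "cite" and self._current is not None:
            self.refs.append(self._current)
            self._current = None


def extract_ctrefs(html: str) -> list:
    """Return the metadata of every microformatted reference in an HTML string."""
    parser = CtRefParser()
    parser.feed(html)
    parser.close()
    return parser.refs


SAMPLE = (
    '<cite class="ctref">'
    '<abbr class="ctauthor" title="urn:cts:greekLit:tlg0007">Plut.</abbr> '
    '<em><abbr class="ctwork" title="urn:cts:greekLit:tlg0007.tlg007">Sol.</abbr></em> '
    '<abbr class="range" title="19.1">XIX 1</abbr>'
    '<abbr class="edition" title="Bernadotte Perrin"></abbr>'
    '</cite>'
)

for ref in extract_ctrefs(SAMPLE):
    print(ref)
```

The displayed text ("Plut. Sol. XIX 1") is ignored on purpose: only the title attributes carry the language-neutral identifiers.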
Microformats suitability
• Least Power Rule (W3C):
– RDF is the best technology to express semantic meaning
– Microformats are an already working solution
– Bottom-up way to the Semantic Web
• Strong points:
– Rapid and wide success/adoption (supported by FF3 and IE8)
– More HTML-compliant than RDFa and eRDF
– Forward-compatibility with Resource Description Framework (RDF) through GRDDL
– Embedding -> embedded URNs may be discovered also by 'normal' (unsemantic) search engines
Prototype of a semantic Reference Linking feature
Semantic Reference Linking Service
1. Reference detection
2. Construction of the CTS-compliant query
3. Query against CTS repositories
4. Response parsing
5. Content display
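Step 2 above can be sketched as follows, once the work URN and the range have been read from a microformatted reference. This is only an illustration under stated assumptions: the endpoint URL is a placeholder, the function names are hypothetical, and the request=GetPassage&urn=... query form is assumed to be what the target CTS repository accepts.

```python
from urllib.parse import urlencode

# Placeholder endpoint: replace with the address of an actual CTS repository.
CTS_ENDPOINT = "http://cts.example.org/api"


def build_passage_urn(work_urn: str, passage: str) -> str:
    """Append the passage component (e.g. '19.1') to a work-level URN."""
    return f"{work_urn}:{passage}"


def build_getpassage_url(endpoint: str, work_urn: str, passage: str) -> str:
    """Build a GetPassage request URL (query form assumed, see lead-in)."""
    query = urlencode({
        "request": "GetPassage",
        "urn": build_passage_urn(work_urn, passage),
    })
    return f"{endpoint}?{query}"


# Values as they appear in the ctwork and range elements of the Plutarch example.
url = build_getpassage_url(CTS_ENDPOINT, "urn:cts:greekLit:tlg0007.tlg007", "19.1")
print(url)
```

Steps 3 to 5 (sending the query, parsing the response, displaying the content) depend on the repository and on the client application, so they are left out of this sketch.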
Use of Microformatted references
• Where?
– E-journal articles, e-publications to encode references
– Web feeds
– Combined with other Microformats (a conference presentation about the prologue of Homer's Iliad, ...)
• Use of CTS URNs:
– as keywords in Dublin Core metadata descriptions
– as semantic tags in folksonomies and social applications (delicious, CiteULike...)
• Value added services to be built upon them:
– Targeted search engines
– Aggregators of relevant information
– Piping of web feeds
– Reference linking
– Reference Indexing
Solution scalability
• How microformatted references are produced now:
– from XML encoding through XSLT (requires a lot of human work)
• The proposed solution should scale to the dimensions of the "million book library" (Crane 2006)
– Several mass digitization projects are currently under way in the Humanities (Google Books, JSTOR ...)
– Millions of books and journal articles will soon be available
• How to scale?
– Building a semantic parser:
• using NLP (Natural Language Processing) techniques
– Named entity recognition
– Edit Distance
– Finite State automata
• should make it possible to mark up large amounts of text automatically
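Two of the techniques listed above can be combined in a few lines. The sketch below is a toy illustration and not the proposed parser: a regular expression (a finite-state pattern) proposes candidate references, and edit distance maps a possibly misspelled author abbreviation to a known one. The abbreviation table, the pattern and all function names are assumptions made for this example; a rule-based pattern stands in here for real named entity recognition.

```python
import re

# Hypothetical lookup table: author abbreviation -> CTS URN.
KNOWN_AUTHORS = {
    "Plut.": "urn:cts:greekLit:tlg0007",
    "Hom.": "urn:cts:greekLit:tlg0012",
}

# Finite-state pattern: author abbreviation, work abbreviation, numeric locus.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+\.)\s+([A-Z][a-z]+\.)\s+(\d+(?:\.\d+)*)")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def resolve_author(abbr: str, max_distance: int = 1):
    """Return the URN of the closest known abbreviation, or None if too far."""
    best = min(KNOWN_AUTHORS, key=lambda known: edit_distance(abbr, known))
    if edit_distance(abbr, best) <= max_distance:
        return KNOWN_AUTHORS[best]
    return None


def find_references(text: str) -> list:
    """Return (matched text, author URN, locus) for each resolvable candidate."""
    found = []
    for match in CANDIDATE.finditer(text):
        author, _work, locus = match.groups()
        urn = resolve_author(author)
        if urn is not None:
            found.append((match.group(0), urn, locus))
    return found


print(find_references("See Plut. Sol. 19.1 and also Plu. Sol. 20.2 on the Areopagus."))
```

Resolving the work abbreviation and handling Roman numerals or ranges would follow the same pattern, with a larger table and a richer grammar.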
Thank you for your attention