Experiences of Exposing Semantics to Drive Transcoding


Darren Lunn, Sean Bechhofer and Simon Harper

Information Management Group, School of Computer Science

The University of Manchester, Kilburn Building, Oxford Road, Manchester, M13 9PL, [email protected]

[sean.bechhofer | simon.harper]@manchester.ac.uk

Abstract

The World Wide Web (Web) is a visually complex, dynamic, multimedia system that can be inaccessible to people with visual impairments. SADIe uses semantic annotations of a Website’s Cascading Style Sheet (CSS) to drive a transformation process that can improve access to content for visually impaired users. The original process of annotating the CSS involved the use of an upper ontology, extended by a site specific lower ontology. While this approach provided rich annotation of the CSS terms, experience suggests that components within the model were inappropriate for the interactive system we were developing. This experience has led to a more pragmatic approach that still provides the necessary semantics required to drive the SADIe transcoding tool, but in a more lightweight manner. This paper describes the lessons learnt from building the ontological models for the SADIe platform, highlighting pitfalls that developers of ontologies in interactive systems should be wary of.

1. Introduction

People with disabilities, in particular visual impairments, are hindered in their access to information on the Web because it is not designed with their needs in mind. Visually impaired users can make use of tools to access the Web that read aloud the page. These tools, known as screen readers, access the underlying structure of the Hypertext Markup Language (HTML) to create a sequential, audio rendering of the document. Most Web designers, however, are mainly concerned with how content is presented on screen, rather than its structure and meaning. Consequently, implicit information that is available through the visual rendering of the page is lost to the screen reader, and therefore the user. As an example, consider a navigation menu common on many Websites. Typically a navigation menu is contained within its own distinct chunk, separated from the rest of the content, and is located towards the side of the screen. A sighted user is aware of which element on screen the designer intended to be the navigation menu due to the way it is rendered. There is nothing explicitly stating that the element on the Web page is a menu. The knowledge that it is a menu is implicit from the visual presentation. This implicit information is only available to those who can see it, as opposed to those people who use a screen reader to read aloud the page content.

Structural-Semantics for Accessibility and Device Independence (SADIe) addresses this issue by annotating Web documents in such a way that information available implicitly through on screen rendering, such as a menu, is made explicit. This is achieved by annotating the rendering information defined within the Cascading Style Sheet (CSS) of the Website. With the structural semantics exposed, transcoding is then applied to the Web page content so that information held within the page is more accessible to the audio rendering tools and therefore visually impaired users.

Initially the CSS elements were annotated using an ontology that consisted of two parts. The first was an upper ontology containing high level abstract concepts representing the potential roles of Web page elements. These roles included concepts such as Removable, which was used to annotate elements that were not important to the Website, and HighPriority, which identified elements that contained important information on the page. The second part of the ontology was a Website specific extension to the upper ontology that contained the elements found within the CSS. These elements were annotated with the roles that they played from the terms available in the upper ontology.

The use of a two-part ontology was a flexible approach that provided rich annotations of the CSS elements and did not require any modification to the original HTML or CSS documents. However, our experiences in creating and deploying the model used within our system highlighted a number of weaknesses. These included modelling elements within the Website that, with hindsight, could have been avoided, and unnecessary complexity when building the ontology. In this paper we present factors, based upon our own experiences, that developers of interactive system ontologies should consider. We then propose an alternative solution, based upon the lessons learnt from our original model. By adding a bespoke property to the Cascading Style Sheet the transcoding engine can obtain the same semantic information that was available in the ontology but without the large overhead of creating a Website specific ontology.

2008 First International Workshop on Ontologies in Interactive Systems

978-0-7695-3542-5/08 $25.00 © 2008 IEEE

DOI 10.1109/ONTORACT.2008.9

2. Background

Transcoding is a way of transforming Web content so that it can be accessed on a diverse range of devices [16]. In adapting Web content, transcoding systems use a variety of architectures and a range of methods, and have diverse target user groups. However, they all exhibit similar characteristics that allow them to be categorised as either heuristic or semantic based methods.

With heuristic transcoding, tools analyse a page and adapt the content based on a set of predefined rules. Gupta et al. exploited the Document Object Model (DOM) to apply transcoding to Web documents [10]. The Web document was parsed into a DOM tree and the nodes traversed in order to identify content. For example, table cell nodes that had a large link to paragraph ratio were considered to be “link lists” and removed from the page, allowing the user easier access to the main content.
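As a rough illustration of the kind of rule described above, the following Python sketch removes table cells whose link to paragraph ratio exceeds a threshold. It is not the algorithm of Gupta et al.; the function name `remove_link_lists`, the threshold value and the sample markup are assumptions made purely for illustration, and the input is assumed to be well-formed XHTML.

```python
from xml.dom import minidom


def remove_link_lists(xhtml, max_ratio=3.0):
    """Drop <td> cells whose link-to-paragraph ratio exceeds max_ratio.

    max_ratio is an illustrative threshold, not a value taken from the literature.
    """
    doc = minidom.parseString(xhtml)
    # Copy the node list so that removing cells does not disturb the iteration.
    for cell in list(doc.getElementsByTagName("td")):
        links = len(cell.getElementsByTagName("a"))
        paragraphs = len(cell.getElementsByTagName("p"))
        # Treat a cell with no paragraphs as having one, to avoid dividing by zero.
        if links / max(paragraphs, 1) > max_ratio:
            cell.parentNode.removeChild(cell)
    return doc.documentElement.toxml()


# Hypothetical page: one cell is a pure list of links, the other holds the story.
page = (
    "<table><tr>"
    '<td><a href="a">A</a><a href="b">B</a><a href="c">C</a><a href="d">D</a></td>'
    '<td><p>Main story text with <a href="x">one link</a>.</p></td>'
    "</tr></table>"
)
cleaned = remove_link_lists(page)
```

The rule is generic: it knows nothing about what the designer intended the first cell to be, which is the limitation that semantic transcoding tries to address.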

Heuristic based transcoding systems can adapt a wide variety of Websites without any additional information with regards to the page content [3]. While manipulations of the structure can aid navigation and display, they have two problems. The first is that the rules have to be general enough so as to apply to every possible Web page. Therefore they do not capture and exploit the semantic information of the page, which is usually only available implicitly through the visual presentation. Secondly, a specific user group is targeted, such as mobile Web users or visually impaired users. This means that for each user group or target device, a different set of heuristics needs to be discovered and applied to the page [15].

In contrast, semantic transcoding uses the semantics of the structure or content to make better adaptation decisions [14]. Whereas heuristic transcoding makes educated guesses as to how the page should be adapted, semantic transcoding uses the metadata that describes the structure of the page to make adaptation decisions. Adding semantics to Extensible Hypertext Mark-up Language (XHTML) documents is not a new concept. The Tag Project proposed embedding Resource Description Framework (RDF) triples into XHTML [6] as early as 2002. However, these documents would not validate as XHTML and so did not find favour among the community [17]. The work concluded that the RDF specification specifies how to understand the semantics (in terms of RDF triples) in an RDF document that contains only RDF. However, it does not explain how and when semantics can be extracted from documents in other namespaces that contain embedded RDF. Furthermore, the XHTML specification explains how to process XHTML namespace content, but gives no indication about how to process embedded RDF information [6].

Other methods have been proposed such as Gleaning Resource Descriptions from Dialects of Languages (GRDDL), which is being investigated by the W3C Web Co-ordination Group as a mechanism for encoding a set of Extensible Mark-up Language (XML) compliant language dialects into an XHTML document using RDF [20, 8]. GRDDL works on the principle that the XHTML specification provides a mechanism for authors to use particular metadata vocabularies and thereby indicate the author’s intent to use those terms in accordance with the conventions of the community that originated them. While GRDDL is used to embed specific dialects, the machinery which allows this embedding, RDFa, allows any semantic information to be encoded within the XHTML page [2]. The current XHTML 2 Working Draft adopts this approach and allows for any element to have property or role attributes, facilitating the embedding of metadata within the Web document whilst still maintaining its validity [4]. This machinery is not just limited to set dialects but allows third-party developers to create and embed their own semantic information.

Microformats add semantics to Web pages by using existing tags and attributes found in XHTML in order to allow different applications and services to reuse the information held within the document [9]. An issue with adding RDF annotations to current versions of XHTML is that the document can become invalid. Microformats circumvent this problem by using facilities that already exist within the XHTML specification, such as the class and title attributes of elements. However, while microformats can encode explicit information to aid machine readability, they are not independent or extensible and having multiple microformats in a single Web document can create conflicts [1, 18].

As far as our desires and requirements are concerned, none of GRDDL, RDFa or Microformats offers an ideal solution. All these solutions are concerned with embedding extra information in a document through a modification of that document. In contrast, although we are interested in associating semantic information with the document, our aim is not to embed extra metadata or semantics in the document, but to make use of the existing information already present and expose it in a more explicit fashion. This is similar to the Deep Annotation approach proposed by Volz et al., where annotation of a logical schema can lead to annotation of Web documents that are dynamically generated from a database [19].


(a) A news story from the CNN Website displayed in a visual manner as the designer intended. The main content of the page (title, key points and the article itself) is predominant and easily identified by the user.

(b) A news story from the CNN Website displayed without the visual rendering. The main areas of the page are no longer immediately in focus and the designer’s intent has been lost to the user.

(c) A news story from the CNN Website after being transcoded by SADIe. The content has been reengineered to allow the user immediate access to the main content as the designer intended.

Figure 1: Comparison of a CNN News Story as intended by the designer, how users with sequential access interpret the page, and the SADIe Transcoded Version, which reengineers the designer’s intent for audio access. Accessed on 11th August 2008 from http://www.cnn.com/2008/WORLD/asiapcf/08/11/pakistan.politics.ap/index.html.

3. Using Semantic Transcoding to Improve Web Access For Visually Impaired Users

The principal idea behind our approach to transcoding is that the rendering of a Web page element is closely associated with the role designers intended the element to play. For example, sighted users can infer that a list of links is a menu due to the way it is rendered on screen. This rendering information is defined within the CSS and associated with the XHTML via tag attributes such as class or id. A Website typically has only a single CSS file containing all the style definitions; therefore, rather than annotate every page, the CSS classes themselves are annotated. This reduces the annotation overhead as any annotating of the document only occurs in one location. However, as the Website content is associated with the CSS, the annotations propagate through to all the Web pages contained within the site. The CSS annotations provide us with accurate transcoding as we are explicitly stating the role of each element. Additionally we also gain high scalability as defining the roles of CSS elements allows every page within the Website to be transcoded without the requirement for every page to be annotated. This is because CSS tends to contain site-wide style definitions that all pages use.

The goal of SADIe is to capture the designer’s intended meaning of elements on a Website, expressed through the visual presentation, and reengineer the page so that the same design goal is accurately portrayed in a variety of rendering formats. Currently SADIe aims to improve access to Web content for visually impaired users. This is achieved by transcoding the page content into a format more suited to the sequential audio stream generated by a screen reader.


SADIe currently provides the user with three transcoding options. These are the ability to: Defluff, which involves removing elements that provide little or no information to the page; Reorder, which involves reordering the page so that the areas that provide the most important information appear near the top of the page; and Menu, which moves the menu of the Website to the bottom of the page where it can be easily found yet allows the main content to be immediately accessed.

SADIe matches the elements of the Web page to the functionality that the user has selected in order to apply the transcoding. For example, if a user has selected “Defluff” then SADIe will identify all the CSS classes that have been classified as Removable through the annotations. When the elements have been identified, SADIe traverses the Web page’s DOM and removes any element that occurs within the list returned by the ontology query. When the DOM has been traversed, the modified page is returned to the user. For further discussion of the SADIe method and architecture, the reader is directed to [5, 12].
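The “Defluff” step described above can be sketched in Python as follows. This is a minimal illustration rather than the SADIe implementation: the set `REMOVABLE_CLASSES` stands in for the result of the ontology query, and the class names and sample page are hypothetical. The input is assumed to be well-formed XHTML so that it can be parsed with the standard library.

```python
from xml.dom import minidom

# Hypothetical result of the annotation query: CSS classes classified as Removable.
REMOVABLE_CLASSES = {"adBox", "banner"}


def defluff(xhtml, removable):
    """Return the page with every element that carries a removable class deleted."""
    doc = minidom.parseString(xhtml)

    def visit(node):
        # Copy the child list because children are removed while iterating.
        for child in list(node.childNodes):
            if child.nodeType != child.ELEMENT_NODE:
                continue
            classes = set(child.getAttribute("class").split())
            if classes & removable:
                # Removing the node also removes all of its descendants.
                node.removeChild(child)
            else:
                visit(child)

    visit(doc.documentElement)
    return doc.documentElement.toxml()


page = (
    "<body>"
    '<div class="banner">Site banner</div>'
    '<div class="story"><h1>Headline</h1><div class="adBox">Ad</div></div>'
    "</body>"
)
result = defluff(page, REMOVABLE_CLASSES)
```

Because removal takes the whole subtree, anything nested inside a removable element disappears with it, which is why the containment check discussed in Section 4 was originally thought to be useful.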

Figure 1 illustrates how SADIe can be used to rediscover a designer’s intention and transcode a Web page to recreate the original meaning for audio output. Figure 1a shows the main news story from the CNN Website on the 19th June 2008. The most important content of the page, which can be considered to be the story headline, the highlights of the story and the article itself, is surrounded by a banner, a search box and advertisements. The designer of the page has used a combination of colour, white space and font effects to draw a sighted user’s attention to this area of the page, leaving the remaining elements on the periphery of the page where they do not interfere with the main content. Figure 1b shows the same page with the visual rendering removed. The designer’s intent has been lost as the main content of the page is no longer immediately obvious, lacking any distinctive visual characteristics and being relegated towards the bottom of the page. Screen readers have no notion of visual rendering and so this is the version of the page that visually impaired users interact with when they access the CNN news story. The content is read from the top left to the bottom right of the page in a serial manner, meaning that the user has to wait until all the top matter is read aloud before they reach the main content. In Figure 1c, the same page has been transcoded by SADIe. This page has been “Defluffed”, “Reordered” and “Menued”. Note how the original intention of the designer has been recreated, but targeted towards audio output. The story headline, the highlights of the story and the article itself have been promoted to the top of the page where they can be immediately accessed by a screen reader. The menu has been moved to the bottom of the page but is accessible via a link situated near the top. This allows users to easily find the menu when they wish to navigate to other news stories.

4. The Initial SADIe Ontology

Our approach to annotating Web elements does not embed extra metadata or semantics within a document, but instead tries to make use of the existing information already present and expose it in a more explicit fashion. Our original model achieved this through a combination of an upper and lower ontology.

Figure 2 demonstrates the underlying principle of the ontology used to annotate the CSS of a Website. The upper level ontology contained properties and high level abstract concepts that represented the potential roles of Web page elements. These concepts provided a controlled vocabulary with which to classify the roles of elements within the Websites that we might transcode. The second part of the ontology was a Website specific extension to the upper ontology. This contained the elements found within the CSS of the Website we wished to adapt. The Website ontology accessed the concepts found in the upper ontology by importing the entire upper ontology using a URL. Our reasoning behind this was that we could add additional classes to the upper ontology and these changes would be available to all the Website extensions without having to make changes to the model.

The benefit of this approach was that the upper ontology acted as an interface between SADIe and the page being transcoded. The roles of the CSS elements were defined by the abstract classes and properties of the upper ontology through a site specific extension. During the transcoding process, SADIe needed to know the roles that each element of the Website played. As Websites use heterogeneous naming policies for the CSS classes, querying a site directly would have been difficult as we lacked a means to know the names used in each site’s CSS and the role that the CSS element played. By defining classes in terms of a consistent collection of concepts, we were provided with an interface to a Website, regardless of the names that were used to create the CSS elements. SADIe merely requested the names of the CSS elements that fulfilled the role type that was being manipulated, via the terms found in the upper ontology.
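The interface role of the upper ontology can be illustrated with a small Python sketch. Plain dictionaries stand in for the ontology and a simple parent walk stands in for the reasoner; the concept hierarchy is a simplified reading of the taxonomy described in this section, and the two site extensions and their CSS class names are hypothetical.

```python
# Simplified fragment of the upper ontology: concept -> parent concept (an assumption).
UPPER_ONTOLOGY = {
    "Removable": "WebElement",
    "NonRemovable": "WebElement",
    "Menu": "NonRemovable",
    "Priority": "NonRemovable",
    "HighPriority": "Priority",
}

# Hypothetical site specific extensions: CSS class name -> upper ontology concept.
SITE_A = {"navBar": "Menu", "promo": "Removable", "leadStory": "HighPriority"}
SITE_B = {"menu-left": "Menu", "ad_slot": "Removable"}


def is_a(concept, ancestor):
    """True if concept equals ancestor or is one of its descendants."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = UPPER_ONTOLOGY.get(concept)
    return False


def classes_with_role(site, role):
    """Names of the site's CSS classes that fulfil role, sorted for stable output."""
    return sorted(name for name, concept in site.items() if is_a(concept, role))


menus_a = classes_with_role(SITE_A, "Menu")
menus_b = classes_with_role(SITE_B, "Menu")
```

The same query, `classes_with_role(site, "Menu")`, works for both sites even though they name their menus differently, which is the sense in which the upper ontology acts as an interface.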

The SADIe upper ontology can be seen in Figure 3. In practice the implementation was taxonomic, with a small number of disjoint concepts being used to act as a classification system of Web elements. For example, a Web page element could be either Removable or NonRemovable. A NonRemovable element could be either a Menu or given a Priority, and so on. In addition to the classes, the ontology contained a series of properties, not only to assign the Web element roles to the CSS classes but also to express the rich structure of a Web page. This was achieved by using a property called hasContainedWithinIt, which indicated which CSS classes were contained within other CSS classes in the XHTML document. Figure 4 shows a fragment of the CNN extension of the SADIe upper ontology with the hasContainedWithinIt properties highlighted.

Figure 2: Annotation of Websites Through an Extension of the SADIe Upper Ontology for Each Website that is to be Transcoded. The Upper Ontology Acts As An Interface Between the Website and the Transcoding Engine. (Diagram labels: SADIe Transcoding Application; Upper Ontology; Website Extensions; Website 1 to Website n, each with CSS and HTML.)

It can be seen that this is a complex structure and shows how the CSS elements are nested within each other within the XHTML code. The class marked as “1” is contained within two different classes in addition to being deemed Removable, although for clarity this is not indicated on the diagram. We chose to represent this using properties because we felt that it would have been difficult to express such a structure using super/sub class relationships alone. Firstly, is-a-kind-of has a slightly different connotation than containment. Secondly, the class highlighted as “1” in Figure 4 would need to have three super classes. The full CNN Ontology contained over 200 classes, which would have been hard to maintain if a significant number of classes had multiple parent classes.

With the containment of the classes defined, a form of validation was then added to the model for the developer. As noted, the class marked “1” in Figure 4 was classified as Removable. During transcoding, SADIe identifies the XHTML tag that is associated with a Removable CSS element, removing the node and all its children. As such, any element deemed to be HighPriority that was contained within a Removable element would also be removed. When reasoning over the ontology to identify CSS element roles, any instances where this may have occurred were flagged and brought to the attention of the developer. This allowed a situation where important content of the page was removed inadvertently to be avoided.
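The validation described above amounts to searching the containment relation for HighPriority classes nested, directly or indirectly, inside Removable ones. The Python sketch below shows one way such a check could be written; it replaces the ontology reasoner with a plain graph traversal, and the role assignments and class names are hypothetical.

```python
# Hypothetical site extension: CSS class -> role.
ROLES = {
    "sidebar": "Removable",
    "footerLinks": "Removable",
    "breakingNews": "HighPriority",
    "article": "HighPriority",
}

# Hypothetical hasContainedWithinIt assertions: container -> directly contained classes.
HAS_CONTAINED_WITHIN_IT = {
    "sidebar": ["breakingNews", "adList"],
    "adList": ["sponsor"],
    "article": ["byline"],
}


def containment_warnings(roles, contains):
    """Return sorted (removable_container, high_priority_descendant) pairs."""
    warnings = []
    for container, role in roles.items():
        if role != "Removable":
            continue
        stack = list(contains.get(container, []))
        seen = set()
        while stack:
            css_class = stack.pop()
            if css_class in seen:
                continue  # guards against cycles in the containment assertions
            seen.add(css_class)
            if roles.get(css_class) == "HighPriority":
                warnings.append((container, css_class))
            stack.extend(contains.get(css_class, []))
    return sorted(warnings)


warnings = containment_warnings(ROLES, HAS_CONTAINED_WITHIN_IT)
```

Here the only warning is that `breakingNews` would be lost when `sidebar` is removed; `article` is not flagged because nothing Removable contains it.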

5. Reflecting Upon The Initial Ontology Design

During usage of SADIe it became apparent that the initial ontological model developed for the system had a number of weaknesses. We now reflect upon those weaknesses and provide key factors that should be taken into consideration when developing ontologies for interactive systems.

5.1. Clearly Distinguish Between Semantics and Functionality

One major flaw of our initial approach was that the SADIe upper ontology confused the roles of Web elements and the functionality of the tool. The concept Menu, for example, is a property of a Web page element that describes its role and purpose. The concept Removable, however, does not reflect a semantic property of a Web page element but describes functionality to be executed on a particular element. A Web element is not inherently removable. In certain instances it can be deemed as superfluous to requirements for the user but in other situations it may have some purpose and need to be retained. The original ontology made no distinction between the two and merged both the structural semantics of the Web page and the application functionality into a single model.

When developing an ontology for an interactive system, designers should consider whether they are modelling the properties of the domain they wish to interact with or the functionality of the application. Our experience suggests that the ontology should contain only the properties of the knowledge domain, leaving the functionality to the application itself. The motivation for such an approach is that the ontology has potential usage beyond the application it was originally intended for. For example, annotating an element as a Menu may aid Web content tools other than SADIe. However, annotating elements as Removable may not be beneficial to tools beyond SADIe as different applications may have different concepts of what is removable and what is not.

Furthermore, the application itself has more flexibility in how any output is produced. In the case of SADIe, predetermining which elements were Removable locked the platform into a particular set of transformations that were targeted at visually impaired users accessing content with a screen reader. By moving towards a model that contained only semantics, not only could SADIe produce the original output, but there was scope to generate alternative transformations that were not considered during the original design, such as transcoding content for Web enabled mobile devices.

Figure 3: The Original SADIe Upper Ontology Used to Act as a Classification System of Web Elements. (Concepts shown in the diagram: SADIe, WebElement, Removable, NonRemovable, Menu, MainMenu, SubMenu, ConcertinaMenu, ConcertinaOverview, ConcertinaFullDetails, Priority, HighPriority, MediumPriority, LowPriority.)

5.2. Ensure The Domain Is Accurately Modelled

As discussed in Section 4, SADIe modelled the notion of containment within the CSS elements. However, after deployment it became apparent that this was not an appropriate modelling concept for the ontology. Figure 4 shows a fragment of the CNN Website specific extension to the SADIe upper ontology with the containment properties highlighted. The class CNN_2ColFooter, marked as “1” in the diagram, is classified as being contained within both CNN_homeCenterCol and CNN_Float2ColLeft. This implies that in all CNN Web pages, all instances of CNN_2ColFooter are contained within CNN_homeCenterCol and CNN_Float2ColLeft. However, this is not generally the case. Whilst some elements may appear on every page of the CNN Website, such as a footer and a header, there are other elements that may only appear on some pages. The front page of CNN, for example, has two columns in order to preview a large number of short stories for users to read. A page containing a main article, on the other hand, has only one column as a single story is being described in detail.

In addition, elements that appear on every page may not necessarily be contained within the same XHTML elements. Our premise for the transcoding approach of SADIe was that annotating the CSS element allows implicit semantics of the Website to be exposed. All menus on every CNN page are marked as being of class cnnHeaderNav, with a corresponding CSS entry. This defines how the class should be rendered on screen, thus giving a consistent look and feel across every page. However, it does not define how instances of this class should be added to the XHTML document. The element can be located at the top of the document, or near the bottom, yet would still look, feel and act like a menu. Therefore trying to model instances of the element as a rigidly defined structure within the ontological model was not a suitable approach to take.

When developing an ontology for an interactive system, designers should try to accurately understand the domain they are modelling before they actively engage in constructing the model. As we have seen, failure to understand what the ontology was representing resulted in a significant amount of time and effort being wasted. With hindsight it may also have been worthwhile running small pilot studies to ensure that the ontology accurately reflected the domain that it was trying to represent, which in the case of SADIe was the structure of Web content. By having an inappropriate model, a significant amount of effort was spent in both creating the Website ontologies and then reengineering them to take into account the poor design decisions made at the initial stages of the system. A full understanding of the domain and vigorous testing could have avoided this unnecessary effort.

5.3. Consider The Ontology’s Role

Within the SADIe ontology there was a notion of validation, whereby a warning could be generated if the reasoner identified a NonRemovable element contained within a Removable element. During development this may have been useful as a method of informing designers that some elements they considered to be important may be lost in transcoding. However, if this warning was ignored, then the reasoner would still raise the validation message after deployment as reasoning occurred during every transcoding of the page. As the system became active, it became unclear as to when the validation errors should be raised and to whom they should be presented. If the user was informed then they may have believed the page contained errors and that they were not receiving a suitably transcoded version of the page. Alternatively, if the designer was informed of an error that they had decided was indeed correct, then they may have been overwhelmed with error messages if a large number of users transcoded the page.

Figure 4: A Fragment of the CNN Website Specific Ontology Showing the Containment Properties. (Classes shown in the diagram, linked by hasContainedWithinIt: cnnOnly, cnn321pxBlock, CNN_2ColFooter (marked “1”), cnn20pxTMargin, CNN_t1, cnnIreportBox, CNN_Float2ColLeft, CNN_homeAdBox, CNN_homeBox, cnnRCPod, cnnT1ImgCaption, CNN_homeCenterCol, CNN_homeLeftCol.)

From this experience, we believe that it is important to consider how the ontology will fit into the overall architecture of the interactive system. If during use of the system the user is expected to understand the ontology and interact with it, then generating validation warnings and consistency errors may be useful. The user can look at such messages and this may influence how they continue with the interaction. However, within SADIe, the ontology was part of the back-end architecture that drove the transcoding. The user did not directly interact with the ontology and therefore it was an unnecessary feature of the tool to present any modelling errors as the users, after deployment, did not see any of them. In this case it would have been more fruitful for the application to deal with such inconsistencies in a graceful manner. The onus of ensuring that the Website ontology was correct would have been left to the Website designer. Any modelling errors would have been identified when the Website was tested with the transcoder and unexpected output was produced.

5.4. Be Pragmatic In Design Decisions

SADIe relied upon Web developers to build ontologies to represent the implicit structure of their Web content. Web developers are not expert ontology builders, nor will they invest time in Website extensions from which they receive little benefit [11]. The design of the SADIe ontology neglected the practical construction of the ontology from a number of perspectives. Firstly, the blurring of functionality and semantics created a model that was only useful to the SADIe transcoding system. Therefore effort was placed into building an ontology that was not useful for other systems and limited future functionality of SADIe.

Secondly, we added elements that, with hindsight, had no benefit to the transcoding outcome of the system. With the notion of containment, we attempted to accurately portray the Website structure within the ontology. As discussed in Section 5.2, this was not the correct approach to take. However, even had we modelled containment correctly, it is not clear how containment would have benefited the SADIe transcoding. The important aspect, as far as the transcoding engine was concerned, was the ability to identify structural elements. What the element is contained within was of little relevance as the containment model was ultimately broken when the page was reorganised. We were therefore asking designers to spend time creating a complex structural model that provided little benefit to the system that users interacted with.

With this in mind one should take a pragmatic approach to designing ontologies, especially when non-experts may be required to take part in the ontology construction. On one hand the model needs to capture everything the system requires while remaining simple enough for the ontology builder to create even when they are not an expert. Conversely, the ontology needs to avoid being ghettoised so that it is only usable by the system in its current state. One should consider any future functionality the system may provide, or that a third party application may exploit, so that the ontology does not require reengineering at a later date. Striking a balance between the two will require value judgement and testing. It is difficult to fully comprehend all the aspects of a solution without any real-world usage, therefore some small scale evaluations with real usage scenarios would also be of benefit before the final model is deployed.

6. Discussion and Future Work

Having identified a number of issues with the SADIe ontology architecture, we are currently investigating an alternative approach based upon the lessons learnt. Firstly, our current proposal involves removing all the properties that were used to model containment and validation as these appear to be unnecessary. The SADIe transcoder does not require any notion of containment in order to adapt the Web pages and the ontology itself is hidden from the user, therefore there is no requirement to inform the user if the model is invalid.

Secondly, the SADIe upper ontology will be teased apart to create a distinction between structural elements and application specific functionality. To do this the WAfA ontology will be exploited [13]. This ontology is designed to annotate Websites for accessibility and mobility terms; however, there are a large number of authoring concepts, such as Links, Chunks and Headings, that allow the structure of the XHTML semantics to be exposed. By migrating to an ontology that only describes the structure of the page, the application itself must deal with all the structural elements. In the case of SADIe, this will involve stating within the application that all Footers are Removable or Menus are Not Removable rather than querying an ontology for classes defined as Removable. It should be noted that a consistent interface is still retained between the application and the Website, but more processing will be placed within the application to try and deal with all the potential structural elements.
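To make the shift of responsibility concrete, the following Python sketch keeps the annotations purely structural and places the decision about what to do with each role inside the application. The policy tables, action names and most CSS class names are illustrative assumptions; only cnnHeaderNav and the role names Footer and LinkMenu are taken from the text.

```python
# Application-side policies: structural role -> action. These tables are
# illustrative assumptions, one per target audience.
SCREEN_READER_POLICY = {"Footer": "remove", "LinkMenu": "move_to_bottom"}
MOBILE_POLICY = {"Footer": "keep", "LinkMenu": "collapse"}


def plan_actions(annotations, policy, default="keep"):
    """Map each annotated CSS class to the action the application will apply."""
    return {
        css_class: policy.get(role, default)
        for css_class, role in annotations.items()
    }


# Hypothetical structural annotations for one site: CSS class -> structural role.
site = {"cnnFooter": "Footer", "cnnHeaderNav": "LinkMenu", "cnnStory": "Chunk"}

screen_reader_plan = plan_actions(site, SCREEN_READER_POLICY)
mobile_plan = plan_actions(site, MOBILE_POLICY)
```

The same annotations drive two different adaptations, which would not be possible if the annotation itself said Removable. Roles the policy does not mention fall back to a default action, so the application has to decide how to treat every structural concept it may encounter.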

We believe that the exposure of structure using a generic set of terms provides scope for additional applications to be developed that may use the structural elements in a different way. A news aggregator, for example, may only display news stories that have been recently updated. Time stamps of page updates are usually indicated in the footer, so identifying the footer of the page may guide the application in finding the date and then passing the story to the user if it is deemed appropriate to do so. As discussed in section 5.1, purely structural annotations have also allowed us to add flexibility to the SADIe transcoder that was not conceived when the original prototype was built. Adaptations that are targeted at Web enabled mobile devices have been experimented with and we have also made steps towards inserting AJAX into Web documents. AJAX allows Web pages to become interactive and is usually found within modern Web 2.0 sites such as online calendars and blogs. We have started to investigate adding consistent key combinations to pages so that visually impaired users can always reach the main content or activate menus with the same key presses, therefore making interaction easier. This work is still in its infancy and further research is required; however, our initial results are encouraging.

By separating the structure and functionality, we believe we can reduce the complexity of exposing the structural semantics of Websites. However, to reduce overhead further, we have also made tentative investigations into removing the Website specific ontology and embedding the structural semantics within the CSS itself. This approach involves extending the keywords used within the Cascading Style Sheet, yet validity of the CSS can still be maintained as the specification allows for vendor-specific extensions to be added to the CSS [7]. These extensions are ignored by user agents that cannot interpret the keyword and therefore cannot apply the appropriate rendering to the XHTML element. For example, to declare that a CSS element is a menu, the developer will add -uom-structural-role: LinkMenu to the CSS definition. The value LinkMenu is a concept taken from the WAfA ontology to show that this class is a list of links that represent a menu. It should be noted that there is no tight integration between the WAfA ontology and the keywords within the CSS. We merely use the terms within the ontology as a reference point for concepts that can be used to describe structure. Entering a keyword that is not part of the ontology results in the application being unable to utilise the structure of the page. No errors are flagged to the user as the keyword is merely ignored. This, however, is similar to the original approach that used an ontology to store the names of the CSS elements. The ontology classes represented the CSS terms but there was no tight coupling between the two. The developer was free to enter class names that had no equivalent CSS term, with the application ignoring any class that it was unable to process within the XHTML document. Our new approach merely reverses the process. Rather than represent CSS elements as a class within an ontology, we now represent WAfA ontology classes as a CSS element property.
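To illustrate how an application might read such annotations, the sketch below scans a style sheet for the -uom-structural-role property and silently skips values it does not recognise, mirroring the behaviour described above. It is a simplified regular-expression reader and not the SADIe implementation. The sample style sheet, the selector names, and the KNOWN_ROLES set are assumptions made for the example; only the property name and the value LinkMenu come from the text.

```python
import re

# Matches "selector { declarations }" for simple, non-nested CSS rules.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")
# Matches the vendor-specific property and captures its value.
ROLE = re.compile(r"-uom-structural-role\s*:\s*([A-Za-z]+)")

# Illustrative subset of role names; only LinkMenu is named in the text.
KNOWN_ROLES = {"LinkMenu", "Footer", "Heading"}

# Hypothetical style sheet used for the example.
SAMPLE_CSS = """
.nav    { float: left; -uom-structural-role: LinkMenu; }
.footer { font-size: small; -uom-structural-role: Footer; }
.promo  { color: red; -uom-structural-role: Bogus; }
.banner { color: blue; }
"""


def structural_roles(css):
    """Map each selector to its structural role, ignoring unknown values."""
    roles = {}
    for selector, body in RULE.findall(css):
        match = ROLE.search(body)
        if match and match.group(1) in KNOWN_ROLES:
            roles[selector.strip()] = match.group(1)
    return roles
```

Calling structural_roles(SAMPLE_CSS) yields entries for .nav and .footer only: the unrecognised value on .promo is dropped without raising an error, and .banner carries no annotation at all.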

The CSS-based approach, devised from our experiences with the original two-part ontology, is still under investigation. As such it is still not clear which approach is the more beneficial. Having a two part ontology, while difficult to create initially, does provide scope for adding richer descriptions to the Web elements. However, previous experience suggests that describing Web element roles requires a taxonomy of terms. Embedding these structural terms within the CSS provides the necessary descriptions our application requires, but in a more pragmatic fashion that we believe designers will be more willing to use. However, further investigation of the approach and discussions with Web designers implementing the technique are required before a solid conclusion is made.

7. Conclusions

SADIe is a method of explicating the implicit semantics of a Web page based on the visual rendering of the elements. This process is achieved by annotating the Cascading Style Sheet that provides the rendering information of the page. Exposing the structure allows a transcoding process to take place that enables the page to suit the audio rendering required for screen reader users. Initially the process of annotating the CSS elements involved the use of an upper ontology, extended by a site specific lower ontology. While this provided a way of adding rich annotations to the CSS, our approach had a number of weaknesses including a blurring of application functionality and structural semantics and modelling features not required by the transcoding tool. Drawing upon these experiences and adopting a more pragmatic approach led to adding a vendor specific keyword to the CSS. The values that the keyword used described the structural semantics of the CSS element and were based upon terms found within the application neutral WAfA ontology. This approach imposes less overhead on the annotator yet still provides the flexibility of the original ontology-based solution. In addition, we believe the structural semantics will be exposed to more applications, allowing for a more diverse use of the annotations.

References

[1] B. Adida. hGRDDL: Bridging microformats and RDFa. Web Semantics: Science, Services and Agents on the World Wide Web, 6(1):54–60, February 2008.

[2] B. Adida and M. Birbeck. RDFa primer: Embedding structured data in web pages. Technical report, World Wide Web Consortium (W3C), 32 Vassar Street, Room 32-G515, Cambridge, MA 02139 USA, March 2008. W3C Working Draft http://www.w3.org/TR/xhtml-rdfa-primer/.

[3] C. Asakawa and H. Takagi. Annotation-based transcoding for nonvisual web access. In Assets ’00: Proceedings of the fourth international ACM conference on Assistive technologies, pages 172–179, New York, NY, USA, 2000. ACM Press.

[4] J. Axelsson, M. Birbeck, M. Dubinko, B. Epperson, M. Ishikawa, S. McCarron, A. Navarro, and S. Pemberton. XHTML 2.0. Technical report, World Wide Web Consortium (W3C), July 2006. W3C Working Draft http://www.w3.org/TR/xhtml2/.

[5] S. Bechhofer, S. Harper, and D. Lunn. SADIe: Semantic annotation for accessibility. In ISWC ’06: Proceedings of The 5th International Semantic Web Conference, pages 101–115, November 2006.

[6] T. Berners-Lee. RDF in HTML. Technical report, World Wide Web Consortium (W3C), 32 Vassar Street, Room 32-G515, Cambridge, MA 02139 USA, April 2002. http://www.w3.org/2002/04/htmlrdf.

[7] B. Bos, T. Celik, I. Hickson, and H. W. Lie. Cascading style sheets level 2 revision 1 (CSS 2.1) specification. Candidate Recommendation 2.1, W3C, 32 Vassar Street, Room 32-G515, Cambridge, MA 02139 USA, July 2007.

[8] D. Connolly. Gleaning resource descriptions from dialects of languages (GRDDL). Technical report, World Wide Web Consortium (W3C), 32 Vassar Street, Room 32-G515, Cambridge, MA 02139 USA, September 2007. W3C Recommendation http://www.w3.org/TR/grddl/.

[9] F. C. Flores, V. Quint, and I. Vatton. Templates, microformats and structured editing. In DocEng ’06: Proceedings of the 2006 ACM symposium on Document engineering, pages 188–197, New York, NY, USA, 2006. ACM. ISBN: 1-59593-515-0.

[10] S. Gupta, G. Kaiser, D. Neistadt, and P. Grimm. DOM-based content extraction of HTML documents. In WWW ’03: Proceedings of the 12th international conference on World Wide Web, pages 207–214, New York, NY, USA, 2003. ACM Press.

[11] S. Harper and S. Bechhofer. Semantic triage for increased web accessibility. IBM Systems Journal, 44(3):637–648, 2005.

[12] S. Harper, S. Bechhofer, and D. Lunn. Taming the inaccessible web. In SIGDOC ’06: Proceedings of the 24th annual conference on Design of communication, pages 64–69, 2006.

[13] S. Harper and Y. Yesilada. Web authoring for accessibility (WAfA). Web Semantics: Science, Services and Agents on the World Wide Web, 5(3):175–179, 2007.

[14] M. Hori, G. Kondoh, K. Ono, S. ichi Hirose, and S. Singhal. Annotation-based web content transcoding. In Proceedings Of The 9th International World Wide Web Conference On Computer Networks: The International Journal Of Computer And Telecommunications Networking, pages 197–211, Amsterdam, The Netherlands, 2000. North-Holland Publishing Co.

[15] A. W. Huang and N. Sundaresan. A semantic transcoding system to adapt web services for users with disabilities. In Assets ’00: Proceedings of the fourth international ACM conference on Assistive technologies, pages 156–163, New York, NY, USA, 2000. ACM Press.

[16] S. C. Ihde, P. Maglio, J. Meyer, and R. Barrett. Intermediary-based transcoding framework. IBM Systems Journal, 40(1):179–192, 2001. ISSN: 0018-8670.

[17] N. Kew. Why validate? WWW Validator Mailing List Archive http://lists.w3.org/Archives/Public/www-validator/2001JulSep/0597.html, September 2001.

[18] R. Khare and T. Celik. Microformats: A pragmatic path to the semantic web. In Proceedings of the 15th international conference on World Wide Web, pages 865–866, ACM Press New York, NY, USA, May 2006. ACM. ISBN: 1-59593-323-9.

[19] R. Volz, S. Handschuh, S. Staab, L. Stojanovic, and N. Stojanovic. Unveiling the hidden bride: deep annotation for mapping and migrating legacy data to the semantic web. Web Semantics: Science, Services and Agents on the World Wide Web, 1(2):187–206, February 2004.

[20] E. Wilde. Describing namespaces with GRDDL. In WWW ’05: Special interest tracks and posters of the 14th international conference on World Wide Web, pages 1002–1003, New York, NY, USA, 2005. ACM.
