Growing the Semantic Web
By Charla Woodbury
June 11, 2004
INTERNET to SEMANTIC WEB
The present internet is too large to search for specific information in its present format
The Semantic Web holds the promise of a much richer and more easily searchable information resource
Most current research targets small areas of Semantic Web development rather than looking at the whole process and showing its advantages
What is needed is a working example of the Semantic Web that demonstrates the advantages and minimizes the problems, so that webpages for the Semantic Web can start growing
High-volume Information Publishers should be the first TARGET
The old adage is that to change a lake's water, you deal with the new water coming in rather than the water already in the lake
By starting with high-volume information publishers, the nature of the internet lake would change very quickly
Obituary Prototype
[Diagram labels: Newspaper Publisher, Daily News HOME PAGE, Daily News obituaries, Embedded Obituary Ontology, Obituary vocabulary, WordNet]
Once the faucet is turned on, the population pool of Semantic Webpages would grow very quickly
Thesis Statement
Populating the Semantic Web by building an embedded OWL ontology and the corresponding specialized vocabulary on top of WordNet for EACH information publisher, using an obituary prototype, is practical and cost effective under a cost/benefit analysis.
ADVANTAGES
Each information publisher
The ontology is only built once and used many times
The specialized vocabulary is only built once and accessed many times
The ontology and vocabulary belong to the publisher, who can change them as the format and vocabulary of the obituaries they produce change (deletion discouraged)
Most of the cost would be incurred in setting up the ontology and the specialized vocabulary
ADVANTAGES
Information extraction would be done by an agent, with no other contact with the publisher
There would be no need to index the information once the information retrieval portion was in place
HTML information is easy to store and maintain
HTML files are much smaller than the digitized microfilm presently used
METHODS
Each Newspaper
Contact selected newspapers to produce semantic obituary webpages
Learn how they archive the HTML version of the newspaper
Get estimates of the cost to the newspaper to index, microfilm, and store their archives
Request a reporter in obituaries to list specialized vocabulary, and build the vocabulary and OWL ontology to be embedded
Train a newspaper employee to test and edit the ontology and vocabulary
Test the vocabulary and ontology to make sure that they are sufficiently inclusive
Compare the time needed to set up the first newspaper with the time needed for subsequent ones
METHODS
Organizations using Obituary information
Contact Family History businesses, Genealogical societies, and Government agencies that would use obituary information
Find out how they get their obituary information now and how much that costs in time and money
Measure their future interest in using agents to retrieve obituary information instead
Discover which parts of the obituary information they consider the minimum needed for their work and which additional information would be desired or optimal
Present the results of the obituary prototype and re-measure their future interest in using agents to retrieve obituary information
PROBLEMS
The first problem is how to entice publishers to start the process
The basic problem is a semantic one: how will regional burial practices and language differences impact the process?
But the biggest problem is how to maintain the ontology and vocabulary with the least amount of human intervention
First Problem
How to entice publishers to start the process of making semantic webpages?
Find Grants, Research Money, and/or money from Corporate sponsorship by those companies that would profit from the information
Petition for Government Support
Office of Internet Semantic Information (e.g. Library of Congress)
Demonstrate by prototype - Obituaries
Process works well (Electric lights in large cities)
Specific information is far more easily found
Their information is more available
The maintenance process is minimal
The rewards are maximal
Everyone else is doing it
SECOND PROBLEM
The basic problem is a semantic one: how will regional burial practices and language differences impact the process?
The basic format of the specialized vocabulary would be the same as WordNet, with rich word relationships (e.g. interred, interment, buried, and burial linked as related terms)
Regional and language differences would be expressed by adding rich vocabulary as deemed necessary by the individual publisher
Fine-tune and test the vocabulary and the ontology
Teach the computer to speak obituary language
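The idea of a specialized vocabulary with linked word forms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the concept names, the extra example terms, and the function names `concept_for` and `add_regional_terms` are all hypothetical, and a real vocabulary would sit on top of WordNet rather than a hand-written dictionary.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical specialized vocabulary: each concept lists the word forms a
# publisher's obituaries might use for it. The "burial" forms come from the
# slide; every other entry is an illustrative assumption.
OBITUARY_VOCABULARY: Dict[str, Set[str]] = {
    "burial": {"interred", "interment", "buried", "burial"},
    "death": {"died", "death", "passed away"},
}


def concept_for(term: str,
                vocabulary: Dict[str, Set[str]] = OBITUARY_VOCABULARY) -> Optional[str]:
    """Return the concept a term belongs to, or None if the term is unknown."""
    normalized = term.strip().lower()
    for concept, forms in vocabulary.items():
        if normalized in forms:
            return concept
    return None


def add_regional_terms(vocabulary: Dict[str, Set[str]],
                       concept: str,
                       terms: Iterable[str]) -> Dict[str, Set[str]]:
    """Return a copy of the vocabulary extended with publisher-specific terms."""
    extended = {name: set(forms) for name, forms in vocabulary.items()}
    extended.setdefault(concept, set()).update(t.strip().lower() for t in terms)
    return extended
```

A publisher with regional wording could call `add_regional_terms(OBITUARY_VOCABULARY, "burial", ["laid to rest"])` to get its own extended copy while the shared base vocabulary stays unchanged.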
THIRD PROBLEM
How to simplify and automate the testing and maintenance of the ontology and vocabulary?
TESTING and SIMPLE MAINTENANCE
Install a tool for creating and editing an OWL ontology that is as automated as possible
Set up procedures for how often to test the ontology (e.g. new reporter, new obituary template, a set length of time)
Write a program that tests how effective the ontology is and lists words in the obituaries that are not in the vocabulary, for review and addition to the vocabulary
Teach the machine to add those words automatically to the vocabulary if possible
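The testing program described above could start as something like the following Python sketch. It is a rough illustration, not the proposed tool: the function names, the simple regular-expression tokenizer, and the `general_words` set (a stand-in for the general WordNet lexicon that the specialized vocabulary is built on) are all assumptions.

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Split text into lowercase words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def vocabulary_report(obituaries: Iterable[str],
                      specialized: Set[str],
                      general_words: Set[str]) -> Tuple[float, List[Tuple[str, int]]]:
    """Measure coverage and list the words that need human review.

    Returns the fraction of word occurrences found in either vocabulary,
    plus the unknown words with their counts, most frequent first.
    """
    known = {w.lower() for w in specialized} | {w.lower() for w in general_words}
    unknown: Counter = Counter()
    total = 0
    covered = 0
    for text in obituaries:
        for word in tokenize(text):
            total += 1
            if word in known:
                covered += 1
            else:
                unknown[word] += 1
    coverage = covered / total if total else 1.0
    return coverage, unknown.most_common()
```

The list of unknown words is what an employee would review. Adding them automatically, as the last point suggests, would be a separate step, because a word count alone cannot tell which concept a new word belongs to.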
Evaluation
Cost/benefit analysis in time and money between the original process and the new Semantic Web process
Survey those testing and maintaining the Semantic Webpages about the process and the tools provided
Compare surveys given to potential information retrievers before and after the demonstration of the obituary prototype
CONTRIBUTIONS
A working model of the Semantic Web
A growing pool of semantic webpages for future information extraction & retrieval
As new standards emerge, adjustments in the process could be made immediately and only once for everyone
A replacement for costly human indexing of the information
Future Work
How will agents interpret many different obituary ontologies and vocabularies?
[Diagram labels: several Newspaper Publishers, Daily News obituaries, Embedded Obituary Ontologies, Obituary vocabularies]
Future Work
Should there be one global obituary ontology and/or one global burial vocabulary? (All languages and burial practices)
[Diagram label: GLOBAL Obituary Ontology]
Future Work
Or will the agent be smart enough to traverse the associated vocabulary for the correct information?
[Diagram labels: AGENT, Obituary vocabularies]
Future Work
How will the agents deliver the extracted obituary information?
Obituary Extracted Database
Daily News || 26 Jan 2004 || Charles Lambert || b. 12 June 1911 || d. 24 Jan 2004
HTML REPORT
All Obituaries with surname LAMBERT
URLs to the actual Newspaper Obituaries
Charles Lambert d. 24 Jan 2004
Richard Greaves Lambert d. 17 Oct 2003
[Diagram labels: Embedded Obituary Ontology, Daily News obituaries]
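One possible delivery path, from an extracted record to a surname report, is sketched below in Python. It assumes the `||`-separated record layout shown in the example above (newspaper, publication date, name, birth, death). The class and function names, the `url` field, and the rule that the surname is the last word of the name are illustrative assumptions, not part of the proposal.

```python
from dataclasses import dataclass
from html import escape
from typing import Iterable


@dataclass
class ObituaryRecord:
    newspaper: str
    published: str
    name: str
    born: str
    died: str
    url: str = ""  # hypothetical link back to the newspaper's obituary page


def _strip_prefix(value: str, prefix: str) -> str:
    """Remove a leading marker such as 'b.' or 'd.' if it is present."""
    return value[len(prefix):].strip() if value.startswith(prefix) else value


def parse_record(line: str, url: str = "") -> ObituaryRecord:
    """Parse one '||'-separated line of the assumed database layout."""
    fields = [field.strip() for field in line.split("||")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    newspaper, published, name, born, died = fields
    return ObituaryRecord(newspaper, published, name,
                          _strip_prefix(born, "b."), _strip_prefix(died, "d."), url)


def surname_of(name: str) -> str:
    """Assume the surname is the last word of the name."""
    parts = name.split()
    return parts[-1].upper() if parts else ""


def surname_report(records: Iterable[ObituaryRecord], surname: str) -> str:
    """Build a small HTML report of all obituaries with the given surname."""
    wanted = surname.upper()
    items = "\n".join(
        f'  <li><a href="{escape(r.url, quote=True)}">{escape(r.name)}</a>'
        f" d. {escape(r.died)}</li>"
        for r in records if surname_of(r.name) == wanted
    )
    return (f"<h1>All Obituaries with surname {escape(wanted)}</h1>\n"
            f"<ul>\n{items}\n</ul>")
```

Calling `surname_report(records, "Lambert")` on parsed records would list each matching person with a link to the original obituary, which mirrors the report layout on the slide.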
Future Work
Will it be necessary to hire and pay obituary indexers?
Will the newspapers continue to be microfilmed or just stored in HTML? Will storage space be an issue?
Will the whole process, including information retrieval, be cost effective?
QUESTIONS?
COMMENTS?