Date post: | 19-Dec-2015 |
Exploiting large scale web semantics to build end user applications
Enrico Motta
Professor of Knowledge Technologies
Knowledge Media Institute
The Open University
Aims of the Talk
• What is the Semantic Web
– Perspectives
• The SW as a ‘web of data’
• The SW as a new context in which to build semantic applications and an unprecedented opportunity in which to address some classic AI problems
– Typical misconceptions
• What the SW is not!
• Semantic Web for Users
– Applications that do something interesting and useful to users, by exploiting available web semantics
The Semantic Web as a ‘Web of Data’
Making data available to SW-aware software
<foaf:Person rdf:about="http://identifiers.kmi.open.ac.uk/people/enrico-motta/">
  <foaf:name>Enrico Motta</foaf:name>
  <foaf:firstName>Enrico</foaf:firstName>
  <foaf:surname>Motta</foaf:surname>
  <foaf:phone rdf:resource="tel:+44-(0)1908-653506"/>
  <foaf:homepage rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
  <foaf:workplaceHomepage rdf:resource="http://kmi.open.ac.uk/"/>
  <foaf:depiction rdf:resource="http://kmi.open.ac.uk/img/members/enrico.jpg"/>
  <foaf:topic_interest>Knowledge Technologies</foaf:topic_interest>
  <foaf:topic_interest>Semantic Web</foaf:topic_interest>
  <foaf:topic_interest>Ontologies</foaf:topic_interest>
  <foaf:topic_interest>Problem Solving Methods</foaf:topic_interest>
  <foaf:topic_interest>Knowledge Modelling</foaf:topic_interest>
  <foaf:topic_interest>Knowledge Management</foaf:topic_interest>
  <foaf:based_near>
    <geo:Point>
      <geo:lat>52.024868</geo:lat>
      <geo:long>-0.707143</geo:long>
    </geo:Point>
  </foaf:based_near>
  <contact:nearestAirport>
    <airport:name>London Luton Airport</airport:name>
    <airport:iataCode>LTN</airport:iataCode>
    <airport:location>Luton, United Kingdom</airport:location>
    <geo:lat>51.866666666667</geo:lat>
    <geo:long>-0.36666666666667</geo:long>
    <rdfs:seeAlso rdf:resource="http://www.daml.org/cgi-bin/airport?LTN"/>
  </contact:nearestAirport>
  <foaf:currentProject>
    <foaf:Project>
      <foaf:name>AquaLog</foaf:name>
    </foaf:Project>
  </foaf:currentProject>
</foaf:Person>
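To make "SW-aware software" concrete, the following is a minimal sketch of a program reading a FOAF description like the one on this slide. It uses only Python's standard `xml.etree.ElementTree`. The cut-down `FOAF_DOC` and the helper name `describe_people` are assumptions made for illustration, and a real application would more likely rely on a dedicated RDF library, since this approach only handles this particular RDF/XML shape.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the RDF and FOAF vocabularies.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Assumption: a cut-down, well-formed version of the slide's FOAF description.
FOAF_DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://identifiers.kmi.open.ac.uk/people/enrico-motta/">
    <foaf:name>Enrico Motta</foaf:name>
    <foaf:homepage rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
    <foaf:topic_interest>Semantic Web</foaf:topic_interest>
    <foaf:topic_interest>Ontologies</foaf:topic_interest>
  </foaf:Person>
</rdf:RDF>"""


def describe_people(rdf_xml):
    """Return one dict per foaf:Person with name, homepage and interests."""
    root = ET.fromstring(rdf_xml)
    # Namespaced attributes are stored by ElementTree as "{uri}localname".
    rdf_resource = "{%s}resource" % NS["rdf"]
    people = []
    for person in root.findall("foaf:Person", NS):
        homepage = person.find("foaf:homepage", NS)
        people.append({
            "name": person.findtext("foaf:name", namespaces=NS),
            "homepage": homepage.get(rdf_resource) if homepage is not None else None,
            "interests": [e.text for e in person.findall("foaf:topic_interest", NS)],
        })
    return people
```

Calling `describe_people(FOAF_DOC)` yields a plain list of dictionaries that any application can consume without knowing how the page was authored.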
The web of SW documents
Current status of the semantic web
• 10-20 million semantic web documents
– Expressed in RDF, OWL, DAML+OIL
• 7K-10K ontologies
– These cover a variety of domains: music, multimedia, computing, management, bio-medical sciences, upper level concepts, etc.
• Hence:
– To a significant extent the semantic web is already in place
– However, domain coverage is very uneven
– Still primarily a research enterprise; however, interest is rapidly increasing in both governmental and business organizations
• “early adopters” phase
The above figures refer to resources which are publicly accessible on the web
<rdf:Description rdf:about="http://www.ecs.soton.ac.uk/info/#person-01269">
  <ns0:family-name>Gibbins</ns0:family-name>
  <ns0:full-name>Nicholas Gibbins</ns0:full-name>
  <ns0:given-name>Nicholas</ns0:given-name>
  <ns0:has-email-address>[email protected]</ns0:has-email-address>
  <ns0:has-affiliation-to-unit rdf:resource="http://194.66.183.26/WEBSITE/GOW/ViewDepartment.aspx?Department=750"/>
</rdf:Description>
</rdf:RDF>
CS Dept Data
AKT Reference Ontology
RDF Data
Bibliographic Data
Geography
• A ‘corporate ontology’ is used to provide a homogeneous view over heterogeneous data sources.
• Often tackle Enterprise Information Integration scenarios
• Hailed by Gartner as one of the key emerging strategic technology trends
– E.g., Garlik is a multi-million startup recently set up in the UK to support personal information management, which uses an ontology to integrate data mined from the web on a large scale
“Corporate Semantic Webs”
AquaLog
Applications that exploit large scale semantic content
The web of data
Gateways to the SW
Application
Semantic Web
• Sophisticated quality control mechanism
– Detects duplications
– Fixes obvious syntax problems
• E.g., duplicated ontology IDs, namespaces, etc.
• Structures ontologies in a network
– Using relations such as: extends, inconsistentWith, duplicates
• Provides interfaces for both human users and software programs
• Provides efficient API
• Supports formal queries (SPARQL)
• Variety of ontology ranking mechanisms
• Modularization/Combination support
• Plug-ins for Protégé and NeOn Toolkit
• Very cool logo!
Case Study 1: Automatic Alignment of Thesauri in the Agricultural/Fishery Domain
Method
[Figure: Scarlet accesses the Semantic Web to deduce the semantic relation between Concept_A (e.g., Supermarket) and Concept_B (e.g., Building), here Supermarket ⊆ Building]
- SCARLET - matching by Harvesting the SW
- Automatically select and combine multiple online ontologies to derive a relation
Two strategies
(A) One ontology: Supermarket ⊆ Shop ⊆ PublicBuilding ⊆ Building, giving Supermarket ⊆ Building
(B) Across ontologies: Cholesterol ⊆ Steroid in one ontology; Steroid ⊆ Lipid ⊆ OrganicChemical in another; the two Steroid concepts are matched (≡), giving Cholesterol ⊆ OrganicChemical
Deriving relations from (A) one ontology and (B) across ontologies.
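The two strategies can be sketched as a search over subclass assertions. The code below is an illustration only, not Scarlet's implementation: the ontology names in `ONTOLOGIES` are hypothetical, the assertions are taken from the slide's examples, and matching concepts across ontologies by identical label is a simplifying assumption.

```python
from collections import deque

# Toy subclass assertions (child, parent), keyed by hypothetical ontology names.
ONTOLOGIES = {
    "onto_buildings": [("Supermarket", "Shop"), ("Shop", "PublicBuilding"),
                       ("PublicBuilding", "Building")],
    "onto_chem_1": [("Cholesterol", "Steroid")],
    "onto_chem_2": [("Steroid", "Lipid"), ("Lipid", "OrganicChemical")],
}


def subsumption_path(edges, sub, sup):
    """Breadth-first search for a chain sub ⊆ ... ⊆ sup; return it or None."""
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, []).append(parent)
    queue = deque([[sub]])
    seen = {sub}
    while queue:
        path = queue.popleft()
        if path[-1] == sup:
            return path
        for parent in parents.get(path[-1], []):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


def derive_within_one(ontologies, sub, sup):
    """Strategy (A): the whole chain must come from a single ontology."""
    for name, edges in ontologies.items():
        path = subsumption_path(edges, sub, sup)
        if path:
            return name, path
    return None


def derive_across(ontologies, sub, sup):
    """Strategy (B): pool assertions from all ontologies (labels assumed to match)."""
    pooled = [edge for edges in ontologies.values() for edge in edges]
    return subsumption_path(pooled, sub, sup)
```

With this toy data, strategy (A) finds the Supermarket/Building chain inside one ontology, while the Cholesterol/OrganicChemical relation is only found once assertions from two ontologies are combined.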
Matching:
• AGROVOC
– UN’s Food and Agriculture Organisation (FAO) thesaurus
– 28,174 descriptor terms
– 10,028 non-descriptor terms
• NALT
– US National Agricultural Library Thesaurus
– 41,577 descriptor terms
– 24,525 non-descriptor terms
Experiment
226 Used Ontologies
http://139.91.183.30:9090/RDF/VRP/Examples/tap.rdf
http://reliant.teknowledge.com/DAML/SUMO.daml
http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml
http://reliant.teknowledge.com/DAML/Economy.daml
http://gate.ac.uk/projects/htechsight/Technologies.daml
Evaluation 1 - Precision
• Manual assessment of 1000 mappings (15%)
• Evaluators:
– Researchers in the area of the Semantic Web
– 6 people split in two groups
• Results:
– Comparable to best results for background knowledge based matchers.
Evaluation 2 – Error Analysis
Case Study 2: Folksonomy Tagspace Enrichment
• Tagging as opposed to rigid classification
• Dynamic vocabulary does not require much annotation effort and evolves easily
• Shared vocabulary emerges over time: certain tags become particularly popular
Features of Web2.0 sites
Limitations of tagging
• Different granularity of tagging
– rome vs colosseum vs roman monument
– Flower vs tulip
– Etc.
• Multilinguality
• Spelling errors, different terminology, plural vs singular, etc.
• This has a number of negative implications for the effective use of tagged resources
– e.g., Search exhibits very poor recall
Giving meaning to tags
What does it mean to add semantics to tags?
1. Mapping a tag to a SW element
"japan" → <akt:Country Japan>
2. Linking two "SW tags" using semantic relations
{japan, asia} → <japan subRegionOf asia>
Applications of the approach
• To improve recall in keyword search
• To support annotation by dynamically suggesting relevant tags or visualizing the structure of relevant tags
• To enable formal queries over a space of tags
– Hence, going beyond keyword search
• To support new forms of intelligent navigation
– i.e., using the 'semantic layer' to support navigation
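The first application, improving recall in keyword search, can be illustrated with a small sketch. Everything here is hypothetical: the triples in `TAG_RELATIONS`, the tagged `RESOURCES`, and the decision to treat both `subRegionOf` and `subClassOf` as "narrower than" links are assumptions made for the example, not part of the approach as presented.

```python
# Hypothetical semantic layer over tags: (subject, relation, object) triples.
TAG_RELATIONS = [
    ("japan", "subRegionOf", "asia"),
    ("tokyo", "subRegionOf", "japan"),
    ("tulip", "subClassOf", "flower"),
]

# Hypothetical tagged resources.
RESOURCES = {
    "photo1": {"tokyo", "night"},
    "photo2": {"japan"},
    "photo3": {"tulip"},
    "photo4": {"asia", "map"},
}


def narrower_tags(tag, relations):
    """Return the tag plus every tag that is transitively below it."""
    result = {tag}
    frontier = [tag]
    while frontier:
        current = frontier.pop()
        for subject, _relation, obj in relations:
            if obj == current and subject not in result:
                result.add(subject)
                frontier.append(subject)
    return result


def keyword_search(tag, resources):
    """Plain tag search: only exact tag matches are returned."""
    return sorted(name for name, tags in resources.items() if tag in tags)


def semantic_search(tag, resources, relations):
    """Tag search expanded with narrower tags from the semantic layer."""
    expanded = narrower_tags(tag, relations)
    return sorted(name for name, tags in resources.items() if tags & expanded)
```

On this toy data a plain search for "asia" returns a single resource, whereas the expanded search also returns resources tagged only with "japan" or "tokyo".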
[Figure: tagspace enrichment pipeline, starting from a Folksonomy]
1. Pre-processing: Tags → Clean tags → Group similar tags → Filter infrequent tags → Concise tags
2. Clustering: Analyze co-occurrence of tags → Co-occurrence matrix → Cluster tags → Cluster1, Cluster2, …, Clustern
3. Concept and relation identification: for each 2 “related” tags, find mappings & relation for the pair of tags (using Wikipedia and a SW search engine) → <concept, relation, concept>; repeat while tags remain, otherwise END
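The pre-processing and clustering steps can be approximated in a few lines. This is a sketch under stated assumptions: the slides do not say which clustering algorithm is used, so connected components over a thresholded co-occurrence graph stand in for it, and the function names and the `min_count` and `threshold` parameters are invented for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(tagged_items, min_count=2):
    """Count how often two tags annotate the same item, ignoring infrequent tags."""
    frequency = Counter(tag for tags in tagged_items for tag in set(tags))
    frequent = {tag for tag, n in frequency.items() if n >= min_count}
    pairs = Counter()
    for tags in tagged_items:
        kept = sorted(set(tags) & frequent)
        for a, b in combinations(kept, 2):
            pairs[(a, b)] += 1
    return pairs


def cluster_tags(pairs, threshold=2):
    """Group tags linked by at least `threshold` co-occurrences.

    Assumption: connected components are used as a simple stand-in for
    the clustering step of the pipeline.
    """
    neighbours = {}
    for (a, b), n in pairs.items():
        if n >= threshold:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    clusters, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            tag = stack.pop()
            if tag not in component:
                component.add(tag)
                stack.extend(neighbours[tag] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters
```

Each resulting cluster would then be passed to the concept and relation identification step, which looks for ontology elements matching pairs of tags.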
[Figure: concepts and relations identified for a cluster of tags. Labels: participant, innovation, event, developer, activity, creator, planning, example, application, user, admin, resource, typeRange, component, interface, participatesIn, in-event, archive, Information Object, has-mention-of]
Examples
Cluster_1: {admin application archive collection component control developer dom example form innovation interface layout planning program repository resource sourcecode}
Examples
Cluster_2: {college commerce corporate course education high instructing learn learning lms school student}
[Figure: concepts and relations identified for Cluster_2, with source ontologies in brackets. Labels: education, training [1,4], qualification, corporate [1], institution, university [2,3], college [2], postSecondarySchool [2], school [2], student [3], studiesAt, course [3], offersCourse, takesCourse, activities [4], learning [4], teaching [4]]
[1] http://gate.ac.uk/projects/htechsight/Employment.daml
[2] http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml
[3] http://www.mondeca.com/owl/moses/ita.owl
[4] http://www.cs.utexas.edu/users/mfkb/RKF/tree/CLib-core-office.owl
Faceted Ontology
• Ontology creation and maintenance is automated
• Ontology evolution is driven by task features and by user changes
• Large scale integration of ontology elements from massively distributed online ontologies
• Very different from traditional top-down-designed ontologies
Case Study 3: Reviewing and Rating on the Web
Revyu.com
expertise: the source has relevant expertise of the domain of the recommendation-seeking; this may be formally validated through qualifications or acquired over time.
experience: the source has experience of solving similar scenarios in this domain, but without extensive expertise.
impartiality: the source does not have vested interests in a particular resolution to the scenario.
affinity: the source has characteristics in common with the recommendation seeker, such as shared tastes, standards, values, viewpoints, interests, or expectations.
track record: the source has previously provided successful recommendations to the recommendation seeker.
Trust Factors
[Figure: factors emphasised (affinity, experience, expertise) along a spectrum from subjective to objective solutions]
Applying the framework to revyu.com
• Affinity
– Operationalised as the degree of overlap in items reviewed, and in ratings given
• Experience
– Proxy metric: Usage of particular tags (as proxies for topics)
• Experience scores based on tagging data
• Also integrates data from del.icio.us for those users who have chosen to publish their del.icio.us account on FOAF
• Expertise
– Proxy metric: Credibility
– Captures the social aspect of expertise: endorsement
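As an illustration of the affinity factor, the sketch below combines overlap in items reviewed with agreement in the ratings given. The exact formula used on revyu.com is not given in these slides, so the Jaccard index, the linear rating agreement, and the assumed 1 to 5 rating scale (hence `max_rating_gap=4`) are all assumptions.

```python
def affinity(reviews_a, reviews_b, max_rating_gap=4):
    """Score in [0, 1] combining overlap in items reviewed and in ratings given.

    reviews_a and reviews_b map item identifiers to numeric ratings.
    Assumption: ratings lie on a 1-5 scale, so the largest possible gap is 4.
    """
    items_a, items_b = set(reviews_a), set(reviews_b)
    shared = items_a & items_b
    if not shared:
        return 0.0
    # Overlap in items reviewed: Jaccard index of the two item sets.
    item_overlap = len(shared) / len(items_a | items_b)
    # Overlap in ratings: mean agreement over the shared items.
    rating_agreement = sum(
        1 - abs(reviews_a[item] - reviews_b[item]) / max_rating_gap
        for item in shared
    ) / len(shared)
    return item_overlap * rating_agreement


def rank_reviewers(seeker, others):
    """Order other reviewers by decreasing affinity with the seeker."""
    return sorted(others, key=lambda name: affinity(seeker, others[name]), reverse=True)
```

A ranking of reviews could then weight each review by its author's affinity with the person reading it, alongside the experience and expertise scores.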
Using trust factors for ranking reviews
PowerAqua and PowerMagpie
How does the Semantic Web relate to Artificial Intelligence research?
AI as Heuristic Search
The knowledge-based paradigm in AI
“Today there has been a shift in paradigm. The fundamental problem of understanding intelligence is not the identification of a few powerful techniques, but rather the question of how to represent large amounts of knowledge in a fashion that permits their effective use”
Goldstein and Papert,1977
Knowledge Representation Hypothesis in AI
Any mechanically embodied intelligent process will be comprised of structural ingredients that we as external observers naturally take to represent a propositional account of the knowledge that the overall process exhibits, and independent of such external semantic attribution, play a formal but causal and essential role in engendering the behaviour that manifests that knowledge
Brian Smith, 1982
Knowledge-Based Systems
Large Body of Knowledge
Intelligent Behaviour
The Knowledge Acquisition Bottleneck
Large Body of Knowledge
Intelligent Behaviour
KA Bottleneck
Knowledge
The Cyc project
Problem Solving Method
Generic Task
Domain Model
Mapping Knowledge
Application-specific Problem-Solving Knowledge
Application Configuration
Parametric Design
Library of PSMs
Mapping Ontology
Ontology
Structured libraries of reusable components
Classification
Scheduling
Etc…
The next knowledge medium
“An information network with semi-automated services for the generation, distribution, and consumption of knowledge”
• However, our approach based on structured libraries of problem solving components only addressed the economic cost of KBS development…
SW as Enabler of Intelligent Behaviour
Intelligent Behaviour
Both a platform for knowledge publishing and a large scale source of knowledge
KBS vs SW Systems
                  Classic KBS     SW Systems
Provenance        Centralized     Distributed
Size              Small/Medium    Extra Huge
Repr. Schema      Homogeneous     Heterogeneous
Quality           High            Very Variable
Degree of trust   High            Very Variable
Key Paradigm Shift
Intelligence in Classic KBS: A function of sophisticated, logical, task-centric problem solving
Intelligence in SW Systems: A side-effect of being able to integrate different types of reasoning to handle size and heterogeneous quality and representation
Conclusions
Typical misconceptions…
• “The SW is a long-term vision…”
– Ehm…actually… it already exists…
• “The SW will never work because nobody is going to annotate their web pages”
– The SW is not about annotating web pages; the SW is a web of data, most of which is generated from DBs, or from web mining software, or from applications which produce SW technology
• “The idea of a universal ontology has failed before and will fail again. Hence the SW is doomed”
– The SW is not about a single universal ontology. Already there are around 10K ontologies and the number is growing…
– SW applications may use 1, 2, 3, or even hundreds of ontologies.
Large Scale Distributed Semantics
• Widespread production of formalised knowledge models (ontologies and metadata), from a variety of different groups and individuals
– E.g., legal, bio-medical, governmental, environmental, music, art, multimedia, computing, etc.
– “Knowledge modelling to become a new form of literacy?”
• Stutt and Motta, 1997
• This large scale heterogeneous resource will enable a new generation of semantic-aware technologies
• These developments may provide a new context in which to address the economic barriers to KBS development
• The SW already exists to some extent; however, there is still a way to go before it reaches the required degree of maturity
Large Scale Distributed Semantics
• Much like AI, the semantic web will only succeed if it becomes ubiquitous and hidden
“There's this stupid myth out there that A.I. has failed, but A.I. is everywhere around you every second of the day. People just don't notice it. You've got A.I. systems in cars, tuning the parameters of the fuel injection systems. When you land in an airplane, your gate gets chosen by an A.I. scheduling system. Every time you use a piece of Microsoft software, you've got an A.I. system trying to figure out what you're doing, like writing a letter, and it does a pretty damned good job. Every time you see a movie with computer-generated characters, they're all little A.I. characters behaving as a group. Every time you play a video game, you're playing against an A.I. system.”
Rodney Brooks