Date posted: 21-Mar-2017
Category: Data & Analytics
Uploaded by: simon-price
A review of the state of the art in Machine Learning on the Semantic Web
Simon Price
University of Bristol
http://www.cs.bris.ac.uk/~price
Outline
• Introduction to the Semantic Web
  – Semantic Web layers
  – URI, RDF(S), OWL
  – Web Services and the Semantic Web
• Applications of Machine Learning
  – Creating the Semantic Web
  – Using the Semantic Web
• Summary and pointers to further info.
Introduction to the Semantic Web
Definition
"The Semantic Web is the representation of data on the World Wide Web. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming."
Uniform Resource Identifier (URI)
• URI addressing schemes: http://..., ftp://..., mailto:..., etc.
• Each URI points to a resource (or a specific point within a resource)
• Typically, the resource is somewhere on the Web, but it may be an entity that cannot be retrieved over the network, e.g.
  – human beings, corporations, bound books in a library
  – concepts, topics, relations, ...
URIs - Good news. Bad news.
• Good news: decentralisation
  – anyone can create a URI
  – allows rapid growth of Web
• Bad news: decentralisation
  – no centralised register or clearing house
  – multiple URIs can refer to same entity
  – testing for equality (or equivalence) poses interesting problems
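One small corner of the equality problem can be sketched in Python: two URI strings that differ only in the case of the scheme or host, or by an explicit default port, name the same resource. The helper name `normalise_uri` and the `DEFAULT_PORTS` table are assumptions made for illustration, and the rules cover only a subset of syntax-based normalisation. The sketch does not touch the harder case of two unrelated URIs that denote the same entity.

```python
from urllib.parse import urlsplit, urlunsplit

# Illustrative table of default ports (an assumption, not exhaustive).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def normalise_uri(uri: str) -> str:
    """Return a canonical form of `uri` for simple string comparison.

    Lower-cases the scheme and host and drops an explicit default port.
    The path, query and fragment are left alone because they may be
    case-sensitive. IPv6 host literals are not handled by this sketch.
    """
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()

    # Keep any userinfo ("user@") exactly as written.
    userinfo = ""
    if "@" in parts.netloc:
        userinfo = parts.netloc.rsplit("@", 1)[0] + "@"

    netloc = userinfo + host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{netloc}:{parts.port}"

    # An empty path after an authority is equivalent to "/".
    path = parts.path
    if netloc and not path:
        path = "/"

    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


def same_uri(a: str, b: str) -> bool:
    """Syntactic equality after normalisation (says nothing about meaning)."""
    return normalise_uri(a) == normalise_uri(b)
```

For example, `same_uri("HTTP://WWW.Example.ORG:80/index.html", "http://www.example.org/index.html")` holds, while two URIs that differ in path case are still treated as different.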
Resource Description Framework (RDF)
• A language of URI triples.
• An RDF statement has the form:
  { subject, predicate, object }
• e.g. "http://www.example.org/index.html has a creator whose value is the literal John Smith" could be represented as a plain text triple:
  subject    http://www.example.org/index.html
  predicate  http://purl.org/dc/elements/1.1/creator
  object     John Smith
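The triple above can be held in a very small data structure. The following Python sketch stores triples as named tuples and answers pattern queries in which `None` acts as a wildcard. The names `Triple` and `match` are assumptions made for illustration and do not come from any RDF toolkit; for brevity the sketch does not distinguish URIs from literals.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF statement: { subject, predicate, object }."""
    subject: str
    predicate: str
    object: str


DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# The example statement from the slide.
triples = [
    Triple("http://www.example.org/index.html", DC_CREATOR, "John Smith"),
]


def match(store: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return every triple that agrees with the given fields.

    A field left as None matches anything.
    """
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]
```

Calling `match(triples, predicate=DC_CREATOR)` returns the single statement whose object is the literal John Smith.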
Representing RDF
• Default syntax is XML (not human friendly)
• SQL triple stores commonly used
• RDF toolkits: Jena (HP) and Redland (Dave Beckett)
• Prolog: SWI-Prolog (40M triples per 100MB RAM), e.g.
  rdf('http://www.example.org/index.html',
      'http://purl.org/dc/elements/1.1/creator',
      'John Smith').
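The SQL option mentioned above can be sketched with Python's built-in `sqlite3` module. The table name `triples`, its three-column layout and the helper `creators_of` are assumptions chosen for illustration; real triple stores typically add indexes, separate tables for URIs and literals, and so on.

```python
import sqlite3

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE triples ("
    " subject TEXT NOT NULL,"
    " predicate TEXT NOT NULL,"
    " object TEXT NOT NULL)"
)

# Same statement as the Prolog fact on the slide.
conn.execute(
    "INSERT INTO triples VALUES (?, ?, ?)",
    ("http://www.example.org/index.html", DC_CREATOR, "John Smith"),
)
conn.commit()


def creators_of(connection: sqlite3.Connection, resource: str) -> list:
    """Return the objects of all dc:creator statements about `resource`."""
    rows = connection.execute(
        "SELECT object FROM triples WHERE subject = ? AND predicate = ?",
        (resource, DC_CREATOR),
    ).fetchall()
    return [row[0] for row in rows]
```

The query plays the same role as calling the Prolog `rdf/3` fact with the object left unbound.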
Semantic Web Layers
RDF Schema
• A language for describing properties and classes of RDF resources
• Includes semantics for generalisation-hierarchies of such properties and classes
• Simple data typing model:
  – is-a relationships and properties
  – some range and domain restriction
Notes:
1. RDF Schema recently renamed as "RDF Vocabulary Description Layer"
2. In the literature, RDF + RDF Schema is often referred to as RDF(S)
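The generalisation-hierarchy semantics can be illustrated with a short Python sketch: if a resource has a given type, it also belongs to every superclass of that type. The `ex:` class and resource names, the dictionaries and the function names are all hypothetical, and only the transitive subclass rule is covered (no domain or range inference).

```python
# Direct subclass links (class -> set of direct superclasses).
# All "ex:" names are made up for illustration.
sub_class_of = {
    "ex:Steel": {"ex:Metal"},
    "ex:Metal": {"ex:Material"},
}

# Direct type assertions (resource -> set of classes).
rdf_type = {
    "ex:sample1": {"ex:Steel"},
}


def superclasses(cls: str, hierarchy: dict) -> set:
    """Return every class reachable from `cls` via subclass links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in hierarchy.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_instance_of(resource: str, cls: str, types: dict, hierarchy: dict) -> bool:
    """True if `resource` has type `cls` directly or through a superclass."""
    return any(
        direct == cls or cls in superclasses(direct, hierarchy)
        for direct in types.get(resource, ())
    )
```

Here `ex:sample1` is only asserted to be an `ex:Steel`, yet `is_instance_of("ex:sample1", "ex:Material", rdf_type, sub_class_of)` holds because the is-a relationship is transitive.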
Ontology Vocabulary Layer
• Huge number of different ontologies:
  – simple: thesauri, taxonomies
  – complex: DAML+OIL, OWL
• OWL supersedes the older DAML+OIL
• OWL goes further than RDF Schema, adding:
  – relations between classes
  – cardinality
  – equality
  – richer typing
  – characteristics of properties
  – enumerated classes
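Of the additions listed above, equality is the one that answers the earlier problem of several URIs naming the same entity: OWL lets a document state that two URIs are the same individual (the `owl:sameAs` property). A minimal Python sketch of what a consumer might do with such statements is a union-find structure that groups URIs into equivalence classes. The class name, method names and example URIs are assumptions, and this is far short of full OWL reasoning.

```python
class EquivalenceClasses:
    """Groups URIs that have been declared equal (union-find)."""

    def __init__(self) -> None:
        self._parent = {}

    def find(self, uri: str) -> str:
        """Return the representative URI of the group containing `uri`."""
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node straight at the root.
        while self._parent[uri] != root:
            self._parent[uri], uri = root, self._parent[uri]
        return root

    def same_as(self, a: str, b: str) -> None:
        """Record a statement that `a` and `b` denote the same entity."""
        self._parent[self.find(a)] = self.find(b)

    def equivalent(self, a: str, b: str) -> bool:
        """True if `a` and `b` are equal by symmetry and transitivity."""
        return self.find(a) == self.find(b)


# Hypothetical URIs for one person published by three different sites.
eq = EquivalenceClasses()
eq.same_as("http://a.example.org/people#js", "http://b.example.org/staff/jsmith")
eq.same_as("http://b.example.org/staff/jsmith", "http://c.example.org/id/42")
```

After these two statements, the first and third URIs are equivalent even though no statement links them directly.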
Web Ontology Language (OWL)
• OWL Lite - hierarchical classification (ideal for thesauri and other taxonomies).
• OWL DL - description logics (computationally complete but inference services are restricted to classification and subsumption).
• OWL Full - full syntactic freedom of RDF (no computational guarantees).
Web Services and the Semantic Web
• Web Services
  – XML-based interfaces to programs accessible via the Web
  – Operating-system-neutral Remote Procedure Call (RPC) protocol
• Today's Web Services
  – Business-orientated, simple, short transactional operations
  – Domain-specific XML vocabularies (not RDF)
• Tomorrow's Web Services
  – Combination of simple services to achieve complex operations
  – Automated discovery, selection and pipelining of Web Services
• Semantic Web + Machine Learning may have an important role to play in the future of Web Services
Applications of Machine Learning
• Attempts to apply Machine Learning are being made within each of the Semantic Web layers.
• Research activity within each layer can be divided into two parts:
The application of Machine Learning in:
• creating the Semantic Web
• using the Semantic Web
• Most activity to date is in creating the Semantic Web
Activity
Creating the Semantic Web
• Why can't people do this themselves?
  – People are frequently unaware of metadata standards
  – People are (usually) unwilling to spend time creating metadata
    • May be no direct benefit (to them)
    • Boring
  – People are incapable of applying metadata consistently
    • Consistency varies from person to person
    • Consistency varies in the same person over time
  – There's already a huge backlog of unlabelled data on the existing web!
  – Also, someone else's metadata may not be what you want
    • e.g. Site content rating from supplier may be unreliable
Automatic Generation of Metadata
• Paper describes examples of ML research that use:
  – Inductive Logic Programming (on popular science articles)
    • F-Score close to human expert. Precision between 0.7 and 1.0
  – Hidden Markov Models (on marked-up MUC and MEDLINE texts)
    • Reported as adequate but not able to scale due to fragmentation of probability distribution. Portable across domains. SVMs suggested.
  – Association Analysis (using Web Directory for labelling examples)
    • Work in progress but looks for terms in text that indicate directory path, e.g. of a path .../Manufacturing/Materials/Metals/Steel/..
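The precision and F-Score figures quoted above compare automatically generated metadata against labels assigned by a human. A minimal Python sketch of the standard definitions follows; the function name and the example label sets are made up for illustration and are not data from the paper.

```python
def precision_recall_f(predicted: set, actual: set, beta: float = 1.0):
    """Return (precision, recall, F-score) for a set of predicted labels.

    precision = correct predictions / all predictions
    recall    = correct predictions / all true labels
    F         = (1 + beta^2) * P * R / (beta^2 * P + R)
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score


# Hypothetical keywords: machine-generated versus human-assigned.
generated = {"steel", "alloy", "furnace", "bridge"}
human = {"steel", "alloy", "furnace", "smelting", "carbon"}
p, r, f = precision_recall_f(generated, human)
```

With three of four generated labels correct and three of five human labels recovered, precision is 0.75 and recall is 0.6.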
Application of ML to Ontologies
• Ontology Vocabulary Layer is currently a popular area of Semantic Web research
• Most ontologies hand-crafted
• Creating ontologies is far more complex than RDF metadata extraction
  – ILP has been used to revise and maintain, but not create
  – Association rule learning has been used to partly automate
  – Regular expression (FSA) rewriting guided by Minimum Description Length to create Document Type Definitions (DTD) for XML docs.
• Ontology mapping
  – Hard problem
  – Some work using Naive Bayes
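The slide gives no detail on how Naive Bayes was applied to ontology mapping, so the following Python sketch shows only one plausible reading, not the method of the work referred to: train a multinomial Naive Bayes classifier on text attached to the concepts of one ontology, then map a concept from a second ontology to whichever first-ontology concept most of its instances are classified as. All concept names, training texts and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace-separated words."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of examples
        self.vocab = set()

    def train(self, label: str, text: str) -> None:
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text: str) -> str:
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # log prior + sum of log likelihoods with add-one smoothing
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def map_concept(classifier: NaiveBayes, instance_texts: list) -> str:
    """Map a foreign concept to the label most of its instances receive."""
    votes = Counter(classifier.classify(text) for text in instance_texts)
    return votes.most_common(1)[0][0]


# Hypothetical instances of two concepts in ontology A.
nb = NaiveBayes()
nb.train("a:Metal", "steel alloy iron")
nb.train("a:Metal", "copper alloy")
nb.train("a:Polymer", "plastic resin")
nb.train("a:Polymer", "nylon polymer resin")

# Hypothetical instances of one concept in ontology B.
mapped = map_concept(nb, ["stainless steel alloy", "iron ore"])
```

Both instances of the ontology B concept share vocabulary with the `a:Metal` training texts, so the concept is mapped to `a:Metal`.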
Using the Semantic Web
• Not much ML research in this area (yet)
• Datasets exist
  – RSS newsfeeds and Weblogs/Blogs
  – DAML repository
  – Dave Beckett's RDF Resource Guide
• Locating suitable data can be a problem
• Semantic Web Mining has been conjectured
  – combines Semantic Web with Web Mining
  – Relational Data Mining (RDM) suggested to exploit structure in data
Summary
• Semantic Web is rapidly evolving
• Key languages:
  – RDF
  – vocabularies built on top of RDF
• Publicly available RDF datasets exist
  – in applications like RSS
  – and repositories like DAML
• RDF maps well to Prolog (and SQL)
• Machine Learning looks promising for both the creation and use of the Semantic Web