A review of the state of the art in Machine Learning on the Semantic Web

Date post: 21-Mar-2017
Upload: simon-price
Simon Price
University of Bristol
http://www.cs.bris.ac.uk/~price

Outline

• Introduction to the Semantic Web
  – Semantic Web layers
  – URI, RDF(S), OWL
  – Web Services and the Semantic Web

• Applications of Machine Learning
  – Creating the Semantic Web
  – Using the Semantic Web

• Summary and pointers to further info.

Introduction to the Semantic Web

Definition

"The Semantic Web is the representation of data on the World Wide Web. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming."

Uniform Resource Identifier (URI)

• URI addressing schemes: http://..., ftp://..., mailto:..., etc.

• Each URI points to a resource (or a specific point within a resource)

• Typically, the resource is somewhere on the Web, but it may be an entity that cannot be retrieved over the network, e.g.
  – human beings, corporations, bound books in a library
  – concepts, topics, relations, ...

URIs - Good news. Bad news.

• Good news: decentralisation
  – anyone can create a URI
  – allows rapid growth of Web

• Bad news: decentralisation
  – no centralised register or clearing house
  – multiple URIs can refer to same entity
  – testing for equality (or equivalence) poses interesting problems
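The equality problem can be illustrated with a minimal Python sketch. It assumes a hypothetical helper, `normalize_uri`, that applies only a few syntactic normalisations (scheme and host case, default ports, empty path); the third URI is invented for illustration. Normalisation of this kind catches spelling variants of one URI, but it cannot decide whether two unrelated URIs denote the same entity.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports per scheme; an illustrative subset only.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def normalize_uri(uri: str) -> str:
    """Hypothetical helper: apply a few syntactic normalisations to a URI."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Drop the port when it is absent or is the scheme's default.
    if port in (None, DEFAULT_PORTS.get(scheme)):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    # An empty path on a network URI is equivalent to "/".
    path = parts.path or ("/" if host else "")
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))

a = "HTTP://www.Example.org:80/index.html"
b = "http://www.example.org/index.html"
c = "http://example.org/people/jsmith#me"  # hypothetical URI, for illustration

same_string = a == b                                        # False
same_after_normalising = normalize_uri(a) == normalize_uri(b)  # True
# Even if c named the same page or person, string comparison could not tell.
unrelated = normalize_uri(c) == normalize_uri(b)            # False
```

Deciding that `b` and `c` refer to the same thing needs extra knowledge, such as an explicit equivalence statement or a learned mapping.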

Resource Description Framework (RDF)

• A language of URI triples.
• An RDF statement has the form:

{ subject, predicate, object }

• e.g. "http://www.example.org/index.html has a creator whose value is the literal John Smith" could be represented as a plain text triple:

subject    http://www.example.org/index.html
predicate  http://purl.org/dc/elements/1.1/creator
object     John Smith
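As a rough illustration, the same statement can be held as a three-field record and written out one triple per line. The sketch below is in Python and uses the N-Triples line format as the output; that choice, the `Triple` type, and the `to_ntriples` function are assumptions made for this example, and only plain (untyped) literals are handled.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str                  # a URI
    predicate: str                # a URI
    obj: str                      # a URI or a literal value
    obj_is_literal: bool = False  # True when obj is a plain literal

def to_ntriples(t: Triple) -> str:
    """Serialise one triple as an N-Triples line (plain literals only)."""
    if t.obj_is_literal:
        # Escape backslashes and double quotes inside the literal.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."

statement = Triple(
    "http://www.example.org/index.html",
    "http://purl.org/dc/elements/1.1/creator",
    "John Smith",
    obj_is_literal=True,
)
print(to_ntriples(statement))
```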

Representing RDF

• Default syntax is XML (not human friendly)

• SQL triple stores commonly used

• RDF toolkits: Jena (HP) and Redland (Dave Beckett)

• Prolog: SWI-Prolog (40M triples per 100MB RAM), e.g.

rdf('http://www.example.org/index.html',
    'http://purl.org/dc/elements/1.1/creator',
    'John Smith').
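Querying triples by pattern, as a Prolog goal with unbound variables would, can be sketched in a few lines of Python. This is an illustrative in-memory store, not the interface of Jena, Redland, or SWI-Prolog; the `TripleStore` class and the second and third facts are invented for the example, and `None` plays the role of an unbound variable.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

class TripleStore:
    """Minimal in-memory triple store with wildcard matching."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield every triple agreeing with the bound (non-None) positions."""
        pattern = (s, p, o)
        for triple in self._triples:
            if all(want is None or want == got
                   for want, got in zip(pattern, triple)):
                yield triple

DC = "http://purl.org/dc/elements/1.1/"
store = TripleStore()
store.add("http://www.example.org/index.html", DC + "creator", "John Smith")
# The next two facts are hypothetical, added so the query has something to find.
store.add("http://www.example.org/index.html", DC + "title", "Example Home Page")
store.add("http://www.example.org/about.html", DC + "creator", "John Smith")

# Roughly: rdf(Page, dc:creator, 'John Smith') with Page unbound.
pages_by_smith = sorted(s for s, _, _ in
                        store.match(p=DC + "creator", o="John Smith"))
```

A linear scan is enough for a sketch; real stores index triples by subject, predicate, and object.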

Semantic Web Layers

RDF Schema

• A language for describing properties and classes of RDF resources

• Includes semantics for generalisation-hierarchies of such properties and classes

• Simple data typing model:
  – is-a relationships and properties
  – some range and domain restrictions

Notes:
1. RDF Schema recently renamed as "RDF Vocabulary Description Language"
2. In the literature, RDF + RDF Schema is often referred to as RDF(S)
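The generalisation hierarchy and the domain and range restrictions each license simple inferences. The Python sketch below applies a subset of those rules to a fixpoint; it is a naive illustration rather than a complete RDFS reasoner. The `rdf:` and `rdfs:` terms are written as prefixed names for brevity, and every `ex:` name is hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"

def rdfs_closure(triples):
    """Apply four RDFS-style rules repeatedly until nothing new is derived."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p == SUBCLASS and p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))   # subclass is transitive
                elif p == SUBCLASS and p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))   # instances inherit superclasses
                elif p == DOMAIN and p2 == s:
                    new.add((s2, RDF_TYPE, o))   # subject typed by the domain
                elif p == RANGE and p2 == s:
                    new.add((o2, RDF_TYPE, o))   # object typed by the range
        if new <= inferred:
            return inferred
        inferred |= new

facts = {
    ("ex:Steel", SUBCLASS, "ex:Metal"),
    ("ex:Metal", SUBCLASS, "ex:Material"),
    ("ex:producedBy", DOMAIN, "ex:Material"),
    ("ex:producedBy", RANGE, "ex:Manufacturer"),
    ("ex:alloy42", RDF_TYPE, "ex:Steel"),
    ("ex:girder7", "ex:producedBy", "ex:acme"),
}
closure = rdfs_closure(facts)
```

After the closure, `ex:alloy42` is also typed as `ex:Metal` and `ex:Material`, and `ex:acme` is typed as `ex:Manufacturer` even though no fact says so directly.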

Ontology Vocabulary Layer

• Huge number of different ontologies:
  – simple: thesauri, taxonomies
  – complex: DAML+OIL, OWL

• OWL supersedes the older DAML+OIL

• OWL goes further than RDF Schema, adding:
  – relations between classes
  – cardinality
  – equality
  – richer typing
  – characteristics of properties
  – enumerated classes
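Of the additions listed, equality bears directly on the earlier problem of several URIs naming one entity. As a sketch only, the Python below groups URIs linked by `owl:sameAs` statements into equivalence classes with a union-find structure; the function name and all `ex…:` URIs are hypothetical, and a real OWL reasoner does far more than this.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"

def same_as_classes(triples):
    """Group URIs connected by owl:sameAs into equivalence classes."""
    parent = {}

    def find(x):
        # Follow parent links to the representative, halving paths on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)  # merge the two classes

    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return [members for members in groups.values() if len(members) > 1]

statements = [
    ("ex1:jsmith", SAME_AS, "ex2:john_smith"),
    ("ex2:john_smith", SAME_AS, "ex3:JS"),
    ("ex1:alice", SAME_AS, "ex4:a"),
    ("ex1:jsmith", "ex1:worksFor", "ex1:acme"),  # ignored: not a sameAs link
]
classes = sorted(sorted(members) for members in same_as_classes(statements))
```

The three URIs for John Smith end up in one class because equality is symmetric and transitive, even though `ex1:jsmith` and `ex3:JS` are never linked directly.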

Web Ontology Language (OWL)

• OWL Lite - hierarchical classification (ideal for thesauri and other taxonomies).

• OWL DL - description logics (computationally complete but inference services are restricted to classification and subsumption).

• OWL Full - full syntactic freedom of RDF (no computational guarantees).

Web Services and the Semantic Web

• Web Services
  – XML-based interfaces to programs accessible via the Web
  – Operating system neutral Remote Procedure Call (RPC) protocol

• Today's Web Services
  – Business-orientated, simple, short transactional operations
  – Domain-specific XML vocabularies (not RDF)

• Tomorrow's Web Services
  – Combination of simple services to achieve complex operations
  – Automated discovery, selection and pipelining of Web Services

• Semantic Web + Machine Learning may have an important role to play in the future of Web Services

Applications of Machine Learning

Activity

• Attempts to apply Machine Learning are being made within each of the Semantic Web layers.

• Research activity within each layer can be divided into two parts, the application of Machine Learning in:
  – creating the Semantic Web
  – using the Semantic Web

• Most activity to date is in creating the Semantic Web

Creating the Semantic Web

• Why can't people do this themselves?
  – People are frequently unaware of metadata standards
  – People are (usually) unwilling to spend time creating metadata
    • May be no direct benefit (to them)
    • Boring
  – People are incapable of applying metadata consistently
    • Consistency varies from person to person
    • Consistency varies in the same person over time
  – There's already a huge backlog of unlabelled data on the existing web!
  – Also, someone else's metadata may not be what you want
    • e.g. Site content rating from supplier may be unreliable

Automatic Generation of Metadata

• Paper describes examples of ML research that use:
  – Inductive Logic Programming (on popular science articles)
    • F-Score close to human expert. Precision between 0.7 and 1.0
  – Hidden Markov Models (on marked-up MUC and MEDLINE texts)
    • Reported as adequate but not able to scale due to fragmentation of probability distribution. Portable across domains. SVMs suggested.
  – Association Analysis (using Web Directory for labelling examples)
    • Work in progress but looks for terms in text that indicate directory path, e.g. .../Manufacturing/Materials/Metals/Steel/..
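For reference, the precision and F-Score figures quoted for metadata generation are computed by comparing automatically assigned labels with a human-assigned gold standard. A minimal Python sketch follows; the label sets are invented for illustration and are not taken from the studies above.

```python
def precision_recall_f1(predicted: set, gold: set):
    """Return (precision, recall, F1) for predicted labels against gold labels."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented example: keywords for one popular science article.
gold = {"physics", "astronomy", "cosmology", "telescopes"}
predicted = {"physics", "astronomy", "biology"}

precision, recall, f1 = precision_recall_f1(predicted, gold)
# precision = 2/3, recall = 2/4, F1 = 4/7
```

Precision alone can be high for a system that assigns very few labels, which is why it is usually reported together with recall or the F-Score.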

Application of ML to Ontologies

• Ontology Vocabulary Layer is currently a popular area of Semantic Web research

• Most ontologies hand-crafted
• Creating ontologies is far more complex than RDF metadata extraction
  – ILP has been used to revise and maintain, but not create
  – Association rule learning has been used to partly automate
  – Regular expression (FSA) rewriting guided by Minimum Description Length to create Document Type Definitions (DTD) for XML docs.

• Ontology mapping
  – Hard problem
  – Some work using Naive Bayes
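One way Naive Bayes can be applied to mapping is to train a text classifier on instances of each concept in one ontology, then classify instances from the other ontology and see which concept they land in. The Python below is a generic multinomial Naive Bayes sketch with add-one smoothing under that assumption; it is not the method of any specific system mentioned here, and the concept names and instance texts are invented.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over whitespace-separated words."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # concept -> word frequencies
        self.doc_counts = Counter()              # concept -> number of instances
        self.vocab = set()

    def fit(self, labelled_texts) -> None:
        for text, concept in labelled_texts:
            words = text.lower().split()
            self.word_counts[concept].update(words)
            self.doc_counts[concept] += 1
            self.vocab.update(words)

    def predict(self, text: str) -> str:
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_concept, best_score = None, -math.inf
        for concept, n_docs in self.doc_counts.items():
            counts = self.word_counts[concept]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for w in words:
                # Add-one smoothing so unseen words do not zero the product.
                score += math.log((counts[w] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_concept, best_score = concept, score
        return best_concept

# Instances of two concepts in a hypothetical ontology A.
training = [
    ("teaches graduate course supervises students", "A:Professor"),
    ("lectures on machine learning supervises research", "A:Professor"),
    ("introductory course on databases three credits", "A:Course"),
    ("graduate seminar on logic two credits", "A:Course"),
]
model = NaiveBayes()
model.fit(training)

# An instance of the concept B:AcademicStaff from a hypothetical ontology B.
mapped_to = model.predict("supervises phd students and teaches")
```

If most instances of `B:AcademicStaff` are classified as `A:Professor`, that is evidence for mapping the two concepts to each other.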

Using the Semantic Web

• Not much ML research in this area (yet)
• Datasets exist
  – RSS newsfeeds and Weblogs/Blogs
  – DAML repository
  – Dave Beckett's RDF Resource Guide

• Locating suitable data can be a problem

• Semantic Web Mining has been conjectured
  – combines Semantic Web with Web Mining
  – Relational Data Mining (RDM) suggested to exploit structure in data

Summary

• Semantic Web is rapidly evolving
• Key languages:
  – RDF
  – vocabularies built on top of RDF

• Publicly available RDF datasets exist
  – in applications like RSS
  – and repositories like DAML

• RDF maps well to Prolog (and SQL)
• Machine Learning looks promising for both the creation and use of the Semantic Web
