
ALUMNI 2015: The Semantic Web, A Vision Come True or Giving Up the Great Plan, Martin Hepp

Date post: 18-Aug-2015
Upload: sti-innsbruck
The Semantic Web – A Vision Come True, or Giving Up the Great Plan? Martin Hepp, @mfhepp [email protected]

Semantic Web: A Decade of Achievement?

•  Linked Open Data Cloud
•  Schema.org
•  Google Knowledge Graph
•  Bing Satori
•  Linked Data in Libraries
•  Linked Data in Public Data Initiatives
•  Etc.

Semantic Web and Linked Data Success Stories

http://www.heppresearch.com 2

The LOD Cloud

A hard-wired, small-scale data integration project with no quality of service guarantees.


Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/

Web Data Commons

A pretty outdated RDF representation of information extracted from a biased sample of popular Web pages, missing a lot of data in deep detail pages.


2015-04-02: RDFa, Microdata, and Microformat data sets extracted from the December 2014 Common Crawl corpus available for download.

The Old Testament of the Semantic Web


Mostly WHAT a better Web should allow
§  Computers should be able to help us process information from the Web

The New Testament of the Semantic Web


Detailed technical assumptions about the HOW
§  Widely driven by applying principles from small-scale, controlled settings to the Web.

§  Need for extensions of old paradigms acknowledged.

§  But fundamental question of match between paradigms and ecosystem largely unchallenged.

The Modern Sects and their Cults


Turned assumptions and drafts into laws
§  Linked Data Principles

–  URIs over strings

§  Entity identifiers

§  Qualitative values (enumerations)

–  Page vs. Entity / Conneg / Redirects

–  Open Licenses

§  SPARQL endpoints

§  Reuse visible content in RDFa and Microdata

Berners-Lee, Tim: Linked Data, http://www.w3.org/DesignIssues/LinkedData.html

And now they fight a useless war over the details of their interpretation…
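The "Page vs. Entity / Conneg / Redirects" principle listed above is commonly implemented with an HTTP 303 See Other redirect from the URI of a thing to a document about it, with content negotiation picking HTML or RDF. The following is a minimal sketch under stated assumptions: the URI layout (`/id/...` for entities, `/page/...` and `/data/...` for documents) is hypothetical, and the Accept-header check ignores q-values and wildcards that a real server would honour.

```python
from typing import Optional, Tuple

# Hypothetical URI layout: /id/<name> names a thing, /page/<name> and
# /data/<name> are documents (HTML and RDF) about that thing.
RDF_MEDIA_TYPES = ("text/turtle", "application/rdf+xml", "application/ld+json")


def wants_rdf(accept: str) -> bool:
    """Crude content negotiation: does the Accept header mention an RDF type?

    Simplification: q-values and wildcards are ignored.
    """
    return any(media_type in accept for media_type in RDF_MEDIA_TYPES)


def resolve(path: str, accept: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header or None) for a GET request."""
    if path.startswith("/id/"):
        # The entity itself is not a document: answer 303 See Other,
        # pointing to a representation chosen by content negotiation.
        name = path[len("/id/"):]
        return 303, ("/data/" if wants_rdf(accept) else "/page/") + name
    if path.startswith(("/page/", "/data/")):
        return 200, None  # a document: serve it directly
    return 404, None


print(resolve("/id/zaphod", "text/html"))    # (303, '/page/zaphod')
print(resolve("/id/zaphod", "text/turtle"))  # (303, '/data/zaphod')
```

The extra round trip and the split between thing URIs and document URIs are exactly the kind of detail that most Web publishers skip.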


3rd Commandment: Thou shalt not make unto thee any graven image
§  Exodus 20:4-6
§  Minimal ontological commitment, folks!
§  Occam's razor

§  Ludwig Wittgenstein: Tractatus Logico-Philosophicus:
–  “Occam's Razor is, of course, not an arbitrary rule nor one justified by its practical success. It simply says that unnecessary elements in a symbolism mean nothing. Signs which serve one purpose are logically equivalent; signs which serve no purpose are logically meaningless.” (*)

Image Credit: PD, https://en.wikipedia.org/?title=Crusades#/media/File:Albigensian_Crusade_01.jpg

(*) Taken from https://en.wikipedia.org/wiki/Occam's_razor#Ludwig_Wittgenstein

What is schema.org? What is GoodRelations?

1. Official Characterization
2. Purpose:
§  Focus on information extraction on the Web
§  Other uses as a by-product
3. Knowledge Representation Perspective
§  Entity Types
§  Relationship Types
§  Weak Domain / Range Semantics
§  Syntax-independent Meta-Model

And how are they related?
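"Weak Domain / Range Semantics" can be made concrete by contrasting RDFS, where an `rdfs:range` axiom licenses the inference that every value of the property belongs to the range class (entailment rule rdfs3), with schema.org's `schema:rangeIncludes`, which only lists expected types and entails nothing. A minimal sketch in plain Python, assuming toy triples as tuples of prefixed-name strings rather than a real RDF library; the strict range axiom and the listed types are illustrative.

```python
# Toy triples: (subject, predicate, object) with prefixed names as strings.
data = {
    ("ex:offer1", "schema:seller", "ex:acme"),
}

# Hypothetical strict RDFS axiom: every seller IS an Organization.
rdfs_ranges = {"schema:seller": "schema:Organization"}

# Schema.org-style hints: several expected types, nothing is inferred.
range_includes = {"schema:seller": {"schema:Organization", "schema:Person"}}


def rdfs_infer_types(triples, ranges):
    """Apply rdfs3: (p rdfs:range C) and (s p o) entail (o rdf:type C)."""
    inferred = set()
    for s, p, o in triples:
        if p in ranges:
            inferred.add((o, "rdf:type", ranges[p]))
    return inferred


def expected_types(prop, hints):
    """Weak semantics: report which types are expected; infer nothing."""
    return hints.get(prop, set())


print(rdfs_infer_types(data, rdfs_ranges))
# {('ex:acme', 'rdf:type', 'schema:Organization')}
print(sorted(expected_types("schema:seller", range_includes)))
# ['schema:Organization', 'schema:Person']
```

Under the strict reading, a person published as a seller would silently be classified as an organization; the weak reading leaves such judgments to the consumer.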

Questions? Suggestions? Contact me at @mfhepp! 9

Official Characterization from http://schema.org


This site provides a collection of schemas that webmasters can use to markup HTML pages in ways recognized by major search providers, and that can also be used for structured data interoperability (e.g. in JSON). Search engines including Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right Web pages.

Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure.
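The quoted characterization notes that pages are often generated from database records and that the schemas also serve structured data interoperability, e.g. in JSON. A minimal sketch of that path, assuming a hypothetical product row with made-up field names, emits schema.org JSON-LD that could be embedded in a `<script type="application/ld+json">` element next to the HTML rendering, so the structure never has to be recovered from the HTML.

```python
import json


def product_to_jsonld(row: dict) -> str:
    """Map a (hypothetical) database row to schema.org JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": row["title"],
        "sku": row["sku"],
        "offers": {
            "@type": "Offer",
            "price": str(row["price"]),
            "priceCurrency": row["currency"],
        },
    }
    return json.dumps(doc, indent=2)


# Hypothetical row; field names depend entirely on the site's database.
row = {"title": "Espresso Machine X1", "sku": "X1-2015", "price": 249.0, "currency": "EUR"}
print(product_to_jsonld(row))
```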

Overview and Motivation: There is REAL Momentum


A lot of data

§ Since 2011, schema.org markup has been added to more than 25% of the product detail pages of top-ranked e-commerce sites.

§ RDF-based representations are specified.

Table: Random sample of n=73 product detail pages from high-ranking Google results.

Note that these numbers have a strong bias towards popular, professionally operated sites.

Schema.org: A Data Publication Ontology


Not designed for raw data consumption (only as a by-product)
§  Historically, ontologies in computer science aimed at harmonizing the conceptualization and representation of data for publishers and consumers of the data.

§  Implicit goal of the traditional Semantic Web stack: More or less, consumption of raw data.

§  This requires detailed consensus on the level of data granularity and data semantics at scale, and high data quality.

§  Schema.org does not make this assumption, since its sponsors have the power to work on semi-structured data at Web scale.

[Diagram: Ontology vs. schema.org]

Schema.org: The Semantic Web Vision Come True?

1. No OWL. Not even an ontology in the narrow sense.
2. Direct consumption difficult
§  Crawling
§  Cleansing
§  Lifting
3. No broad use of Linked Data principles
§  Mostly no global entity identifiers
§  Page = Entity (vs. httpRange-14)
§  No vocabulary reuse (*)

Likely not what the Semantic Web community had hoped for.
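Why "Cleansing" and "Lifting" stand between crawled markup and usable data can be illustrated with a small sketch. It assumes a hypothetical whitelist of schema.org property names, repairs wrong casing and stray whitespace in extracted property/value pairs, and lifts a price string to a decimal number under the stated (and often wrong) assumption that "." is the decimal separator and "," the thousands separator.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, Optional

# Hypothetical excerpt of schema.org property names used for cleansing.
KNOWN_PROPERTIES = ["givenName", "familyName", "honorificPrefix", "price", "priceCurrency"]
CANONICAL = {name.lower(): name for name in KNOWN_PROPERTIES}


def cleanse(raw: Dict[str, str]) -> Dict[str, str]:
    """Fix property-name casing and whitespace; drop unknown properties."""
    clean = {}
    for key, value in raw.items():
        canonical = CANONICAL.get(key.strip().lower())
        if canonical is not None:
            clean[canonical] = " ".join(value.split())
    return clean


def lift_price(text: str) -> Optional[Decimal]:
    """Lift a price string such as '1,299.00 €' to a Decimal.

    Assumes '.' as decimal separator and ',' as thousands separator;
    real pages use many other conventions.
    """
    digits = "".join(ch for ch in text if ch.isdigit() or ch == ".")
    try:
        return Decimal(digits)
    except InvalidOperation:
        return None  # nothing numeric could be recovered


print(cleanse({" FamilyName ": "Beeblebrox", "givenname": "  Zaphod ", "color": "blue"}))
# {'familyName': 'Beeblebrox', 'givenName': 'Zaphod'}
print(lift_price("1,299.00 €"))  # 1299.00
```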


Web Ontology Engineering Patterns

1. Dynamic Degree of Disambiguation
2. Dynamic Data Granularity
3. Sweet Spots Rule
§  Distinctions that can be populated reliably and with little effort
§  Distinctions that are hard to reconstruct by the recipient

Hepp (2015, forthcoming)
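One possible reading of "Dynamic Data Granularity" (not necessarily the definition in Hepp 2015) is that the same property may carry a coarse or a fine-grained value. Schema.org, for instance, accepts `address` either as plain Text or as a structured `PostalAddress`. A minimal consumer-side sketch, assuming a hypothetical helper that normalizes both forms to one display string; the selection of address fields is illustrative.

```python
from typing import Dict, Union

Address = Union[str, Dict[str, str]]


def address_lines(value: Address) -> str:
    """Accept an address at either granularity and return one display string.

    Coarse: a plain Text value. Fine: a PostalAddress-like dict.
    """
    if isinstance(value, str):
        return value.strip()
    keys = ("streetAddress", "postalCode", "addressLocality", "addressCountry")
    return ", ".join(value[k] for k in keys if value.get(k))


coarse = "Kuppelnaustrasse 5, 88212 Ravensburg, Germany"
fine = {
    "@type": "PostalAddress",
    "streetAddress": "Kuppelnaustrasse 5",
    "postalCode": "88212",
    "addressLocality": "Ravensburg",
    "addressCountry": "DE",
}
print(address_lines(coarse))  # Kuppelnaustrasse 5, 88212 Ravensburg, Germany
print(address_lines(fine))    # Kuppelnaustrasse 5, 88212, Ravensburg, DE
```

Publishers can then supply whichever granularity they can populate reliably, and the consumer absorbs the difference.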


The Fallacy of Raw Consumption of Web Data


Naïve Type Membership Interpretation: SPARQL

# Find former STI members who are professors
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT * {?s a dbpedia-owl:Professor} LIMIT 100
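The query above treats every `a dbpedia-owl:Professor` statement as true, whoever made it. The same naive pattern match can be sketched in Python over toy quads (subject, predicate, object, origin graph). The data is hypothetical, and the second, origin-filtered function only illustrates that the naive reading ignores where a statement comes from; a hand-maintained whitelist is of course no solution at Web scale.

```python
# Toy quads: (subject, predicate, object, origin graph); all values hypothetical.
QUADS = [
    ("dbr:Example_Professor", "rdf:type", "dbpedia-owl:Professor", "http://dbpedia.org"),
    ("#person", "rdf:type", "dbpedia-owl:Professor", "http://www.acme.org/"),
]


def naive_instances(quads, cls):
    """Equivalent of SELECT ?s { ?s a cls }: trust every type assertion."""
    return [s for s, p, o, _origin in quads if p == "rdf:type" and o == cls]


def instances_from(quads, cls, trusted_origins):
    """Same pattern, but only accept assertions from trusted origins."""
    return [s for s, p, o, g in quads
            if p == "rdf:type" and o == cls and g in trusted_origins]


print(naive_instances(QUADS, "dbpedia-owl:Professor"))
# ['dbr:Example_Professor', '#person']
print(instances_from(QUADS, "dbpedia-owl:Professor", {"http://dbpedia.org"}))
# ['dbr:Example_Professor']
```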

Naïve Type Membership Interpretation: SPARQL


Find all professors from Web markup
<html prefix="schema: http://schema.org/ dbpedia: http://dbpedia.org/ontology/">
<!-- .. -->
<div typeof="schema:Person dbpedia:Professor" about="#person">
  <span property="schema:honorificPrefix">Prof. Dr.</span>&nbsp;
  <span property="schema:givenName">Zaphod</span>
  <span property="schema:familyName">Beeblebrox</span>
</div>
</html>
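To see how such markup feeds the naive query, here is a toy extractor built on Python's standard `html.parser`. It is a hypothetical sketch that handles only `prefix`, `about`, `typeof`, and literal `property` values; it is not a conforming RDFa 1.1 processor. On this snippet it yields a `dbpedia:Professor` type assertion for `#person`, which a naive consumer would accept at face value.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class ToyRDFaParser(HTMLParser):
    """Tiny subset of RDFa 1.1 (prefix, about, typeof, property with text).

    Not conforming: vocab, resource, href, datatype, and void elements
    are not handled. HTMLParser lowercases attribute names, so typeOf
    and typeof are treated alike.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &nbsp; into text
        self.prefixes = {}
        self.stack = []   # one (subject, predicate or None) per open element
        self.buffer = []  # text collected for the innermost property
        self.triples = []

    def expand(self, curie):
        prefix, sep, local = curie.partition(":")
        if sep and prefix in self.prefixes:
            return self.prefixes[prefix] + local
        return curie  # leave unknown prefixes unexpanded

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("prefix"):
            tokens = a["prefix"].split()
            for name, iri in zip(tokens[::2], tokens[1::2]):
                self.prefixes[name.rstrip(":")] = iri
        # The subject is set by `about` or inherited from the parent element.
        subject = a.get("about") or (self.stack[-1][0] if self.stack else None)
        for curie in (a.get("typeof") or "").split():
            self.triples.append((subject, RDF_TYPE, self.expand(curie)))
        predicate = self.expand(a["property"]) if a.get("property") else None
        if predicate:
            self.buffer = []
        self.stack.append((subject, predicate))

    def handle_data(self, data):
        self.buffer.append(data)

    def handle_endtag(self, tag):
        if not self.stack:
            return
        subject, predicate = self.stack.pop()
        if predicate:
            self.triples.append((subject, predicate, "".join(self.buffer).strip()))


page = """<html prefix="schema: http://schema.org/ dbpedia: http://dbpedia.org/ontology/">
<div typeof="schema:Person dbpedia:Professor" about="#person">
  <span property="schema:honorificPrefix">Prof. Dr.</span>&nbsp;
  <span property="schema:givenName">Zaphod</span>
  <span property="schema:familyName">Beeblebrox</span>
</div>
</html>"""

parser = ToyRDFaParser()
parser.feed(page)
parser.close()
for triple in parser.triples:
    print(triple)
```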

Type Membership as a Machine Learning Problem


Supervised Learning: Logistic Regression
§  Input:
–  Entity e
–  Type t
–  Origin (Graph / Domain / URI) o
–  Optional: Properties and property values [(p1,v1), (p2,v2),…]
§  Output
–  t’(e) = f(e, t, o)
–  p(t(e) == True)

Example data:
(http://www.acme.org/, …#person, http://schema.org/EducationEvent)
(http://munich.eventful.com/, …#event1, http://schema.org/MusicEvent)

Hepp (2015b, forthcoming)
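The slide frames type membership as supervised learning with output p(t(e) == True). Below is a minimal sketch of that framing as logistic regression trained with stochastic gradient descent. It is not the model from Hepp (2015b): the binary features (origin domain, asserted type, and their combination) and the training labels are hypothetical, and the entity slot is passed through but unused here.

```python
import math
from typing import Dict, List, Tuple
from urllib.parse import urlparse

Example = Tuple[str, str, str]  # (origin URI o, entity e, type t)


def features(origin: str, entity: str, type_: str) -> List[str]:
    """Hypothetical sparse binary features for f(e, t, o); entity unused here."""
    domain = urlparse(origin).netloc
    return ["bias", "type=" + type_, "domain=" + domain,
            "domain_type=" + domain + "|" + type_]


def predict(w: Dict[str, float], feats: List[str]) -> float:
    """p(t(e) == True) via the logistic function of the summed weights."""
    z = sum(w.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))


def train(data: List[Tuple[Example, int]], epochs: int = 200, lr: float = 0.5) -> Dict[str, float]:
    """Plain stochastic gradient descent on the log-loss."""
    w: Dict[str, float] = {}
    for _ in range(epochs):
        for (origin, entity, type_), label in data:
            feats = features(origin, entity, type_)
            error = label - predict(w, feats)  # gradient of log-likelihood w.r.t. z
            for f in feats:
                w[f] = w.get(f, 0.0) + lr * error
    return w


# Hypothetical labelled assertions (1 = type membership judged correct).
DATA = [
    (("http://www.acme.org/", "_:e1", "http://schema.org/EducationEvent"), 0),
    (("http://munich.eventful.com/", "_:e2", "http://schema.org/MusicEvent"), 1),
    (("http://munich.eventful.com/", "_:e3", "http://schema.org/MusicEvent"), 1),
    (("http://www.acme.org/", "_:e4", "http://schema.org/Person"), 1),
]

w = train(DATA)
print(predict(w, features("http://munich.eventful.com/", "_:e2", "http://schema.org/MusicEvent")))
print(predict(w, features("http://www.acme.org/", "_:e1", "http://schema.org/EducationEvent")))
```

A real model would need far more data and richer features, such as the optional property/value pairs listed on the slide.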

Let’s Do Science, not Cult!


§  Challenge paradigms and approaches

§  Use hard data, not beliefs and assumptions (neither your own nor those inherited from the old folks)

CC BY-SA 3.0 / Nicor / https://en.wikipedia.org/wiki/North_Korea's_cult_of_personality#/media/File:Mansudae_Grand_Monument_08.JPG

Thank you.


HEPP RESEARCH GmbH
Prof. Dr. Martin Hepp, CEO

Contact us!

Kuppelnaustrasse 5
88212 Ravensburg, Germany
Phone +49 751 2708 5256-0
Fax +49 751 2708 5256-9

www.heppresearch.com [email protected]

