Optique presentation

Posted on 18-Dec-2014

description

Optique aims to provide a semantic end-to-end connection between users and data sources, to enable users to rapidly formulate intuitive queries using familiar vocabularies and conceptualisations, and to return timely answers from large-scale and heterogeneous data sources.

transcript

Ian Horrocks
Information Systems Group
Department of Computer Science
University of Oxford

What is Big Data?

“a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications” (Wikipedia)

Case Study: Energy Services

Service centres responsible for remote monitoring and diagnostics of 1,000s of gas/steam turbines

Engineers use a variety of data for visualization, diagnostics and trend detection:

• several TB of time-stamped sensor data
• several GB of event data
• data grows at 30 GB per day

Service Requests
• 1,000 requests per centre per year
• 80% of time used on data gathering
• Potential saving: €50,000,000/year

Diagnostic Functionality
• 2–6 p/m to add new function
• New diagnostics → better exploitation of data
• Potential saving: incalculable

Case Study: Exploration

• Develop stratigraphic models of unexplored areas
• Geologists & geophysicists use data from previous operations in nearby locations
• 1,000 TB of relational data using diverse schemata, spread over 1,000s of tables and multiple databases

Data Access
• 900 geologists & geophysicists
• 30-70% of time on data gathering
• 4-day turnaround for new queries
• Potential saving: €70,000,000/year

Data Exploitation
• Better use of experts' time
• Data analysis “most important factor” for drilling success
• Potential value: > €10bn/project

Data Access Problem

Solution: OBDA

Objectives

• Provide semantic end-to-end connection between users and data sources
• Enable users to rapidly formulate intuitive queries using familiar vocabularies and conceptualisations
• Return timely answers from large-scale and heterogeneous data sources

Solution

Query rewriting:
• uses ontology & mappings
• computationally hard
• ontology & mappings small

Query evaluation:
• independent of ontology & mappings
• computationally tractable
• data sets very large

Other features:
• support for query formulation
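The split between query rewriting and query evaluation can be illustrated with a minimal sketch. It assumes a toy ontology consisting only of subclass axioms and a mapping from class names to SQL queries; the names `SUBCLASS_OF`, `MAPPINGS`, `gas_units` and `steam_units` are hypothetical and are not taken from the Optique system.

```python
from collections import deque

# Hypothetical toy ontology: each pair (sub, sup) states "sub is a subclass of sup".
SUBCLASS_OF = [
    ("GasTurbine", "Turbine"),
    ("SteamTurbine", "Turbine"),
]

# Hypothetical mappings: class name -> SQL query returning instance identifiers.
MAPPINGS = {
    "GasTurbine": "SELECT id FROM gas_units",
    "SteamTurbine": "SELECT id FROM steam_units",
}


def subclasses(cls):
    """Return cls and all of its direct and indirect subclasses, in BFS order."""
    seen = [cls]
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for sub, sup in SUBCLASS_OF:
            if sup == current and sub not in seen:
                seen.append(sub)
                queue.append(sub)
    return seen


def rewrite(cls):
    """Rewrite 'all instances of cls' into a union of SQL queries.

    Only the (small) ontology and mappings are consulted; no data is touched.
    """
    parts = [MAPPINGS[c] for c in subclasses(cls) if c in MAPPINGS]
    return " UNION ".join(parts)


sql = rewrite("Turbine")
print(sql)
```

In this sketch the rewriting step works only on the small ontology and mappings, while the resulting SQL would be handed to the database, which is where the very large data sets are processed.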

Query Formulation

Solution

Other features:
• “Bootstrapping” of ontology & mappings

Bootstrapping

[Diagram labels: Direct Mapping Extractor, Direct Mappings, OWL Vocabulary, Metadata Propagator, OWL Ontology, SOTA Ontology, Ontology Alignment, Extended OWL Ontology]
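As a rough illustration of what extracting a vocabulary from a relational schema could look like, the sketch below turns tables into classes, plain columns into data properties, and foreign keys into object properties. The schema, the naming scheme and the function `bootstrap_vocabulary` are assumptions made for this example; they do not describe the actual Optique bootstrapper.

```python
# Hypothetical relational schema: table -> columns and foreign keys.
SCHEMA = {
    "turbine": {"columns": ["id", "model"], "foreign_keys": {"site_id": "site"}},
    "site": {"columns": ["id", "name"], "foreign_keys": {}},
}


def bootstrap_vocabulary(schema):
    """Derive a simple vocabulary from a schema.

    Assumed naming scheme (illustrative only): tables become classes,
    plain columns become data properties, and foreign keys become object
    properties pointing at the referenced table's class.
    """
    vocab = {"classes": [], "data_properties": [], "object_properties": []}
    for table, spec in sorted(schema.items()):
        cls = table.capitalize()
        vocab["classes"].append(cls)
        for column in spec["columns"]:
            vocab["data_properties"].append((f"{table}_{column}", cls))
        for column, target in sorted(spec["foreign_keys"].items()):
            vocab["object_properties"].append(
                (f"{table}_{column}", cls, target.capitalize())
            )
    return vocab


vocab = bootstrap_vocabulary(SCHEMA)
for kind, entries in vocab.items():
    print(kind, entries)
```

A vocabulary derived this way mirrors the database schema closely, which is presumably why the slide also lists an alignment step with an existing ontology.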

Solution

Other features:
• IT expert oversees ontology & mapping (O&M) management

Solution

Other features:
• Adapter to support streaming data

Stream Adapter

Goal: Support for data
• generated by sensors
• historical data

Challenges: Time-aware OBDA
• Queries
• Ontologies
• Mappings
• Data

STARQL query language
• Temporalised SPARQL
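Time-aware querying over sensor data usually means evaluating a query over a moving time window. The sketch below shows that idea in plain Python as a sliding-window average; it is not STARQL and does not use STARQL syntax, and the function name and sample readings are hypothetical.

```python
from collections import deque


def sliding_window_average(readings, window_seconds):
    """Yield (timestamp, average) for each reading.

    readings: iterable of (timestamp_in_seconds, value), assumed sorted
    by timestamp. The average covers readings whose timestamp lies in
    (timestamp - window_seconds, timestamp].
    """
    window = deque()
    total = 0.0
    for timestamp, value in readings:
        window.append((timestamp, value))
        total += value
        # Evict readings that have fallen out of the time window.
        while window[0][0] <= timestamp - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield timestamp, total / len(window)


# Hypothetical sensor readings: (seconds, temperature).
readings = [(0, 10.0), (1, 20.0), (2, 30.0), (5, 40.0)]
averages = list(sliding_window_average(readings, window_seconds=3))
print(averages)
```

The same windowing logic applies whether the readings arrive live from sensors or are replayed from historical data, which matches the two kinds of data named in the goal.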

Solution

Other features:
• Distributed query execution

Thank you for listening

Any questions?