
Sem a Tic Microsoft

Date post: 10-Apr-2018
Upload: abdul-khalique
Semantic Application for Digital Repositories. Fabrizio Gagliardi, EMEA & LATAM Director, Technical Computing, MSR External Research, Microsoft Corporation
Transcript

8/8/2019 Sem a Tic Microsoft

http://slidepdf.com/reader/full/sem-a-tic-microsoft 1/31

Semantic Application for Digital Repositories

Fabrizio Gagliardi
EMEA & LATAM Director
Technical Computing
MSR External Research
Microsoft Corporation


Microsoft Research’s Commitment to Science

• Advancement of Science
• Global Collaboration
• Technology Excellence
• Interoperability

Putting computing into science…
Applying Microsoft products and research technologies to advance the scientific research and engineering innovation process

Putting science into computing…
Ensuring that research community requirements are factored into future versions of Microsoft software


• Semantic relationships between different data

• Semantic descriptions of services

• Annotations

• Provenance

• Repositories

• Ontologies

myGrid


Research Output Repository Platform

Goals
• A platform for building services and tools for research output repositories
• Papers, Videos, Presentations, Lectures, References, Data, Code, etc.
• Relationships between stored entities
• Enable a tools and services ecosystem for “research output” repositories on MS technologies

Execution
• Utilizing OAI-ORE, SWORD, and other community protocols
• In development, deployment within MSR in early Q4
• Beta release to the community in late Q4
• Built on SQL Server 2008 + Entity Framework
• Using WPF and Silverlight for UI

[Diagram labels: Research output repository platform; UIs; Desktop Tools; Syndication; Interop; Search]


Research Output Repository Platform

Goals
• Create a platform for building “research output” repositories
• Engage with the digital library and scholarly communications community
• Become the “research output” repository for MSR (RMCr project)
 – Papers, Videos, Presentations, Lectures, References, Data, Code, etc.
• Support an ecosystem of services and tools
• Available to the community for free (we are still considering the open source route)
• Build an easy-to-install collection of basic services and tools

Non-goals
• A generic platform for asset management
• Support the lifecycle of publications
• Compete with existing repository solutions

[Diagram labels, top to bottom]
Services/tools
Microsoft.Famulus.Framework
Microsoft.Famulus.Core (Based on the Entity Framework Model + extensions)
SQL Server 2008, MS data storage technologies, Entity Framework runtime


Research Output Repository Platform

• A Semantic Computing platform
• A hybrid between a relational database and a triple store

Triple stores
- Evolution friendly
- Poor performance
- No need to model everything in advance
- Semantic interpretation at the application level

Relational schema
- Evolution not so easy
- Great opportunities for optimization
- Model everything in advance

Research Output Repository Platform
- Maintain a balance
- Try to model the frequently used entities in our app domain
- Try to capture the frequently used relationships
- Allow for extensibility (Relationships, Attributes)
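
The balance between a fixed schema and a triple store can be illustrated with a small in-memory sketch. It is not the platform's actual data model: every class and member name below (Resource, Publication, Relationship, Store, ExtraAttributes, the "derivedFrom" predicate) is a hypothetical stand-in. The idea shown is that a frequently used entity gets a typed class a relational schema could optimize, while relationships and attributes nobody modeled in advance fall back to generic triples and key/value pairs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins; none of these names come from the platform itself.
public class Resource
{
    public Guid Id { get; } = Guid.NewGuid();
    public string Title { get; set; } = "";

    // Extensibility: attributes that were not modeled in advance.
    public Dictionary<string, string> ExtraAttributes { get; } =
        new Dictionary<string, string>();
}

// A frequently used entity gets a typed class with a typed relationship.
public class Publication : Resource
{
    public List<Publication> Cites { get; } = new List<Publication>();
}

// Anything else is kept as a generic (source, predicate, target) triple.
public class Relationship
{
    public Resource Source { get; }
    public string Predicate { get; }
    public Resource Target { get; }

    public Relationship(Resource source, string predicate, Resource target)
    {
        Source = source;
        Predicate = predicate;
        Target = target;
    }
}

public class Store
{
    private readonly List<Relationship> _triples = new List<Relationship>();

    public void Relate(Resource source, string predicate, Resource target) =>
        _triples.Add(new Relationship(source, predicate, target));

    // Returns every target linked from 'source' by 'predicate'.
    public IEnumerable<Resource> TargetsOf(Resource source, string predicate) =>
        _triples.Where(t => t.Source == source && t.Predicate == predicate)
                .Select(t => t.Target);
}

public static class HybridDemo
{
    public static void Main()
    {
        var paper = new Publication { Title = "Paper" };
        var cited = new Publication { Title = "Cited paper" };
        var dataset = new Resource { Title = "Raw measurements" };
        var store = new Store();

        paper.Cites.Add(cited);                       // modeled in advance
        store.Relate(paper, "derivedFrom", dataset);  // added later, no schema change
        paper.ExtraAttributes["license"] = "CC-BY";   // ad hoc attribute

        foreach (var r in store.TargetsOf(paper, "derivedFrom"))
            Console.WriteLine(r.Title);
    }
}
```

The typed Cites collection stands for the "model the frequently used relationships" side; the Store and ExtraAttributes stand for the "allow for extensibility" side.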


An intuitive programming experience

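// Create a person and two publications
Person tony = new Person();
Publication pub1 = new Publication();
pub1.Title = "Title1";
Publication pub2 = new Publication();
pub2.Title = "Title2";

// Relationships are expressed through collection properties
pub1.Cites.Add(pub2);
pub1.Authors.Add(tony);

// Tag the publication with a keyword
Tag tag = new Tag();
tag.Name = "keyword";
pub1.Tags.Add(tag);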
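
The slide snippet depends on the platform's own entity classes, which are not shown. To try the same calls in isolation, a minimal sketch with hypothetical in-memory stand-ins for Person, Publication and Tag could look like the following. Only the members the slide uses are defined, the Name property on Person is an extra assumption, and nothing is persisted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the platform's entity classes (in memory only).
public class Person { public string Name { get; set; } = ""; }
public class Tag { public string Name { get; set; } = ""; }

public class Publication
{
    public string Title { get; set; } = "";
    public List<Publication> Cites { get; } = new List<Publication>();
    public List<Person> Authors { get; } = new List<Person>();
    public List<Tag> Tags { get; } = new List<Tag>();
}

public static class SlideSnippet
{
    // Builds the same small graph as the slide and returns pub1.
    public static Publication Build()
    {
        Person tony = new Person();
        Publication pub1 = new Publication();
        pub1.Title = "Title1";
        Publication pub2 = new Publication();
        pub2.Title = "Title2";
        pub1.Cites.Add(pub2);
        pub1.Authors.Add(tony);
        Tag tag = new Tag();
        tag.Name = "keyword";
        pub1.Tags.Add(tag);
        return pub1;
    }

    public static void Main()
    {
        Publication pub1 = Build();
        // Relationships are ordinary collections, so LINQ works over them.
        Console.WriteLine(string.Join(", ", pub1.Cites.Select(p => p.Title)));
        Console.WriteLine(pub1.Tags.Count);
    }
}
```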


Research Output Repository Platform

[Diagram: example entity graph. Nodes: PowerPoint presentation; Lecture on 2/19/2008; tony; Elizabeth, Sebastien, Matthew, Norman, Brian, Sarah, George, Roy; PDF file. Relationship labels: authored by; presented by; organized by; is representation of; contains.]


An Ecosystem of Research Repositories

Researchers manage their personal research entities (data, citations, documents, workflows, etc.)

Entities + Relationships can be synched to cloud storage so that they are:
- Always Available
- Sharable
- Mixable
- Harvestable

Support of harvesting & federation to/from Institutional Repositories
- arXiv.org
- DSpace
- ePrints
- Fedora
- etc.
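
The slides do not say which protocol the platform uses for harvesting. One common choice for repositories of this kind is OAI-PMH, so the sketch below assumes it. It builds a ListRecords request URL and extracts identifier and Dublin Core title from a response. The HarvestSketch class, the base URL and the sample XML are made up for illustration, and no network call is made.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class HarvestSketch
{
    static readonly XNamespace Oai = "http://www.openarchives.org/OAI/2.0/";
    static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";

    // Made-up response, shaped like an OAI-PMH ListRecords reply.
    public const string Sample = @"<OAI-PMH xmlns=""http://www.openarchives.org/OAI/2.0/"">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc=""http://www.openarchives.org/OAI/2.0/oai_dc/""
                   xmlns:dc=""http://purl.org/dc/elements/1.1/"">
          <dc:title>Sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>";

    // Builds a ListRecords request asking for Dublin Core metadata.
    public static string ListRecordsUrl(string baseUrl) =>
        baseUrl + "?verb=ListRecords&metadataPrefix=oai_dc";

    // Extracts (identifier, title) pairs from a ListRecords response.
    public static List<(string Id, string Title)> ParseRecords(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants(Oai + "record")
            .Select(r => (
                Id: (string)r.Descendants(Oai + "identifier").FirstOrDefault() ?? "",
                Title: (string)r.Descendants(Dc + "title").FirstOrDefault() ?? ""))
            .ToList();
    }

    public static void Main()
    {
        // example.org is a placeholder, not a real repository endpoint.
        Console.WriteLine(ListRecordsUrl("http://example.org/oai"));
        foreach (var rec in ParseRecords(Sample))
            Console.WriteLine(rec.Id + ": " + rec.Title);
    }
}
```

A real harvester would also need to follow resumption tokens and handle protocol errors, which this sketch leaves out.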


Current Project Status

• Limited Tech Preview release due June 2008
• Public Beta targeted for Aug/Sept 2008

For more details
 – Contact:
• Alex Wade (Program Manager) / [email protected]
 – Community Forum:
• http://community.research.microsoft.com/forums/90.aspx


eScience and Semantic Computing meet the Cloud

The cyberinfrastructure for the next generation of researchers


The Future: Software plus Services for Science?

• Expect scientific research environments will follow similar trends to the commercial sector
 – Leverage computing and data storage in the cloud
 – Scientists already experimenting with Amazon S3 and EC2 services, with mixed results
• For many of the same reasons
 – Siloed research teams, no resource sharing across labs
 – High storage costs
 – Low resource utilization
 – Excess capacity
 – High costs of reliably keeping machines up-to-date
 – Little support for developers, system operators


A smart cyberinfrastructure

• Collective intelligence
 – If last.fm can recommend what song to broadcast to me based on what my friends are listening to, why cannot the cyberinfrastructure of the future recommend articles of potential interest based on what the experts in the field that I respect are reading?
 – Already examples emerging but the process is manual (Connotea, BioMedCentral Faculty of 1000 ...)
• Automatic correlation of scientific data
• Smart composition of services and functionality
• Cloud computing to aggregate, process, analyze and visualize data
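
The "collective intelligence" idea of recommending articles from what respected experts are reading can be sketched with a simple count. This is only an illustration under stated assumptions: the ReadingRecommender class, the expert names and the article ids are all hypothetical, and a real system would need far richer signals than a raw tally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReadingRecommender
{
    // Ranks articles by how many of the respected experts are reading them,
    // skipping anything the user has already read. Ties are broken by id.
    public static List<string> Recommend(
        IEnumerable<string> respectedExperts,
        IReadOnlyDictionary<string, string[]> readingLists,
        ISet<string> alreadyRead)
    {
        return respectedExperts
            .Where(expert => readingLists.ContainsKey(expert))
            .SelectMany(expert => readingLists[expert].Distinct())
            .Where(article => !alreadyRead.Contains(article))
            .GroupBy(article => article)
            .OrderByDescending(g => g.Count())
            .ThenBy(g => g.Key, StringComparer.Ordinal)
            .Select(g => g.Key)
            .ToList();
    }

    public static void Main()
    {
        // Made-up reading lists keyed by expert name.
        var lists = new Dictionary<string, string[]>
        {
            ["alice"] = new[] { "paper-1", "paper-2" },
            ["bob"] = new[] { "paper-2", "paper-3" },
        };
        var read = new HashSet<string> { "paper-1" };

        var recs = Recommend(new[] { "alice", "bob" }, lists, read);
        Console.WriteLine(string.Join(", ", recs)); // prints "paper-2, paper-3"
    }
}
```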


A world where all data is linked…

• Important/key considerations
 – Formats or “well-known” representations of data/information
 – Pervasive access protocols are key (e.g. HTTP)
 – Data/information is uniquely identified (e.g. URIs)
 – Links/associations between data/information
• Data/information is interconnected through machine-interpretable information (e.g. paper X is about star Y)
• Social networks are a special case of ‘data networks’

Attribution: Richard Cyganiak
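
The "paper X is about star Y" example can be written down as a single machine-interpretable link. The sketch below identifies each thing with a URI and serializes the link in N-Triples syntax. All URIs are invented placeholders under example.org, and the LinkedDataSketch class and the isAbout term are hypothetical, not an established vocabulary.

```csharp
using System;
using System.Collections.Generic;

public static class LinkedDataSketch
{
    // Made-up identifiers: each thing gets its own URI.
    public static readonly Uri PaperX = new Uri("http://example.org/papers/X");
    public static readonly Uri StarY = new Uri("http://example.org/stars/Y");
    public static readonly Uri IsAbout = new Uri("http://example.org/terms/isAbout");

    // Serializes one link in N-Triples form: <subject> <predicate> <object> .
    public static string ToNTriple(Uri subject, Uri predicate, Uri target) =>
        $"<{subject.AbsoluteUri}> <{predicate.AbsoluteUri}> <{target.AbsoluteUri}> .";

    public static void Main()
    {
        var links = new List<(Uri Subject, Uri Predicate, Uri Target)>
        {
            (PaperX, IsAbout, StarY)
        };

        foreach (var link in links)
            Console.WriteLine(ToNTriple(link.Subject, link.Predicate, link.Target));
    }
}
```

Because the identifiers are HTTP URIs, a consumer could in principle dereference them over the same pervasive protocol the slide mentions.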


…and stored/processed/analyzed in the cloud

Vision of Future Research Environment with both Software + Services

[Diagram labels: scholarly communications; domain-specific services; instant messaging; identity; document store; blogs & social networking; mail; notification; search; books; citations; visualization and analysis services; storage/data services; compute services; virtualization; project management; reference management; knowledge management; knowledge discovery]


Added slides


eScience


Emergence of a New Research Paradigm?

• Thousand years ago – Experimental Science
 – Description of natural phenomena
• Last few hundred years – Theoretical Science
 – Newton’s Laws, Maxwell’s Equations…
• Last few decades – Computational Science
 – Simulation of complex phenomena
• Today – eScience or Data-centric Science
 – Unify theory, experiment, and simulation
 – Using data exploration and data mining
• Data captured by instruments
• Data generated by simulations
• Data generated by sensor networks
 – Scientists overwhelmed with data
 – Computer Science and IT companies have technologies that will help

(With thanks to Jim Gray)


Today

Web users...
• Generate content on the Web
 – Blogs, wikis, podcasts, videocasts, etc.
• Form communities
 – Social networks, virtual worlds
• Interact, collaborate, share
 – Instant messaging, web forums, content sites
• Consume information and services
 – Search, annotate, syndicate

Scientists...
• Annotate, share, discover data
 – Custom, standalone tools
• Conferences, Journals
 – Publication process is long, subscriptions, discoverability issues
• Collaborate on projects, exchange ideas
 – Email, F2F meetings, videoconferences
• Use workflow tools to compose services
 – Domain-specific services/tools


Data can be easily produced

http://ecrystals.chem.soton.ac.uk

Thanks to Jeremy Frey


Data and services can be easily composed

SensorMap

Functionality: Map navigation

Data: sensor-generated temperature, video camera feed, traffic feeds, etc.

Taverna Workflow

Compose services from the Web


Data is easily accessible

With thanks to

Catharine van Ingen


Data is easily shareable

Sloan Digital Sky Server/SkyServer

http://cas.sdss.org/dr5/en/ 


Today…

Computers are great tools for
storing, computing, managing, indexing
huge amounts of data

For example, Google and Microsoft both have copies of the Web for indexing purposes


Tomorrow…

Computers will still be great tools for
storing, computing, managing, indexing
huge amounts of data

We would like computers to also help with the automatic
acquisition, discovery, aggregation, organization, correlation, analysis, interpretation, inference
of the world’s information


Semantic Computing


What is Semantic Computing?

• Set of concepts and technologies
 – Data modeling
 – Relationships
 – Ontologies
 – Machine learning (entity extraction)
 – Inference, reasoning
 – Data, information, knowledge…

[Diagram labels: Data; Information; Knowledge; Intelligence; Wisdom; Current technologies; Possibilities for innovation]


Semantics

• Term used to refer to the concept of “meaning”
• The linguistics, AI, Natural Language Processing, etc. communities have been working on “meaning” and “knowledge” related technologies for decades
• Pragmatic approach to Semantic Computing
 – Emergence of a new breed of technologies to capture meaning (RDF, OWL, etc.)
 – Combine with the pervasiveness of the Web community technologies such as folksonomies …


A word about the “Semantic Web”

• The term is used to describe a set of technologies used to represent data, concepts, and their relationships
 – Has become a buzzword like Web 2.0
• Prefer to use the term “Semantic Computing”, which is about modeling data in ways that can be automatically processed by computers


Semantic Computing

• Some efforts are driven by the traditional “knowledge engineering” community
 – Engaged in building well-controlled ontologies
 – Important for domain-specific vocabularies with data formats and relationships specific to a community
 – Model does not easily scale to the Internet
• Some efforts are driven by the Web 2.0 community
 – Focus on the pervasiveness of Web protocols/standards
 – Emphasis on microformats (small, flexible, embeddable structures)
 – Exploit evolving and ever-expanding vocabularies such as folksonomies and tag clouds

