
Life Science 2010

Page 1: Life Science 2010

Life Science 2010: Meeting the Challenges with Semantic Technology

By

Dr. Sheng-Chuan Wu

Page 2: Life Science 2010

Some Looming US Healthcare Crises

- Aging of population in developed countries: US older population increases 50% in 30 years; life expectancy gets longer (79–84 in 2008)
- Cost of healthcare skyrocketing: drug prices increased 50% from 2000 to 2007 (US CPI rose 20% in the same period)
- Drug development costs mushroom: US$200+ million for a successful drug approval
- Uninsured reaches epidemic proportions (30%)

Copyright, Sheng-Chuan Wu, July 2009

Page 3: Life Science 2010

Despite Enormous Scientific Advances

- Sequencing of the human genome and greater insight into human genes (e.g., Gene Ontology)
- Microarray gene expression chips
- Better understanding of cancer and viral infection
- Greatly expanded research in pharmacology, pathology, immunology, physiology, etc.
- An explosion of information (knowledge) in life science, without comparable benefit


Page 4: Life Science 2010

Challenges for Life Science – Diversity

- Very diverse subjects

How to relate all the information cohesively?

Page 5: Life Science 2010

Challenges for Life Science – Taxonomy

- Different disciplines use different taxonomies, even for the same thing
- Taxonomic science is intrinsically dynamic

[Figure: Physiologist, Geneticist, Pharmacologist, Biochemist, Virologist]

Page 6: Life Science 2010

[Table excerpt: specimen records in a relational table. Columns include Field rule code, Record ID, Version, Version status, Record status, Name for list view, Primary/Secondary/Tertiary level, Borneensis no., Suffix no., Web release, Registration no., Old Borneensis no., Registration date, Collection date, Collector's name, Country, State, District, Village or nearest village, Specific locality, Latitude, Longitude, Altitude, Habitat type, Substrate, Ecological data, Method of capture/collection, Specimen preparation, Specimen part, Sex, Total length, Tail length, Weight, Head-body length, Hind foot length, Forearm length, Ear, Other measurement, Identification date, Identifier, Identification note and Phylum. Sample rows describe four mammals from Malaysia (Echinosorex gymnurus, Hylomys suillus, Suncus murinus, Tupaia glis; phylum CHORDATA).]

Credited to Universiti Malaysia Sabah

Challenges for Life Science – Information Model

Mostly employing relational model

Page 7: Life Science 2010

[Table excerpt (continued): a column for each taxonomic rank, each paired with an ID column (Subphylum, Superclass, Class, Subclass, Superorder, Order, Suborder, Superfamily, Family, Subfamily, Genus, Species), plus Subspecies, Author, Common name (English), Common name (local language), Type status, Conservation status, Distribution, storage location (Preservation method, Jar, Room, Compactor, Bay, Shelves and Container/Box/Jar no.) and loan tracking (Loaned ID, borrower name and address, e-mail, telephone, fax, borrower country, date loaned, due date, date returned, remarks, multimedia link, release flag, release level, registration status). Sample values include VERTEBRATA, MAMMALIA, Insectivora, Scandentia, Erinaceidae, Soricidae, Tupaiidae, the English names Lesser Gymnure, House Shrew and Common Treeshrew, and the local names Tikus babi, Cencurut Rumah and Tupai Moncong Besar.]

- Horrendous RDB table schema
- More than 70% of table cells contain null values
- Need to call in experts to update the schema

Credited to Universiti Malaysia Sabah

Challenges for Life Science – Information Model

Page 8: Life Science 2010

Designed for humans (90%+), not for computers

Challenges for Life Science – Knowledge Representation

Page 9: Life Science 2010

- Many sources (silos) of life science information
- Our understanding in some areas (e.g., pathways) is very limited and uncertain
- We don’t even know what is yet to come

A mammoth data integration problem, let alone integrated understanding & knowledge discovery

Try to design a schema for such data tables and knowledge warehouses!!

Challenges for Life Science – Integration

Page 10: Life Science 2010

Same Challenges for Every Field in Biology

- Many diverse but related subjects
- Different taxonomies from different disciplines
- Very complex information model, which must evolve constantly as we learn more
- Difficulty in knowledge representation, for computers and not just for humans
- Mammoth information integration problem

Semantic Technology can help overcome these challenges

Page 11: Life Science 2010

What is Semantic Technology?

- A new way of representing (modeling) and accessing information
- Makes unstructured documents without semantics (on the web or not) intelligible to computers
- Key enabling standards: URI, RDF, RDFS, OWL and SPARQL

“The Semantic Web is a vision: the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications.” [Tim Berners-Lee et al., 2001]

Page 12: Life Science 2010

The Semantic Wave is NOT One Thing … Two Differing Major Waves Within It

The Semantic Web
- Information sharing on a global scale
- Intranets vs. Internet

Semantic Technology
- Standard knowledge representation
- Enhanced knowledge access and discovery
- Semantic interoperability
- Information syndication, and so forth

Page 13: Life Science 2010

First, We Need Shared Reference

[Figure: meaning triangle linking the form “Family”, a concept and a referent (labels: stands for, relates to, activates)]

Computers cannot help without clear data semantics

Page 14: Life Science 2010

URI (from W3C) Removes Ambiguity

Semantic languages (RDF): to describe mappings, relations (properties) and structures of data

[Figure: several CV documents, each with name, education, work and private sections marked up with tags]

W3C Ontology (RDF/RDFS/OWL): provides external common references, structure and properties for individual terms

Uniform Resource Identifier (URI) as common reference

<http://xmlns.com/foaf/0.1/Thing.owl#Family>

<http://www.usda.gov/classification/plants/taxaonomy.owl#Family>

Page 15: Life Science 2010

IDD Workshop 2008 | Semantic Web in Life Sciences | Martin Romacker, Therese Vachon | 30 Sep. 2008

RDF (Resource Description Framework) from W3C expresses information as triples with a uniform structure:

[subject] [predicate] [object], e.g. Virus_infection caused_by Virus

This structure is close to simple phrases in natural language. Because the subjects, predicates and objects here are all unique resources (URIs), the triples link up into a graph that is easy to visualize.

[Figure: Virus_infection caused_by Virus; AIDS rdf:type Virus_infection; HIV_Virus rdf:type Virus]

RDF – Distributed Data (Knowledge)
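To make the triple idea concrete, here is a minimal Python sketch (standard library only) that stores the slide's three triples and indexes them as a labelled graph. Plain strings stand in for URIs, and the helper names `build_graph` and `edges_to` are illustrative assumptions, not part of any RDF library.

```python
from collections import defaultdict

# The slide's triples as (subject, predicate, object).
# Plain strings stand in for full URIs in this sketch.
TRIPLES = [
    ("Virus_infection", "caused_by", "Virus"),
    ("AIDS", "rdf:type", "Virus_infection"),
    ("HIV_Virus", "rdf:type", "Virus"),
]

def build_graph(triples):
    """Index triples as a directed, labelled graph: subject -> [(predicate, object)]."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph

def edges_to(graph, target):
    """Return (subject, predicate) for every edge that points at target."""
    return [(s, p) for s, outs in graph.items() for p, o in outs if o == target]

graph = build_graph(TRIPLES)
for subject, outgoing in graph.items():
    for predicate, obj in outgoing:
        print(f"{subject} --{predicate}--> {obj}")
print(edges_to(graph, "Virus"))
```

Because "Virus" is the same identifier in two triples, the separate statements join into one graph without any table design.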

Page 16: Life Science 2010

Adding a new relation requires a change of the DB schema

[Figure: relational tables, including a friends table with columns ID1 and ID2 (sample row: 2, 35) and a DOM value of 1913]

One-to-Many Relational Model – Many Table Joins
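The schema-change problem can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch with a hypothetical `person` table; only the `friends(ID1, ID2)` link table and its (2, 35) row come from the slide. Storing a friendship fails until a new table is created, and reading it back takes a join.

```python
import sqlite3

def demo():
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical starting schema: fixed columns, no notion of friendship.
    cur.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT)")
    cur.executemany("INSERT INTO person VALUES (?, ?)", [(2, "Rose"), (35, "person35")])

    # A new kind of relation cannot be stored until the schema changes.
    try:
        cur.execute("INSERT INTO friends VALUES (2, 35)")
        before = "ok"
    except sqlite3.OperationalError as err:
        before = str(err)

    # Schema change: add a link table like the slide's "friends" table.
    cur.execute("CREATE TABLE friends (id1 INTEGER, id2 INTEGER)")
    cur.execute("INSERT INTO friends VALUES (2, 35)")

    # Reading the relation back needs a join across tables.
    rows = cur.execute(
        """SELECT p1.first_name, p2.first_name
           FROM friends f
           JOIN person p1 ON p1.id = f.id1
           JOIN person p2 ON p2.id = f.id2"""
    ).fetchall()
    conn.close()
    return before, rows

before, rows = demo()
print(before)
print(rows)
```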

Page 17: Life Science 2010

Equivalent Semantic Model – Easy with URI & RDF

<triple 32: "person2" "type" "person">
<triple 33: "person2" "first-name" "Rose">
<triple 34: "person2" "middle-initial" "Elizabeth">
<triple 35: "person2" "last-name" "Fitzgerald">
<triple 36: "person2" "suffix" "none">
<triple 37: "person2" "alma-mater" "Sacred-Heart-Convent">
<triple 38: "person2" "birth-year" "1890">
<triple 39: "person2" "death-year" "1995">
<triple 40: "person2" "sex" "female">
<triple 41: "person2" "spouse" "person1">
<triple 58: "person2" "has-child" "person17">
<triple 56: "person2" "has-child" "person15">
<triple 54: "person2" "has-child" "person13">
<triple 52: "person2" "has-child" "person11">
<triple 50: "person2" "has-child" "person9">
<triple 48: "person2" "has-child" "person7">
<triple 46: "person2" "has-child" "person6">
<triple 44: "person2" "has-child" "person4">
<triple 42: "person2" "has-child" "person3">
<triple 60: "person2" "profession" "home-maker">
<triple 66: "person2" "has-friend" "person35">
<triple 67: "person2" "year-of-marriage" "1913">
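A minimal sketch of the same idea in Python, using a subset of the slide's triples in a plain list (an illustration, not a real triple store; `objects` is a hypothetical helper). New relations such as has-friend are appended as ordinary triples, with no schema change first.

```python
# A subset of the slide's triples about "person2", as (subject, predicate, object).
triples = [
    ("person2", "type", "person"),
    ("person2", "first-name", "Rose"),
    ("person2", "last-name", "Fitzgerald"),
    ("person2", "birth-year", "1890"),
    ("person2", "spouse", "person1"),
    ("person2", "has-child", "person3"),
    ("person2", "has-child", "person4"),
]

def objects(store, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return [o for s, p, o in store if s == subject and p == predicate]

# New kinds of facts are simply more triples: no new table, no ALTER.
triples.append(("person2", "has-friend", "person35"))
triples.append(("person2", "year-of-marriage", "1913"))

print(objects(triples, "person2", "has-child"))
print(objects(triples, "person2", "has-friend"))
```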

Page 18: Life Science 2010

A Little Ontology Goes A Long Way: Taxonomic Structure (ontology)

Copyright, Sheng-Chuan Wu, November 2008

Page 19: Life Science 2010

A Much Better Model with Semantics

Page 20: Life Science 2010

Synergies in Knowledge Representation: Biomedical Taxonomy/Ontologies

Page 21: Life Science 2010

RDF Class Hierarchy Maps Taxonomy


Page 22: Life Science 2010

Information Given (Relationship Model)

WhiteLadySlipper type OrchidFamily
OrchidFamily subClassOf Liliidae

rdf:type and rdfs:subClassOf come from the W3C RDF and RDFS standards: rdfs:subClassOf is transitive, and a member of a class is also a member of every superclass of that class.

Information Inferred

WhiteLadySlipper type Liliidae

Question: To which subclass does WhiteLadySlipper belong?
Answer: Liliidae

Relationships are explicit in the model and directly available to applications!

Where are the relationships?

Semantic Model – Explicit Relationship
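A minimal Python sketch of the inference step, assuming the two RDFS entailment rules rdfs9 and rdfs11 from the W3C RDF Semantics specification and plain strings in place of URIs. Real triple stores do this far more efficiently; this naive fixed-point loop only shows why the answer Liliidae never has to be stored explicitly.

```python
def infer_types(triples):
    """Forward-chain two RDFS entailment rules to a fixed point:
    rdfs11: (A subClassOf B) and (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A) and (A subClassOf B)       => (x type B)
    """
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == "rdfs:subClassOf"}
        new = set()
        for a, b in sub:
            for b2, c in sub:
                if b == b2:
                    new.add((a, "rdfs:subClassOf", c))
        for x, p, cls in facts:
            if p == "rdf:type":
                for cls2, sup in sub:
                    if cls == cls2:
                        new.add((x, "rdf:type", sup))
        if new <= facts:  # nothing new: fixed point reached
            return facts
        facts |= new

given = [
    ("WhiteLadySlipper", "rdf:type", "OrchidFamily"),
    ("OrchidFamily", "rdfs:subClassOf", "Liliidae"),
]
facts = infer_types(given)
answer = sorted(o for s, p, o in facts if s == "WhiteLadySlipper" and p == "rdf:type")
print(answer)  # ['Liliidae', 'OrchidFamily']
```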


Page 23: Life Science 2010

Species Table (ID, Species Name): IDC, WhiteLadySlipper
SubclassTable (Family, Subclass): OrchidFamily, Liliidae
Species_FamilyTable (FM_ID, SP_ID): OrchidFamily, IDC

Question

In which Subclass is WhiteLadySlipper located?

Answer

In Liliidae

Develop a Query

SELECT Subclass
FROM Species_Table, SubclassTable, Species_FamilyTable
WHERE Species_Name = 'WhiteLadySlipper' AND ID = SP_ID AND Family = FM_ID

Relationships are in documents, SQL code and collective memories - not available to applications!

Where are the relationships?

Data Definition Statements? Applications do not use them, they are not descriptive and their scope is a single database

Data Dictionary? Data Registry? They are for human use, not for computers

Relational Model – Implicit Relationship

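The three-table query can be run end to end with Python's built-in sqlite3 module. This is a minimal sketch: table and column names are normalized from the slide (spaces replaced by underscores), the data is the single row per table shown above, and the helper `subclass_of` is hypothetical. Note that the species-family-subclass relationship lives only in the WHERE clause.

```python
import sqlite3

def subclass_of(conn, species_name):
    # The relationships exist only as join conditions in this query.
    row = conn.execute(
        """SELECT SubclassTable.Subclass
           FROM Species_Table, SubclassTable, Species_FamilyTable
           WHERE Species_Table.Species_Name = ?
             AND Species_Table.ID = Species_FamilyTable.SP_ID
             AND SubclassTable.Family = Species_FamilyTable.FM_ID""",
        (species_name,),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Species_Table (ID TEXT, Species_Name TEXT);
    CREATE TABLE SubclassTable (Family TEXT, Subclass TEXT);
    CREATE TABLE Species_FamilyTable (FM_ID TEXT, SP_ID TEXT);
    INSERT INTO Species_Table VALUES ('IDC', 'WhiteLadySlipper');
    INSERT INTO SubclassTable VALUES ('OrchidFamily', 'Liliidae');
    INSERT INTO Species_FamilyTable VALUES ('OrchidFamily', 'IDC');
""")
print(subclass_of(conn, "WhiteLadySlipper"))  # Liliidae
```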

Page 24: Life Science 2010

WhiteLadySlipper type OrchidFamily
OrchidFamily subClassOf Orchidales
Orchidales subClassOf Liliidae

type (rdf:type) and subClassOf (rdfs:subClassOf) come from the W3C RDF/RDFS standards; subClassOf is transitive, and type carries over to every superclass

Relationship Model

WhiteLadySlipper type Orchidales
WhiteLadySlipper type Liliidae

Information automatically Inferred

Question: To which subclass does WhiteLadySlipper belong?

Answer: Liliidae

Information Given

Semantic Model – Explicit Relationship


OrchidFamily subClassOf Liliidae

new data

Changes are Easy to Make

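The same change in a triple-based model can be sketched in Python. This version computes classes at query time by walking rdfs:subClassOf edges, a different but equivalent strategy to materializing inferred triples; `types_of` is a hypothetical helper and strings stand in for URIs. Inserting Orchidales changes only the data, while the query code and the answer Liliidae stay the same.

```python
def types_of(triples, individual):
    """Every class the individual belongs to: follow rdf:type once,
    then rdfs:subClassOf transitively (query-time traversal)."""
    direct = [o for s, p, o in triples if s == individual and p == "rdf:type"]
    seen, stack = set(), list(direct)
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue
        seen.add(cls)
        stack.extend(o for s, p, o in triples if s == cls and p == "rdfs:subClassOf")
    return seen

old = [
    ("WhiteLadySlipper", "rdf:type", "OrchidFamily"),
    ("OrchidFamily", "rdfs:subClassOf", "Liliidae"),
]
# Revised taxonomy: Orchidales now sits between OrchidFamily and Liliidae.
# Only the data changes; types_of() is untouched.
new = [t for t in old if t != ("OrchidFamily", "rdfs:subClassOf", "Liliidae")] + [
    ("OrchidFamily", "rdfs:subClassOf", "Orchidales"),
    ("Orchidales", "rdfs:subClassOf", "Liliidae"),
]

print(sorted(types_of(old, "WhiteLadySlipper")))  # ['Liliidae', 'OrchidFamily']
print(sorted(types_of(new, "WhiteLadySlipper")))  # ['Liliidae', 'OrchidFamily', 'Orchidales']
```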

Page 25: Life Science 2010

Species Table (ID, Species Name): IDC, WhiteLadySlipper
Order Table (Order Name, ID, Subclass): Orchidales, ORD, Liliidae
Species_Family Table (FM_ID, SP_ID): OrchidFamily, IDC
Family_Order Table (Family_ID, Order_ID): OrchidFamily, ORD

Doesn’t work any more!

Relational Model Changes at Great Peril


Species Table (ID, Species Name): IDC, WhiteLadySlipper
SubclassTable (Family, Subclass): OrchidFamily, Liliidae
Species_FamilyTable (FM_ID, SP_ID): OrchidFamily, IDC

Develop a Query

SELECT Subclass
FROM Species_Table, SubclassTable, Species_FamilyTable
WHERE Species_Name = 'WhiteLadySlipper' AND ID = SP_ID AND Family = FM_ID

Question

In which Subclass is WhiteLadySlipper located?

??

Get No Answer!

Changes should be avoided at ALL costs

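A sketch of this failure mode with sqlite3, assuming the revised schema drops SubclassTable in favour of the Order Table and a Family_Order link table (names normalized to Order_Table and Family_Order_Table; the helper `run` is hypothetical). In this sketch the old query is rejected because the table it names no longer exists, and getting the answer back requires rewriting it as a four-table join.

```python
import sqlite3

OLD_QUERY = """SELECT SubclassTable.Subclass
    FROM Species_Table, SubclassTable, Species_FamilyTable
    WHERE Species_Table.Species_Name = ?
      AND Species_Table.ID = Species_FamilyTable.SP_ID
      AND SubclassTable.Family = Species_FamilyTable.FM_ID"""

NEW_QUERY = """SELECT Order_Table.Subclass
    FROM Species_Table, Species_FamilyTable, Family_Order_Table, Order_Table
    WHERE Species_Table.Species_Name = ?
      AND Species_Table.ID = Species_FamilyTable.SP_ID
      AND Species_FamilyTable.FM_ID = Family_Order_Table.Family_ID
      AND Family_Order_Table.Order_ID = Order_Table.ID"""

def run(conn, sql, name):
    """Run a query; report the error text instead of raising."""
    try:
        row = conn.execute(sql, (name,)).fetchone()
        return row[0] if row else None
    except sqlite3.OperationalError as err:
        return f"error: {err}"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Species_Table (ID TEXT, Species_Name TEXT);
    CREATE TABLE Species_FamilyTable (FM_ID TEXT, SP_ID TEXT);
    -- Revised schema: an order level replaces SubclassTable.
    CREATE TABLE Order_Table (Order_Name TEXT, ID TEXT, Subclass TEXT);
    CREATE TABLE Family_Order_Table (Family_ID TEXT, Order_ID TEXT);
    INSERT INTO Species_Table VALUES ('IDC', 'WhiteLadySlipper');
    INSERT INTO Species_FamilyTable VALUES ('OrchidFamily', 'IDC');
    INSERT INTO Order_Table VALUES ('Orchidales', 'ORD', 'Liliidae');
    INSERT INTO Family_Order_Table VALUES ('OrchidFamily', 'ORD');
""")
print(run(conn, OLD_QUERY, "WhiteLadySlipper"))
print(run(conn, NEW_QUERY, "WhiteLadySlipper"))  # Liliidae
```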

Page 26: Life Science 2010


Changing Taxonomy Affects Only the View: View on the Species

[Figure: species linked to taxonomy classes by rdf:type, with the classes organized by rdfs:subClassOf]

Page 27: Life Science 2010

Designed for humans (90%+), not for computers

How About Unstructured Documents with No Semantics?

Page 28: Life Science 2010

Find me a Thai restaurant that is Halal, not too expensive, with no alcohol served, near Jurong Point Shopping Centre in Singapore.

Turn Document into Semantic Model


[Figure: the restaurant page http://www.zabihah.com/ds.php?id=1716 as a graph of properties:
type: Restaurant
halal: true
halalAuth: true
cuisine: Thai
cuisine: Indonesian
price: Median
address: “1 Jurong West Central 2”
address: Jurong Point
city: Singapore
country: Singapore
alcoholServed: False
phone: “65-6792-6593”]

Best keyword search engine gives very unsatisfactory results

Intelligent, complex, ad hoc queries beyond keyword search are now possible
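The kind of query this enables can be sketched in Python over the slide's properties. Assumptions: R2 is an invented second listing, "near" is simplified to an exact address match, "Low" is a made-up acceptable price value, and `find_matching` is a hypothetical helper that behaves like a conjunctive SPARQL pattern with value filters, not a SPARQL engine.

```python
# Facts from the slide's restaurant graph; R2 is a hypothetical second
# listing added only so that the query has something to filter out.
R1 = "http://www.zabihah.com/ds.php?id=1716"
R2 = "urn:example:restaurant-2"  # hypothetical

triples = [
    (R1, "type", "Restaurant"), (R1, "cuisine", "Thai"),
    (R1, "cuisine", "Indonesian"), (R1, "halal", True),
    (R1, "price", "Median"), (R1, "alcoholServed", False),
    (R1, "address", "1 Jurong West Central 2"),
    (R1, "address", "Jurong Point"), (R1, "city", "Singapore"),
    (R2, "type", "Restaurant"), (R2, "cuisine", "Thai"),
    (R2, "halal", False), (R2, "alcoholServed", True),
    (R2, "address", "Jurong Point"), (R2, "city", "Singapore"),
]

def values(store, subject, predicate):
    """All objects recorded for (subject, predicate)."""
    return {o for s, p, o in store if s == subject and p == predicate}

def find_matching(store, pattern):
    """Subjects for which every predicate in `pattern` has at least one
    value in the allowed set (a conjunctive, SPARQL-like filter)."""
    subjects = {s for s, _, _ in store}
    return sorted(
        s for s in subjects
        if all(values(store, s, p) & allowed for p, allowed in pattern.items())
    )

# Thai, Halal, not too expensive, no alcohol, near Jurong Point, Singapore.
pattern = {
    "type": {"Restaurant"},
    "cuisine": {"Thai"},
    "halal": {True},
    "price": {"Low", "Median"},
    "alcoholServed": {False},
    "address": {"Jurong Point"},
    "city": {"Singapore"},
}
print(find_matching(triples, pattern))
```

Every condition is an explicit property test, which is exactly what a keyword search over the raw page text cannot express.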

Page 29: Life Science 2010

Ad Hoc Query with SPARQL: Biological Processes in Dendrites

Alzheimer’s disease is characterized by neural degeneration.

Among other things, there is damage to dendrites and axons, parts of nerve cells.

What resources do we have available to learn more about biological processes in dendrites?

Page 30: Life Science 2010

Query Gene Ontology (GO) for Clues


Inference at work

Page 31: Life Science 2010

Looking for Alzheimer’s Disease Targets

Signal transduction pathways are considered to be rich in “druggable” targets - proteins that might respond to chemical therapy

CA1 Pyramidal Neurons are known to be particularly damaged in Alzheimer’s disease.

Can we find candidate genes known to be involved in signal transduction and active in Pyramidal Neurons?


Page 32: Life Science 2010

Scientific Inquiry over Many Sources


Page 33: Life Science 2010

A SPARQL Query Spanning 4 Sources

Ad hoc queries over multiple data sources (in RDF) are easy
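How a query can span sources is sketched below: once sources share identifiers, integrating them is a union of triple sets, and a query is a set of patterns joined on shared variables, which is what a SPARQL basic graph pattern does. The gene names and both sources are hypothetical placeholders, not data from the presentation, and the nested-loop `query` function is a toy evaluator rather than a SPARQL engine.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None on conflict.
    Terms starting with '?' are variables, as in SPARQL."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.get(term, value) != value:
                return None
            result[term] = value
        elif term != value:
            return None
    return result

def query(store, patterns):
    """Evaluate a basic graph pattern with naive nested-loop joins."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in store:
                b = match(pattern, triple, binding)
                if b is not None:
                    extended.append(b)
        bindings = extended
    return bindings

# Two hypothetical sources that share identifiers (URIs in a real setting).
gene_annotations = [
    ("GeneA", "involved_in", "signal_transduction"),
    ("GeneB", "involved_in", "signal_transduction"),
    ("GeneC", "involved_in", "cell_adhesion"),
]
expression_data = [
    ("GeneA", "expressed_in", "CA1_pyramidal_neuron"),
    ("GeneC", "expressed_in", "CA1_pyramidal_neuron"),
]

# Integration is just the union of the triple sets.
store = gene_annotations + expression_data
hits = query(store, [
    ("?gene", "involved_in", "signal_transduction"),
    ("?gene", "expressed_in", "CA1_pyramidal_neuron"),
])
print(hits)  # [{'?gene': 'GeneA'}]
```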

Page 34: Life Science 2010

Finally, Semantic Technology: A Different but Better Mousetrap

- Database without a fixed schema or tables; easy to change
- Easy and seamless distribution and integration of data with URI and RDF
- Separation of taxonomic information and individual data
- Ontology to bridge different taxonomies
- A query language (SPARQL) for ad hoc pattern matching

Ideal for modeling & accessing life science data

Semantic Technology gives us an integrated view of available knowledge

Page 35: Life Science 2010

Potential Applications

- Bridging natural herbal medicine and western medical knowledge
- Freshwater fishery (environmental) management
- Plant biotechnology
- Precision agriculture
- Biodiversity repository

All based on a single framework to model, integrate and access different life science “knowledge” sources

Page 36: Life Science 2010

Semantic Technology for Life Science

Technology for a healthier life

