Life Science 2010
Meeting the Challenges with Semantic Technology
By
Dr. Sheng-Chuan Wu
Some Looming US Healthcare Crises
Aging of population in developed countries
US older population increases 50% in 30 years
Life expectancy gets longer (79 – 84 in 2008)
Cost of healthcare skyrocketing
Drug prices increase 50% from 2000 – 2007 (US CPI rises 20% in the same period)
Drug development costs mushroom
US$200+ million for a successful drug approval
Uninsured reaches epidemic proportions (30%)
Copy Right, Sheng-Chuan Wu July 2009
Despite Enormous Scientific Advance
Sequencing of human genome and greater insight into human genes (e.g., Gene Ontology)
Microarray gene expression chips
Better understanding of cancer, viral infection
Greatly expanded research in pharmacology, pathology, immunology, physiology, etc.
An explosion of information (knowledge) in life science without comparable benefit
Challenges for Life Science – Diversity
Very diverse subjects
How to relate all the information cohesively?
Challenges for Life Science – Taxonomy
Different disciplines use different taxonomies even for the same thing
Taxonomic science is intrinsically dynamic
Physiologist
Geneticist
Pharmacologist
Biochemist
Virologist
Field rule code Record ID Version Version status Record status Name for list view Primary level Secondary level Tertiary level Borneensis no. Sufix No. Web release
attrul id version verstat recstat brief maxcls subcls mincls entryno subno webflg
MA 25 1 1 1 Echinosorex gymnurus Mammals BOR-00000-03065 Yes
MA 27 1 1 1 Hylomys suillus Mammals BOR-00000-03067 Yes
MA 29 1 1 1 Suncus murinus Mammals BOR-00000-03069 No
MA 33 1 1 1 Tupaia glis Mammals BOR-00000-03073 No
Registration no. Old Borneensis no Registration date Collection date Collector's name Country State District Village or nearest village Specific locality
Regno OldRegno Regdate collectiondate Collector country State District Village locality
MA0000005 9/15/2004 Henry Benard Malaysia Sabah Lahad Datu Tabin Forest Reserve, Lahad Datu
MA0000007 9/15/2004 S.Yasuma Malaysia
MA0000009 9/15/2004 S.Yasuma Malaysia
MA0000013 9/15/2004 21/5/1999 Arifin Ag. Ali Malaysia Sabah Tawau Lembangan Maliau Basin
Latitude Longitude Altitude(Sign) Altitude Habitat type Substrate Ecological data Method of capture/collection Specimen preparation Specimen part Sex Total Length
Latitude Longitude Altitude-kbn Altitude Habita Substrate Ecological capture preparation Specimenpart Sex Total Length
Female 625 mm
Male
Tail length Weight Head-body length Hind foot length Forearm length Ear Other measurement Identification date Identifier Identification note Phylum Phylum(ID)
meamethod meavalue HB length hindfoot forearm Ear Othemeasure Identdate Identifier Identnote phylum phylum-id
225 mm 400 mm 65 mm 30 mm
CHORDATA
207.0 mm 180.0 g 208.0 mm 46.0 mm 16.0 mm CHORDATA
Credited to Universiti Malaysia Sabah
Challenges for Life Science – Information Model
Mostly employing relational model
Subphylum Subphylum(ID) Superclass Superclass(ID) Class Class(ID) Subclass Subclass(ID) Superorder Superorder(ID) Order Order(ID) Suborder Suborder(ID)
subphylum subphylum-id superclass superclass-id Class Class-id subclass subclass-id superorder superorder-id order order-id suborder suborder-id
INSECTIVORA
Insectivora
VERTERBRATA MAMMALIA Insectivora
VERTERBRATA MAMMALIA Scandentia
Superfamily Superfamily(ID) Family Family(ID) Subfamily Subfamily(ID) Genus Genus(ID) Species Species(ID) Subspecies Author Common name (English)
superfamily superfamily-id family family-id subfamily subfamily-id genus genus-id species species-id subspecies author English
ERINACEIDAE Hylominae ID:MA00000385 Echinosorex gymnurus
Erinaceidae Hylomys suillus Lesser Gymnure
Soricidae ID:MA00000428 Suncus murinus House Shrew
Tupaiidae Tupaiinea Tupaia glis ID:MA00000483 Common Treeshrew
Common name (local language) Type status Conservation Status Distribution Preservation method Jar no. Room no. Compactor no. Bay no. Shelves no. Container/Box/Jar no.
locallang Typestatus consst distribution Preservation method Jarno roomno compactor bayno shelvesno Containerno
Wet room Wet(Eg-01)
Tikus babi Dry room Dry(Hs-01)
Cencurut Rumah Wet Room Wet(Sm-01)
Tupai Moncong Besar Dry specimen Dry Room Dry(Tg-01)
Loaned ID Loaned to (Name & address) E-mail Telephone Fax Country (Borrower) Date loaned Due date Date returned Remarks Multimedia link Release flag Release level Regn status
loanedID loanedto loanmail phone fax countryloan Loaned duedate Returned remarks medialink opnflg opnlvl matst
Malaysia 10000 20
Malaysia 10000 20
Malaysia 10000 20
Malaysia 10000 20
Horrendous RDB table schema
More than 70% of table cells contain null values
Need to call in experts to update schema
Credited to Universiti Malaysia Sabah
Challenges for Life Science – Information Model
Designed for humans (90%+), not for computers
Challenges for Life Science – Knowledge Representation
Many sources (silos) of life science information
Our understanding in some areas (e.g., pathways) is very limited and uncertain
We don’t even know what else is to come
A mammoth data integration problem, let alone integrated understanding & knowledge discovery
Try to design a schema for such data tables and knowledge warehouses!
Challenges for Life Science – Integration
Same Challenges for Every Field in Biology
Many diverse but related subjects
Different taxonomies from different disciplines
Very complex information model, which must evolve constantly as we learn more
Difficulty in knowledge representation, for computer not just for human
Mammoth information integration problem
Semantic Technology can help overcome
these challenges
What is Semantic Technology?
A new way of representing (modeling) and accessing information
Makes unstructured documents that lack semantics (on the web or not) intelligible to computers
Key enabling standards: URI, RDF, RDFS, OWL and SPARQL
“The Semantic Web is a vision: the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications.” [Tim Berners-Lee et al., 2001]
The Semantic Wave is NOT One Thing … Two Differing Major Waves Within It
The Semantic Web
Information sharing on a global scale
Intranets vs. Internet
Semantic Technology
Standard knowledge representation
Enhanced knowledge access and discovery
Semantic interoperability
Information syndication and so forth
First, We Need Shared Reference
[Figure: semiotic triangle for the word “Family”. The form activates a concept, the concept relates to a referent, and the form stands for the referent.]
Computer cannot help without clear data semantics
URI (from W3C) Removes Ambiguity
Semantic languages (RDF)
To describe mappings, relations (properties) and structures of data
[Figure: several CV documents, each with the fields name, education, work and private. Their markup tags are first shown blank (< >), then filled in with labels such as <CV>, <name> and <educ>.]
W3C Ontology (RDF/RDFS/OWL)
Provide external common references, structure and property for individual terms
Uniform Resource Identifier (URI) as common reference
<http://xmlns.com/foaf/0.1/Thing.owl#Family>
<http://www.usda.gov/classification/plants/taxaonomy.owl#Family>
IDD Workshop 2008 | Semantic Web in Life Sciences | Martin Romacker, Therese Vachon | 30 Sep 2008
RDF (Resource Description Framework) from W3C expresses every statement as a triple with a uniform structure:
[subject] [predicate] [object]
Virus_infection caused_by virus
This structure is close to simple phrases in natural language. Linked through URIs, the triples form a graph that is easy to visualize. Every element is represented as a unique resource (URI).
[Figure: graph in which Virus_infection is linked to Virus by Caused_by, with AIDS and HIV_Virus attached by rdf:type edges]
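The subject-predicate-object structure can be sketched in a few lines of Python. This is only an illustrative in-memory model, not how any particular RDF store works: the `match` helper is a hypothetical name, plain strings stand in for full URIs, and the two rdf:type statements follow one assumed reading of the slide's graph.

```python
# A minimal sketch: RDF statements as (subject, predicate, object) tuples.
# Plain strings are used here; real RDF identifies each term with a URI.
triples = {
    ("Virus_infection", "caused_by", "Virus"),
    ("AIDS", "rdf:type", "Virus_infection"),  # assumed reading of the graph
    ("HIV_Virus", "rdf:type", "Virus"),       # assumed reading of the graph
}


def match(store, s=None, p=None, o=None):
    """Return the triples that agree with every given term; None is a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Every statement made with the rdf:type predicate.
typed = match(triples, p="rdf:type")
print(typed)
```

Because every statement has the same three-part shape, a single helper like this can answer any one-pattern question about the data.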
RDF – Distributed Data (Knowledge)
Adding a new relation requires a change of the DB schema
DOM: 1913
Table friends (ID1, ID2): 2, 35
One-to-Many Relational Model – Many Table Joins
Equivalent Semantic Model – Easy with URI & RDF
<triple 32: "person2" "type" "person">
<triple 33: "person2" "first-name" "Rose">
<triple 34: "person2" "middle-initial" "Elizabeth">
<triple 35: "person2" "last-name" "Fitzgerald">
<triple 36: "person2" "suffix" "none">
<triple 37: "person2" "alma-mater" "Sacred-Heart-Convent">
<triple 38: "person2" "birth-year" "1890">
<triple 39: "person2" "death-year" "1995">
<triple 40: "person2" "sex" "female">
<triple 41: "person2" "spouse" "person1">
<triple 58: "person2" "has-child" "person17">
<triple 56: "person2" "has-child" "person15">
<triple 54: "person2" "has-child" "person13">
<triple 52: "person2" "has-child" "person11">
<triple 50: "person2" "has-child" "person9">
<triple 48: "person2" "has-child" "person7">
<triple 46: "person2" "has-child" "person6">
<triple 44: "person2" "has-child" "person4">
<triple 42: "person2" "has-child" "person3">
<triple 60: "person2" "profession" "home-maker">
<triple 66: "person2" "has-friend" "person35">
<triple 67: "person2" "year-of-marriage" "1913">
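The contrast between the two models can be made concrete with a small sketch. The relational side below uses Python's built-in sqlite3 module with a hypothetical one-row `person` table (the table and column names are assumptions for illustration, not the schema from the slides); the triple side reuses the has-friend and year-of-marriage statements listed above.

```python
import sqlite3

# Relational side: a hypothetical "person" table holding one row.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT)")
db.execute("INSERT INTO person VALUES (2, 'Rose')")

# Recording a marriage year and a friendship needs two schema changes first.
db.execute("ALTER TABLE person ADD COLUMN year_of_marriage TEXT")
db.execute("CREATE TABLE friends (id1 INTEGER, id2 INTEGER)")
db.execute("UPDATE person SET year_of_marriage = '1913' WHERE id = 2")
db.execute("INSERT INTO friends VALUES (2, 35)")

# Triple side: the same two facts are just two more entries of the same shape.
triples = [("person2", "first-name", "Rose")]
triples.append(("person2", "has-friend", "person35"))
triples.append(("person2", "year-of-marriage", "1913"))

friends_of_rose = [o for s, p, o in triples if s == "person2" and p == "has-friend"]
print(friends_of_rose)
```

In the triple list nothing had to be declared before the new predicates were used, which is the point the slide makes about schema changes.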
A Little Ontology Goes A Long Way
Taxonomic Structure (ontology)
Copy Right, Sheng-Chuan Wu November 2008
A Much Better Model with Semantics
Synergies in Knowledge Representation
Biomedical Taxonomy/Ontologies
RDF Class Hierarchy Maps Taxonomy
Semantic Model – Explicit Relationship
Information Given:
WhiteLadySlipper type OrchidFamily
OrchidFamily subClassOf Liliidae
Relationship Model: rdf:type and rdfs:subClassOf come from W3C standards; rdfs:subClassOf is transitive, and rdf:type membership propagates up the subclass hierarchy
Information Inferred:
WhiteLadySlipper type Liliidae
Question: In which subclass does WhiteLadySlipper belong?
Answer: Liliidae
Where are the relationships?
Relationships are explicit in the model and directly available to applications!
Relational Model – Implicit Relationship
Species_Table (ID, Species Name): IDC, WhiteLadySlipper
SubclassTable (Family, Subclass): OrchidFamily, Liliidae
Species_FamilyTable (FM_ID, SP_ID): OrchidFamily, IDC
Question: In which Subclass is WhiteLadySlipper located?
Answer: In Liliidae
Develop a Query
Select Subclass
From Species_Table, SubclassTable, Species_FamilyTable
Where Species_Name = "WhiteLadySlipper" and ID = SP_ID and Family = FM_ID
Where are the relationships?
Relationships are in documents, SQL code and collective memories - not available to applications!
Data Definition Statements? Applications do not use them, they are not descriptive and their scope is a single database
Data Dictionary? Data Registry? They are for human use, not computer use
Semantic Model – Explicit Relationship
Information Given:
WhiteLadySlipper type OrchidFamily
OrchidFamily subClassOf Orchidales
Orchidales subClassOf Liliidae
Relationship Model: type and subClassOf come from the W3C RDF/RDFS standards; subClassOf is transitive, and type membership propagates up the subclass hierarchy
Information automatically Inferred:
WhiteLadySlipper type Orchidales
WhiteLadySlipper type Liliidae
Question: In which subclass does WhiteLadySlipper belong?
Answer: Liliidae
OrchidFamily subClassOf Liliidae
new data
Changes are Easy to Make
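The inference step on this slide can be sketched as a small fixed-point loop. This is a simplified illustration of two RDFS-style rules, not a full RDFS reasoner; the function name `infer` and the use of bare strings instead of URIs are assumptions for the example.

```python
# Facts given on the slide, as (subject, predicate, object) tuples.
given = {
    ("WhiteLadySlipper", "type", "OrchidFamily"),
    ("OrchidFamily", "subClassOf", "Orchidales"),
    ("Orchidales", "subClassOf", "Liliidae"),
}


def infer(facts):
    """Apply two RDFS-style rules until no new triple appears.

    Rule 1: A subClassOf B and B subClassOf C  =>  A subClassOf C
    Rule 2: x type A       and A subClassOf B  =>  x type B
    """
    closed = set(facts)
    while True:
        new = set()
        for s, p, o in closed:
            if p not in ("type", "subClassOf"):
                continue
            for s2, p2, o2 in closed:
                if p2 == "subClassOf" and s2 == o:
                    new.add((s, p, o2))
        if new <= closed:
            return closed
        closed |= new


inferred = infer(given) - given
print(sorted(inferred))
```

Inserting the Orchidales level only meant changing the given facts; the rules, and any question asked of the inferred triples, stay the same.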
Relational Model Changes at Great Peril
Species Table (ID, Species Name): IDC, WhiteLadySlipper
Order Table (Order Name, ID, Subclass): Orchidales, ORD, Liliidae
Species_Family Table (FM_ID, SP_ID): OrchidFamily, IDC
Family_Order Table (Family_ID, Order_ID): OrchidFamily, ORD
Doesn’t work any more!
Species_Table (ID, Species Name): IDC, WhiteLadySlipper
SubclassTable (Family, Subclass): OrchidFamily, Liliidae
Species_FamilyTable (FM_ID, SP_ID): OrchidFamily, IDC
Develop a Query
Select Subclass
From Species_Table, SubclassTable, Species_FamilyTable
Where Species_Name = "WhiteLadySlipper" and ID = SP_ID and Family = FM_ID
Question: In which Subclass is WhiteLadySlipper located?
Get No Answer!
Changes should be avoided at ALL costs
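A rough way to see the breakage is to run the slide's query with Python's built-in sqlite3 module before and after the taxonomy change. The identifiers `Order_Table` and `Family_OrderTable` are normalized, hypothetical spellings of the slide's table names, and how the old query fails depends on how the schema is migrated; in this sketch the old SubclassTable is dropped, so SQLite rejects the query outright.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Species_Table (ID TEXT, Species_Name TEXT);
    CREATE TABLE SubclassTable (Family TEXT, Subclass TEXT);
    CREATE TABLE Species_FamilyTable (FM_ID TEXT, SP_ID TEXT);
    INSERT INTO Species_Table VALUES ('IDC', 'WhiteLadySlipper');
    INSERT INTO SubclassTable VALUES ('OrchidFamily', 'Liliidae');
    INSERT INTO Species_FamilyTable VALUES ('OrchidFamily', 'IDC');
""")

QUERY = """
    SELECT Subclass
    FROM Species_Table, SubclassTable, Species_FamilyTable
    WHERE Species_Name = 'WhiteLadySlipper' AND ID = SP_ID AND Family = FM_ID
"""
before = db.execute(QUERY).fetchall()

# Introduce the Order level: a family no longer maps straight to a subclass.
db.executescript("""
    DROP TABLE SubclassTable;
    CREATE TABLE Order_Table (Order_Name TEXT, ID TEXT, Subclass TEXT);
    CREATE TABLE Family_OrderTable (Family_ID TEXT, Order_ID TEXT);
    INSERT INTO Order_Table VALUES ('Orchidales', 'ORD', 'Liliidae');
    INSERT INTO Family_OrderTable VALUES ('OrchidFamily', 'ORD');
""")

try:
    after = db.execute(QUERY).fetchall()
except sqlite3.OperationalError as exc:
    after = None      # the old query can no longer be prepared
    error = str(exc)  # names the table that is now missing

print(before, after)
```

Every application that embeds the old query would need to be found and rewritten with an extra join through the new tables.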
Changing Taxonomy Affects Only the View
View on the Species
rdf:type
rdfs:subClassOf
How About Unstructured Documents with No Semantics?
Find me a Thai restaurant that is Halal, not too expensive, no alcohol served, near Jurong Point Shopping Centre in Singapore.
Turn Document into Semantic Model
[Figure: RDF graph describing the restaurant <http://www.zabihah.com/ds.php?id=1716>, with these properties]
type: Restaurant
halalAuth: true
halal: true
cuisine: Thai
cuisine: Indonesian
price: Median
address: “1 jurong west central 2”
address: Jurong Point
city: Singapore
country: Singapore
alcoholServed: False
phone: “65- 6792- 6593”
Even the best keyword search engine gives very unsatisfactory results
Intelligent, complex, ad hoc query, beyond keyword search, now possible
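Once the page is modeled as triples, the restaurant question becomes a conjunction of triple patterns, which is the core idea behind a SPARQL basic graph pattern. The sketch below is a toy matcher, not a SPARQL engine: the `solve` function and the short subject name `r1716` (standing in for the restaurant's URI) are assumptions for illustration, and "not too expensive" and "near" are left out because they need extra rules about prices and places.

```python
# Facts read off the slide's graph; "r1716" stands in for the restaurant's URI.
facts = [
    ("r1716", "type", "Restaurant"),
    ("r1716", "cuisine", "Thai"),
    ("r1716", "cuisine", "Indonesian"),
    ("r1716", "halal", "true"),
    ("r1716", "alcoholServed", "False"),
    ("r1716", "price", "Median"),
    ("r1716", "address", "Jurong Point"),
    ("r1716", "city", "Singapore"),
]


def solve(patterns, store, binding=None):
    """Yield variable bindings satisfying every pattern; '?x' marks a variable."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in store:
        trial = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or reject if it is already bound differently.
                if trial.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from solve(rest, store, trial)


query = [
    ("?r", "type", "Restaurant"),
    ("?r", "cuisine", "Thai"),
    ("?r", "halal", "true"),
    ("?r", "alcoholServed", "False"),
    ("?r", "address", "Jurong Point"),
]
answers = [b["?r"] for b in solve(query, facts)]
print(answers)
```

Adding or dropping a condition is a one-line change to `query`, which is what makes this style of querying ad hoc.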
Ad Hoc Query with SPARQL
SPARQL – Biological Processes in Dendrites
Alzheimer’s disease is characterized by neural degeneration.
Among other things, there is damage to dendrites and axons, parts of nerve cells.
What resources do we have available to learn more about biological processes in dendrites?
Query Gene Ontology (GO) for Clues
Inference at work
Looking for Alzheimer Disease Targets
Signal transduction pathways are considered to be rich in “druggable” targets - proteins that might respond to chemical therapy
CA1 Pyramidal Neurons are known to be particularly damaged in Alzheimer’s disease.
Can we find candidate genes known to be involved in signal transduction and active in Pyramidal Neurons?
Scientific Inquiry over Many Sources
A SPARQL Query Spanning 4 Sources
Ad hoc queries over multiple data sources (in RDF) are easy
Finally, Semantic Technology
A Different but Better Mouse Trap
Database without schema or tables; easy to change
Distribution and integration of data easy and seamless with URI and RDF
Separation of taxonomic information and individual data
Ontology to bridge different taxonomies
A query language (SPARQL) for ad hoc pattern matching
Ideal for modeling & accessing life science data
Semantic Technology gives us
an integrated view of available knowledge
Potential Applications
Bridging natural herbal medicine and western medical knowledge
Fresh water fishery (environmental) management
Plant biotechnology, Precision Agriculture
Biodiversity repository
All based on a single framework to model, integrate and access different life science “knowledge” sources
Semantic Technology for Life Science
Technology for a healthier life