UMBC: an Honors University in Maryland
Finding knowledge, data and answers on the Semantic Web
Tim Finin
University of Maryland, Baltimore County
http://ebiquity.umbc.edu/resource/html/id/202/
Joint work with Li Ding, Anupam Joshi, Yun Peng, Cynthia Parr, Pranam Kolari, Pavan Reddivari, Sandor Dornbush, Rong Pan, Akshay Java, Joel Sachs, Scott Cost and Vishal Doshi
http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.
This talk
• Motivation
• Swoogle Semantic Web search engine
• Use cases and applications
• Observations
• Conclusions
Google has made us smarter
But what about our agents?
Agents still have a very minimal understanding of text and images.
A Google for knowledge on the Semantic Web is needed by software agents and programs
• http://swoogle.umbc.edu/
• Running since summer 2004
• 1.8M RDF docs, 320M triples, 10K ontologies, 15K namespaces, 1.3M classes, 175K properties, 43M instances, 600 registered users
Swoogle Architecture
[Figure: Swoogle architecture. Discovery: SwoogleBot, a bounded web crawler and the Google crawler supply candidate URLs from the Web and the Semantic Web. Index: an SWD indexer and an IR indexer, with a document cache. Analysis: an SWD classifier and ranking. Search services: a web server (HTML, for humans) and a web service (RDF/XML, for machines). A Semantic Web metadata store sits at the center.]
Applications and use cases
1. Supporting Semantic Web developers
  – Ontology designers, vocabulary discovery, who’s using my ontologies or data?, use analysis, errors, statistics, etc.
2. Searching specialized collections
  – Spire: aggregating observations and data from biologists
  – InferenceWeb: searching over and enhancing proofs
  – SemNews: Text Meaning of news stories
3. Supporting SW tools
  – Triple shop: finding data for SPARQL queries
1. Supporting Semantic Web developers
By default, ontologies are ordered by their ‘popularity’, but they can also be ordered by recency or size.
80 ontologies were found that had these three terms
Let’s look at this one
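A minimal sketch of the three result orderings, assuming each hit carries a popularity score, a last-modified date and a size. The class and field names are hypothetical, and how Swoogle actually computes “popularity” is not described here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class OntologyHit:
    url: str
    popularity: float    # hypothetical rank score; higher means more popular
    last_modified: date
    size: int            # e.g., number of triples


# One sort key per ordering offered by the search interface.
ORDERINGS = {
    "popularity": lambda hit: hit.popularity,
    "recency": lambda hit: hit.last_modified,
    "size": lambda hit: hit.size,
}


def order_hits(hits, by="popularity"):
    """Return hits sorted by the chosen ordering, highest value first."""
    if by not in ORDERINGS:
        raise ValueError(f"unknown ordering: {by!r}")
    return sorted(hits, key=ORDERINGS[by], reverse=True)


hits = [
    OntologyHit("http://example.org/a.owl", 0.9, date(2005, 4, 29), 311),
    OntologyHit("http://example.org/b.owl", 0.4, date(2006, 1, 10), 1200),
]
print([h.url for h in order_hits(hits, by="recency")])
# prints ['http://example.org/b.owl', 'http://example.org/a.owl']
```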
Basic Metadata
hasDateDiscovered: 2005-01-17
hasDatePing: 2006-03-21
hasPingState: PingModified
type: SemanticWebDocument
isEmbedded: false
hasGrammar: RDFXML
hasParseState: ParseSuccess
hasDateLastmodified: 2005-04-29
hasDateCache: 2006-03-21
hasEncoding: ISO-8859-1
hasLength: 18K
hasCntTriple: 311.00
hasOntoRatio: 0.98
hasCntSwt: 94.00
hasCntSwtDef: 72.00
hasCntInstance: 8.00
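Counts like these can be approximated from a document’s parsed triples. This is a sketch under stated assumptions: triples are (subject, predicate, object) tuples in prefix:local form, a term counts as “defined” when it is typed with one of a few RDFS/OWL meta-classes, and every predicate or rdf:type object counts as a term in use. Swoogle’s own definitions may differ, and hasOntoRatio is not reproduced.

```python
RDF_TYPE = "rdf:type"
# Meta-classes whose instances are classes or properties; an assumed, incomplete list.
META_CLASSES = {"rdfs:Class", "owl:Class", "rdf:Property",
                "owl:ObjectProperty", "owl:DatatypeProperty"}


def swd_metadata(triples):
    """Compute a few Swoogle-style counts from (subject, predicate, object) tuples."""
    defined, instances, used = set(), set(), set()
    for s, p, o in triples:
        used.add(p)                      # every predicate is a term in use
        if p == RDF_TYPE:
            used.add(o)                  # every rdf:type object is a class in use
            if o in META_CLASSES:
                defined.add(s)           # s is defined here as a class or property
            else:
                instances.add(s)         # s is an ordinary instance
    return {
        "hasCntTriple": len(triples),
        "hasCntSwt": len(used | defined),
        "hasCntSwtDef": len(defined),
        "hasCntInstance": len(instances),
    }


triples = [
    ("ex:Person", "rdf:type", "owl:Class"),
    ("ex:knows", "rdf:type", "owl:ObjectProperty"),
    ("ex:knows", "rdfs:range", "ex:Person"),
    ("ex:alice", "rdf:type", "ex:Person"),
]
print(swd_metadata(triples))
# prints {'hasCntTriple': 4, 'hasCntSwt': 6, 'hasCntSwtDef': 2, 'hasCntInstance': 1}
```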
rdfs:range was used 41 times to assert a value.
owl:ObjectProperty was instantiated 28 times
time:Cal… defined once and used 24 times (e.g., as range)
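Usage statistics like these amount to counting how each term appears in triples. A sketch, assuming a hypothetical term_usage helper over (s, p, o) tuples and a simple three-way split of roles; Swoogle’s actual role categories may be finer-grained.

```python
from collections import Counter


def term_usage(triples):
    """Count each term's uses by role: as a predicate, as an instantiated class
    (object of rdf:type), or as some other object (e.g., the value of rdfs:range)."""
    usage = Counter()
    for _, p, o in triples:
        usage[(p, "predicate")] += 1
        role = "instantiated" if p == "rdf:type" else "object"
        usage[(o, role)] += 1
    return usage


triples = [
    ("ex:start", "rdf:type", "owl:ObjectProperty"),
    ("ex:end", "rdf:type", "owl:ObjectProperty"),
    ("ex:start", "rdfs:range", "ex:Instant"),
    ("ex:end", "rdfs:range", "ex:Instant"),
]
usage = term_usage(triples)
print(usage[("rdfs:range", "predicate")], usage[("owl:ObjectProperty", "instantiated")])
# prints 2 2
```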
These are the namespaces this ontology uses. Clicking on one shows all of the documents using the namespace.
All of this is available in RDF form for the agents among us.
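Finding the namespaces an SWD uses can be approximated by splitting each URI at its last “#” or “/”. This is a common heuristic, assumed here for illustration; it is not guaranteed to match how Swoogle records namespaces, and the function names are hypothetical.

```python
def namespace_of(uri):
    """Return the part of uri up to and including its last '#' or '/' (a heuristic)."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1] if cut >= 0 else uri


def namespaces_used(triples):
    """Collect namespaces of every http(s) URI in (s, p, o) triples; other strings
    (e.g., literals) are skipped."""
    return {
        namespace_of(term)
        for triple in triples
        for term in triple
        if term.startswith(("http://", "https://"))
    }


triples = [(
    "http://example.org/people#tim",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://xmlns.com/foaf/0.1/Person",
)]
print(sorted(namespaces_used(triples)))
```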
Here’s what the agent sees. Note the swoogle and wob (web of belief) ontologies.
We can also search for terms (classes and properties), such as terms for “person”.
10K terms associated with “person”! Ordered by use.
Let’s look at foaf:Person’s metadata
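Term search can be sketched as a case-insensitive substring match on local names, ordered by how often each term is used. The usage counts below are toy numbers, not Swoogle data, and terms are assumed to be written in prefix:local form.

```python
def search_terms(query, usage_counts):
    """Return terms whose local name contains query (case-insensitive), most used first."""
    q = query.lower()
    hits = [t for t in usage_counts if q in t.split(":", 1)[-1].lower()]
    # Sort by descending use count, then alphabetically to break ties.
    return sorted(hits, key=lambda t: (-usage_counts[t], t))


# Toy usage counts for illustration only.
usage_counts = {"foaf:Person": 500, "ex:PersonName": 3, "ex:Organization": 40, "ex2:person": 12}
print(search_terms("person", usage_counts))
# prints ['foaf:Person', 'ex2:person', 'ex:PersonName']
```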
87K documents used foaf:gender with a foaf:Person instance as the subject
3K documents used dc:creator with a foaf:Person instance as the object
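Statistics like the two above can be sketched as counting documents that use a property with a typed subject or object. Assumptions: the rdf:type triple must appear in the same document, and the function name and corpus layout (URL mapped to a list of triples) are hypothetical.

```python
def docs_with_pattern(corpus, prop, cls, position="subject"):
    """Count documents that use prop with a subject (or object) typed cls in that document."""
    if position not in ("subject", "object"):
        raise ValueError("position must be 'subject' or 'object'")
    count = 0
    for triples in corpus.values():
        typed = {s for s, p, o in triples if p == "rdf:type" and o == cls}
        if any(p == prop and (s if position == "subject" else o) in typed
               for s, p, o in triples):
            count += 1
    return count


corpus = {
    "http://example.org/a.rdf": [("_:me", "rdf:type", "foaf:Person"),
                                 ("_:me", "foaf:gender", "female")],
    "http://example.org/b.rdf": [("_:doc", "dc:creator", "_:me"),
                                 ("_:me", "rdf:type", "foaf:Person")],
    "http://example.org/c.rdf": [("_:x", "foaf:gender", "male")],  # subject not typed
}
print(docs_with_pattern(corpus, "foaf:gender", "foaf:Person"))
print(docs_with_pattern(corpus, "dc:creator", "foaf:Person", position="object"))
```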
Swoogle’s archive saves every version of an SWD it has seen.
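One way to keep every version of an SWD is to store a new copy whenever a re-crawl returns different content. The sketch below assumes that re-fetching unchanged content should not create a new version and uses a SHA-1 content hash to detect changes; the class name is hypothetical.

```python
import hashlib
from datetime import datetime, timezone


class SwdArchive:
    """Keep every distinct version of each SWD, keyed by URL (hypothetical class)."""

    def __init__(self):
        self.versions = {}  # url -> list of (timestamp, sha1 hex digest, content)

    def record(self, url, content, when=None):
        """Store content as a new version unless it equals the latest one; return True if stored."""
        digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
        history = self.versions.setdefault(url, [])
        if history and history[-1][1] == digest:
            return False  # unchanged since the last crawl
        history.append((when or datetime.now(timezone.utc), digest, content))
        return True

    def history(self, url):
        """Return (timestamp, digest) pairs for url, oldest first."""
        return [(ts, digest) for ts, digest, _ in self.versions.get(url, [])]


archive = SwdArchive()
url = "http://example.org/x.rdf"
archive.record(url, "<rdf:RDF/>")
archive.record(url, "<rdf:RDF/>")               # same content: no new version
archive.record(url, "<rdf:RDF> </rdf:RDF>")     # changed content: new version
print(len(archive.history(url)))
# prints 2
```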
2. Searching specialized collections: Spire
An NSF ITR collaborative project with
• University of Maryland, Baltimore County
• University of Maryland, College Park
• University of California, Davis
• Rocky Mountain Biological Laboratory
An invasive species scenario
• Nile Tilapia fish have been found in a California lake.
• Can this invasive species thrive in this environment?
• If so, what will be the likely consequences for the ecology?
• So… we need to understand the effects of introducing this fish into the food web of a typical California lake
Food Webs
• A food web models the trophic (feeding) relationships between organisms in an ecology
  – Food web simulators are used to explore the consequences of changes in the ecology, such as the introduction or removal of a species
  – A location’s food web is usually constructed from studies of the frequencies of the species found there and the known trophic relations among them.
• Goal: automatically construct a food web for a new location using existing data and knowledge
• ELVIS: Ecosystem Location Visualization and Information System
East River Valley Trophic Web
http://www.foodwebs.org/
Species List Constructor
Click a county, get a species list
The problem
• We have data on which species are known to be in the location, and we can further restrict and fill in this list with other ecological models.
• But we don’t know which of these the Nile Tilapia eats or which might eat it.
• We can reason from taxonomic data (similar species) and known natural history data (size, mass, habitat, etc.) to fill in the gaps.
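The taxonomic reasoning in the last bullet can be sketched as borrowing diet records from congeners (species in the same genus) and keeping only prey present at the location. Species names, genera and diets below are placeholders rather than real data, the function is hypothetical, and genus-level similarity is only one of the cues the slide mentions.

```python
def predict_prey(species, genus_of, known_diet, present):
    """Predict prey for species from the known diets of congeners (same genus),
    keeping only prey present at the location and not already recorded."""
    genus = genus_of[species]
    relatives = [s for s, g in genus_of.items() if g == genus and s != species]
    predicted = set()
    for relative in relatives:
        predicted |= known_diet.get(relative, set())
    return (predicted & present) - known_diet.get(species, set())


# Toy data with placeholder names; not real observations.
genus_of = {"fish_A": "GenusX", "fish_B": "GenusX", "fish_C": "GenusY"}
known_diet = {"fish_B": {"algae", "detritus"}, "fish_C": {"ostracods"}}
print(predict_prey("fish_A", genus_of, known_diet, present={"algae", "ostracods"}))
# prints {'algae'}
```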
Food Web Constructor
Predict food web links using database and taxonomic reasoning.
In a new estuary, Nile Tilapia could compete with ostracods (green) to eat algae. Predators (red) and prey (blue) of ostracods may be affected.
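The analysis in this example can be sketched on a predator-to-prey map: competitors share prey with the newcomer, and their predators and prey are the species that may be affected. Species names are placeholders, the function is hypothetical, the map is assumed not to include the newcomer itself, and link strengths are ignored.

```python
def affected_species(newcomer_prey, eats):
    """eats maps each species to the set of species it eats.
    Returns (competitors, their predators, their prey) for a newcomer that eats newcomer_prey."""
    competitors = {s for s, prey in eats.items() if prey & newcomer_prey}
    predators = {s for s, prey in eats.items() if prey & competitors}
    prey_of_competitors = set().union(*(eats[c] for c in competitors))
    return competitors, predators, prey_of_competitors


# Placeholder food web: ostracods eat algae, fish_X eats ostracods, bird_Y eats fish_X.
eats = {"ostracods": {"algae"}, "fish_X": {"ostracods"}, "bird_Y": {"fish_X"}}
print(affected_species({"algae"}, eats))
```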
Evidence Provider
Examine evidence for predicted links.
Status
• The goal is ELVIS (Ecosystem Location Visualization and Information System), an integrated set of web services for constructing food webs for a given location.
• Background ontologies
  – SpireEcoConcepts: concepts and properties to represent food webs, and ELVIS-related tasks, inputs and outputs
  – ETHAN (Evolutionary Trees and Natural History): concepts and properties for ‘natural history’ information on species, derived from data in the Animal Diversity Web and other taxonomic sources
• Under development
  – Connect to visualization software
  – Connect to Triple Shop to discover more data
3. Supporting SW tools: UMBC Triple Shop
• http://sparql.cs.umbc.edu/
• Online SPARQL RDF query processing with several interesting features
• Automatically finds SWDs for given queries using the Swoogle backend database
• Datasets, queries and results can be saved, tagged, annotated, shared, searched for, etc.
• RDF datasets as first-class objects
  – Can be stored on our server or downloaded
  – Can be materialized in a database or (soon) as a Jena model
Web-scale semantic web data access
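[Figure: message flow among an agent, a data access service and the Web. The service indexes RDF data from the Web. The agent asks (“person”) to search the vocabulary; the service searches URIrefs in the SW vocabulary and informs (“foaf:Person”). The agent composes a query and asks (“?x rdf:type foaf:Person”); the service searches URLs in the SWD index and informs (doc URLs). The agent then fetches the docs, populates an RDF database and queries the local RDF database.]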
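The message flow can be sketched with in-memory stand-ins for the vocabulary search, the SWD index and the Web. Every name and URL here is hypothetical; a real agent would call Swoogle’s services and fetch documents over HTTP.

```python
# Hypothetical in-memory stand-ins for the service's indexes and for the Web.
VOCABULARY = {"person": ["foaf:Person"]}                     # keyword -> matching terms
SWD_INDEX = {"foaf:Person": ["http://example.org/a.rdf"]}    # class -> docs that use it
WEB = {"http://example.org/a.rdf": [("ex:tim", "rdf:type", "foaf:Person")]}


def search_vocabulary(keyword):
    """Service side: ask("person") -> inform("foaf:Person")."""
    return VOCABULARY.get(keyword, [])


def search_swd_index(cls):
    """Service side: ask("?x rdf:type <cls>") -> inform(doc URLs)."""
    return SWD_INDEX.get(cls, [])


def find_instances(keyword):
    """Agent side: find a class for keyword, fetch candidate docs, query them locally."""
    results = set()
    for cls in search_vocabulary(keyword):
        local_store = set()                       # the agent's local RDF database
        for url in search_swd_index(cls):
            local_store.update(WEB.get(url, []))  # fetch the doc and load its triples
        results |= {s for s, p, o in local_store if p == "rdf:type" and o == cls}
    return results


print(find_instances("person"))
# prints {'ex:tim'}
```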
Who knows Anupam Joshi?
Show me their names, email addresses and pictures
The UMBC ebiquity site publishes lots of RDF data, including FOAF profiles
No FROM clause!
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p2name ?p2mbox ?p2pix
# FROM ???   (no FROM clause: the dataset is not known in advance)
WHERE {
  ?p1 foaf:surname "Joshi" .
  ?p1 foaf:firstName "Anupam" .
  ?p1 foaf:mbox ?p1mbox .
  ?p2 foaf:knows ?p3 .
  ?p3 foaf:mbox ?p1mbox .
  ?p2 foaf:name ?p2name .
  ?p2 foaf:mbox ?p2mbox .
  OPTIONAL { ?p2 foaf:depiction ?p2pix }
}
ORDER BY ?p2name
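The query joins on foaf:mbox rather than on a node’s URI: ?p3 is any node whose mailbox equals Joshi’s, so FOAF documents that describe him with their own blank nodes still match. The pure-Python evaluation below is a sketch of those semantics over (s, p, o) tuples, not how Triple Shop executes SPARQL; the function name, people and mailboxes are made up.

```python
from itertools import product


def who_knows(triples, first, last):
    """Evaluate the query's pattern over (s, p, o) triples in plain Python."""
    def objects(s, p):
        return [o for s2, p2, o in triples if s2 == s and p2 == p]

    def subjects(p, o):
        return {s for s, p2, o2 in triples if p2 == p and o2 == o}

    # ?p1: the person with the given first name and surname
    targets = subjects("foaf:surname", last) & subjects("foaf:firstName", first)
    mboxes = {m for t in targets for m in objects(t, "foaf:mbox")}
    # ?p3: any node with one of those mailboxes (possibly a different blank node)
    p3s = {s for m in mboxes for s in subjects("foaf:mbox", m)}
    # ?p2: anyone who foaf:knows such a node
    p2s = {s for p3 in p3s for s in subjects("foaf:knows", p3)}
    rows = set()  # a set gives SELECT DISTINCT
    for p2 in p2s:
        pix = objects(p2, "foaf:depiction") or [None]  # OPTIONAL: no picture -> None
        rows |= set(product(objects(p2, "foaf:name"), objects(p2, "foaf:mbox"), pix))
    return sorted(rows, key=lambda r: (r[0], r[1], r[2] or ""))  # ORDER BY ?p2name


triples = [
    ("_:a", "foaf:firstName", "Anupam"), ("_:a", "foaf:surname", "Joshi"),
    ("_:a", "foaf:mbox", "mailto:joshi@example.org"),
    ("_:b", "foaf:knows", "_:c"), ("_:c", "foaf:mbox", "mailto:joshi@example.org"),
    ("_:b", "foaf:name", "Alice"), ("_:b", "foaf:mbox", "mailto:alice@example.org"),
]
print(who_knows(triples, "Anupam", "Joshi"))
# prints [('Alice', 'mailto:alice@example.org', None)]
```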
Enter the query without a FROM clause, log in and specify a dataset.
302 RDF documents were found that might have useful data.
We’ll select them all and add them to the current dataset.
We’ll run the query against this dataset to see if the results are as expected.
The results can be produced in any of several formats
Looks like a useful dataset. Let’s save it and also materialize it in the TS triple store.
We can also annotate, save and share queries.
Work in Progress
• There are a host of performance issues
• We plan on supporting some special datasets, e.g.,
  – FOAF data collected from Swoogle
  – Definitions of RDF and OWL classes and properties from all ontologies that Swoogle has discovered
• Expanding constraints to select candidate SWDs to include arbitrary metadata and embedded queries
  – FROM “documents trusted by a member of the SPIRE project”
• We will explore two models for making this useful
  – As a downloadable application for client machines
  – As an (open source?) downloadable service for servers supporting a community of users
Will Swoogle Scale? How?
Here’s a rough estimate of the data in RDF documents on the Semantic Web, based on Swoogle’s crawling:

System/date   Terms      Documents   Individuals   Triples     Bytes
Swoogle2      1.5×10^5   3.5×10^5    7×10^6        5×10^7      7×10^9
Swoogle3      2×10^5     7×10^5      1.5×10^7      7.5×10^7    1×10^10
2006          1×10^6     5×10^7      5×10^7        5×10^9      5×10^11
2008          5×10^6     5×10^9      5×10^9        5×10^11     5×10^13

We think Swoogle’s centralized approach can be made to work for the next few years, if not longer.
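Dividing columns of the estimates table gives the per-document and per-byte densities each row assumes. This is only arithmetic on those estimates; the names are hypothetical.

```python
# (triples, documents, bytes) copied from the estimates table.
ESTIMATES = {
    "Swoogle2": (5e7, 3.5e5, 7e9),
    "Swoogle3": (7.5e7, 7e5, 1e10),
    "2006": (5e9, 5e7, 5e11),
    "2008": (5e11, 5e9, 5e13),
}


def implied_ratios(triples, documents, size_bytes):
    """Return (triples per document, bytes per triple) implied by one table row."""
    return triples / documents, size_bytes / triples


for name, row in ESTIMATES.items():
    per_doc, per_triple = implied_ratios(*row)
    print(f"{name}: ~{per_doc:.0f} triples/document, ~{per_triple:.0f} bytes/triple")
```

Every row implies roughly 100 to 143 triples per document and 100 to 140 bytes per triple, so the projected growth comes mainly from the number of documents.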
How much reasoning should Swoogle do?
• SwoogleN (N ≤ 3) does limited reasoning
  – It’s expensive
  – It’s not clear how much should be done
• More reasoning would benefit many use cases
  – e.g., type hierarchy
• Recognizing specialized metadata
  – e.g., that ontology A maps some terms from B to C
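Type-hierarchy reasoning can be sketched as the transitive closure of rdfs:subClassOf, which lets a query for foaf:Person also return instances of its subclasses. This covers only one RDFS entailment rule, and the helper names are hypothetical.

```python
from collections import defaultdict


def superclasses(triples):
    """Map each class to all of its ancestors under rdfs:subClassOf (transitive closure)."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            parents[s].add(o)
    closure = {}
    for cls in list(parents):
        seen, stack = set(), list(parents[cls])
        while stack:                      # depth-first walk; 'seen' guards against cycles
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(parents.get(c, ()))
        closure[cls] = seen
    return closure


def instances_of(triples, cls):
    """Individuals typed cls directly or through any subclass of cls."""
    ancestors = superclasses(triples)
    return {s for s, p, o in triples
            if p == "rdf:type" and (o == cls or cls in ancestors.get(o, set()))}


triples = [
    ("ex:Student", "rdfs:subClassOf", "foaf:Person"),
    ("ex:PhDStudent", "rdfs:subClassOf", "ex:Student"),
    ("ex:bob", "rdf:type", "ex:PhDStudent"),
    ("ex:carol", "rdf:type", "foaf:Person"),
]
print(sorted(instances_of(triples, "foaf:Person")))
# prints ['ex:bob', 'ex:carol']
```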
An RDF Dictionary
• We hope to develop an RDF dictionary.
• Given an RDF term, it returns a graph of its definition
  – Term: definition from the “official” ontology
  – Term+URL: definition from the SWD at URL
  – Term+*: union definition
  – An optional argument recursively adds definitions of terms in the definition, excluding RDFS and OWL terms
  – An optional argument identifies more namespaces to exclude
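The proposed dictionary lookup might be sketched as follows. OFFICIAL (which document counts as a prefix’s “official” ontology), the function names and the prefix-based test for what counts as a term are all assumptions; for simplicity, recursion stays within the graph selected for the starting term.

```python
# Hypothetical: which document is the "official" ontology for each prefix.
OFFICIAL = {"foaf": "http://xmlns.com/foaf/0.1/"}
ALWAYS_EXCLUDED = ("rdf:", "rdfs:", "owl:")   # never expanded recursively


def select_graph(term, sources, url=None):
    """Pick triples to search: official ontology (url=None), one SWD (url), or all ('*')."""
    if url == "*":
        return [t for triples in sources.values() for t in triples]
    if url is None:
        url = OFFICIAL.get(term.split(":", 1)[0])
    return sources.get(url, [])


def define(term, sources, url=None, recursive=False, exclude=()):
    """Return definition triples for term; optionally add definitions of terms they mention."""
    graph = select_graph(term, sources, url)
    skip = ALWAYS_EXCLUDED + tuple(exclude)
    result, todo, done = [], [term], set()
    while todo:
        t = todo.pop()
        if t in done:
            continue
        done.add(t)
        defs = [tr for tr in graph if tr[0] == t]
        result.extend(defs)
        if recursive:
            for _, p, o in defs:
                for x in (p, o):
                    # Crude test: terms are "prefix:local" strings; literals are not expanded.
                    if ":" in x and not x.startswith(skip) and x not in done:
                        todo.append(x)
    return result


sources = {
    "http://xmlns.com/foaf/0.1/": [
        ("foaf:Person", "rdf:type", "owl:Class"),
        ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
        ("foaf:Agent", "rdf:type", "owl:Class"),
    ],
    "http://example.org/my.rdf": [("foaf:Person", "rdfs:label", "Human")],
}
print(len(define("foaf:Person", sources, recursive=True)))
# prints 3
```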
Conclusion
• The web will contain the world’s knowledge in forms accessible to people and computers
  – We need better ways to discover, index, search and reason over SW knowledge
• SW search engines address different tasks than HTML search engines
  – So they require different techniques and APIs
• Swoogle-like systems can help create consensus ontologies and foster best practices
  – Swoogle is for Semantic Web 1.0
  – Semantic Web 2.0 will make different demands
For more information
http://ebiquity.umbc.edu/
Annotated in OWL