Post on 03-Jan-2016
Streaming Knowledge Bases
Onkar Walavalkar, Anupam Joshi, Tim Finin and Yelena Yesha
University of Maryland, Baltimore County
27 October 2008
Overview
• Motivation
• Streaming databases
• Streaming knowledge bases
• Experiments and results
• Conclusions
Motivation Stream DBs Stream KBs Experiments Conclusions
Operating Room of the Future
• ORs will be awash in low-level data, much of it noisy or incomplete
• Challenges include coping with the noise and interpreting the low-level data to recognize high-level events and activities
[Figure: ORF diagram. Labels: drugs, patient monitors, staff, tools, devices; sensing and networking: RFID, AwarePoint, Bluetooth, WiFi]
Initial work in OR training
• UMD Mastri Center is experimenting with OR technologies and training environments
• The Human Patient Simulator from METI
– Designed to react like a human
– Responds to medical treatment
• Generates continuous streams of data, moderated by
– Initial conditions (e.g. blunt trauma multiple injuries scenario)
– Human interactions
Efficient Data Stream Management
Traditional DBMS
• Data is stored/indexed in the system
• Queries are applied to stored data as they “stream through”
[Figure: queries pass over indexed data to produce results]
Stream Management System
• Queries are stored/indexed in the system
• Data is applied to stored queries as it “streams through”
[Figure: data passes over indexed queries to produce results]
Several efforts: Tapestry, Aurora, TelegraphCQ
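The inversion between the two models can be illustrated with a minimal Python sketch. It is not taken from Tapestry, Aurora, or TelegraphCQ; the class names, field names, and thresholds below are hypothetical and chosen only for illustration. Queries are registered once and kept in the system, and each incoming reading is tested against all of them as it arrives.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Reading = Dict[str, float]


@dataclass
class ContinuousQuery:
    """A stored query: a name plus a predicate over one reading (hypothetical)."""
    name: str
    predicate: Callable[[Reading], bool]
    matches: List[Reading] = field(default_factory=list)


class StreamEngine:
    """Keeps the queries; the data is what streams through."""

    def __init__(self) -> None:
        self.queries: List[ContinuousQuery] = []

    def register(self, query: ContinuousQuery) -> None:
        self.queries.append(query)

    def push(self, reading: Reading) -> List[str]:
        """Apply one reading to every stored query; return names of queries that fired."""
        fired = []
        for query in self.queries:
            if query.predicate(reading):
                query.matches.append(reading)
                fired.append(query.name)
        return fired


engine = StreamEngine()
# Thresholds are made up for the example, not clinical values from the talk.
engine.register(ContinuousQuery("high_heart_rate", lambda r: r["heart_rate"] > 120))
engine.register(ContinuousQuery("low_spo2", lambda r: r["spo2"] < 90))

readings = [
    {"heart_rate": 80, "spo2": 98},
    {"heart_rate": 130, "spo2": 97},
    {"heart_rate": 135, "spo2": 88},
]
fired_per_reading = [engine.push(r) for r in readings]
print(fired_per_reading)
```

A real stream manager would index the stored queries rather than scan them linearly; the scan here only keeps the sketch short.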
[Figure: system architecture. Labels: Patient Monitor; RFID System (Medicines, Tools, Staff); Stream Processor (TelegraphCQ) with Continuous Queries; Trend Analyzer (Physiological Data); Low-Level Event Processor; Database (Patient History, Medical Supplies, Staff); Rule Base (Assert facts); Event Detection Levels 1, 2 and 3 (Events); Medical Encounter Record; Video Clipper]
What’s wrong with this picture?
• We need to enhance this to support semantic interoperability for medical data & knowledge
• The medical community has a long history of developing & using standard ontologies & metadata
• Incoming streams of data can be in RDF
• And can reference terms in appropriate ontologies
What’s wrong with this picture?
• Streaming database systems use continuous queries specified over a sliding time window
– e.g., [range by ‘30 seconds’ slide by ‘10 seconds’]
• Issues:
– Where do we do reasoning?
– How do we answer queries against a sliding window of data?
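To make the window clause concrete, here is a small Python sketch of one plausible reading of [range by ‘30 seconds’ slide by ‘10 seconds’]: every 10 seconds a window closes, and it covers the 30 seconds before its end. The function name, the half-open interval convention, and the sample events are assumptions for illustration; this is a batch approximation, not TelegraphCQ's actual window semantics.

```python
from typing import List, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, payload)


def sliding_windows(events: List[Event], range_s: int, slide_s: int):
    """Return (window_end, payloads) pairs for windows [end - range_s, end).

    Windows end at slide_s, 2 * slide_s, ... and stop with the first
    window whose end lies past the last event.
    """
    if not events:
        return []
    events = sorted(events)
    last_ts = events[-1][0]
    results = []
    end = slide_s
    while True:
        start = end - range_s
        payloads = [p for ts, p in events if start <= ts < end]
        results.append((end, payloads))
        if end > last_ts:
            break
        end += slide_s
    return results


# Hypothetical events at t = 2, 12, 25 and 38 seconds.
events = [(2, "a"), (12, "b"), (25, "c"), (38, "d")]
windows = sliding_windows(events, range_s=30, slide_s=10)
for end, payloads in windows:
    print(end, payloads)
```

Because the range is longer than the slide, consecutive windows overlap, so the same fact can contribute to several query answers before it expires. That overlap is what makes reasoning over a window harder than reasoning over a static store.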
RDF Stream Processing
[Figure: an Input Triple Stream enters the input stream handler, which draws on a Static Data Store (ClassTree, PropertyTree, DomainInfo, RangeInfo, InverseInfo) and on special domain rules & queries to produce an Enhanced Stream; a Query for Class of Concern over that stream yields the Detected Instances]
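A rough Python sketch of how such an input stream handler might enhance triples is given below. It is an illustration only, not the authors' implementation: the dictionaries stand in for the static data store, and every class, property, and instance name in them is hypothetical. Each incoming triple is expanded with superclass, domain, range, and inverse facts, and a simple query then looks for instances of a class of concern.

```python
RDF_TYPE = "rdf:type"

# Hypothetical stand-ins for the static data store (all names invented).
SUPERCLASSES = {  # class -> all ancestors, assumed already computed
    "Scalpel": {"SurgicalTool", "Tool"},
    "SurgicalTool": {"Tool"},
}
DOMAIN = {"administers": "Staff"}            # property -> class of its subject
RANGE = {"administers": "Drug"}              # property -> class of its object
INVERSE = {"administers": "administeredBy"}  # property -> inverse property


def enhance(triple):
    """Return the original triple followed by facts inferred from the store."""
    s, p, o = triple
    out = [triple]
    if p == RDF_TYPE:
        # An instance of a class is also an instance of every ancestor.
        out += [(s, RDF_TYPE, sup) for sup in sorted(SUPERCLASSES.get(o, ()))]
    else:
        if p in DOMAIN:
            out.append((s, RDF_TYPE, DOMAIN[p]))
        if p in RANGE:
            out.append((o, RDF_TYPE, RANGE[p]))
        if p in INVERSE:
            out.append((o, INVERSE[p], s))
    return out


def detect(stream, class_of_concern):
    """Collect subjects typed as class_of_concern in the enhanced stream."""
    found = []
    for triple in stream:
        for s, p, o in enhance(triple):
            if p == RDF_TYPE and o == class_of_concern and s not in found:
                found.append(s)
    return found


stream = [
    ("item7", RDF_TYPE, "Scalpel"),
    ("nurse1", "administers", "morphine"),
]
print(detect(stream, "Tool"))
print(detect(stream, "Drug"))
```

For brevity, types inferred from domain and range are not expanded with their own superclasses; a fuller handler would feed inferred triples back through the same step.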
Experiments and results
• Three simple reasoners
– Jena, in core
– Pre-computed custom hash tables
– Using tables in TelegraphCQ
• Various scenarios
– Ontology size: 118 - 23.1 MB
– Number of subclasses: 49 - 57,000
– Subclass depth: 2 - 9
– Data rate: 1 - 50 triples per second
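The slides do not show how the pre-computed hash tables were built. One way to picture the idea, under that caveat, is the Python sketch below: the subclass hierarchy is walked once, ahead of time, so that each class maps to the full set of its ancestors and a subclass test on a streaming triple becomes a single hash lookup. The function name and the small taxonomy are hypothetical.

```python
from collections import defaultdict


def build_ancestor_table(subclass_edges):
    """Map every class to the frozenset of all its ancestors.

    subclass_edges is an iterable of (child, parent) pairs. The walk is
    iterative and tracks visited classes, so cycles do not loop forever.
    """
    parents = defaultdict(set)
    classes = set()
    for child, parent in subclass_edges:
        parents[child].add(parent)
        classes.update((child, parent))

    table = {}
    for cls in classes:
        seen = set()
        stack = list(parents[cls])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(parents[current])
        table[cls] = frozenset(seen)
    return table


# Hypothetical three-level taxonomy, invented for the example.
edges = [
    ("ZebraMussel", "Mollusk"),
    ("Mollusk", "Animal"),
    ("Animal", "Organism"),
]
ancestors = build_ancestor_table(edges)

# At stream time, "is a ZebraMussel an Animal?" is one set-membership test.
print("Animal" in ancestors["ZebraMussel"])
```

The trade-off is memory and start-up time in exchange for constant-time checks per triple, which matters most at the higher data rates listed above.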
Domain Example
• Monitor Bioblitz and eco-blogging data streams for observations of invasive species
• Uses our Ethan ontologies for ecoinformatics
• Tree of life (~340K taxons from ITIS and other sources)
• Species profiles
• Invasive species definitions
• Observation
Conclusions
• If the incoming triple data rate goes beyond a certain limit, the reasoning speed starts to lag and tends to slow down the incoming stream.
• The speedup achieved by using TelegraphCQ (TCQ) and a hash table proves the value of pre-processing an ontology, particularly for fast streaming facts.