
Advanced Database Management Systems: Data Stream Management

Alvaro A A Fernandes

School of Computer Science, University of Manchester

AAAF (School of CS, Manchester) Advanced DBMSs 2012-2013 1 / 115

Outline

Data Streams Defined and Motivated

Data Streams v. Stored Data

Windows on Data Streams

Query Syntax and Semantics in DSMSs


Data Streams Defined and Motivated

Data Streams (1): What are they?

- A stream is a continuous, potentially unbounded, potentially voluminous, often real-time sequence of data elements (e.g., tuples).
- Often, an item in a stream can be seen as the notification that an event has occurred.
- There are two major kinds of sources, viz.:
  - transactional streams, in which case an item conveys a notification that an interaction between entities has taken place;
  - monitoring streams, in which case an item conveys a notification that some entity has changed (i.e., that its state has evolved).


Data Streams (2): Why do they matter? (1)

Examples of transactional streams include:

- credit card purchases by consumers from merchants
- phone calls by callers to dialled parties
- web accesses by clients to resources held by servers
- inter-organizational interactions (e.g., purchase of supplies, delivery of goods, payment for services provided, etc.)
- intra-organizational interactions (e.g., movement of parts from warehouses to production lines, from production lines to delivery vehicles, from vehicles to loading bays, from loading bays to shelves, etc.)


Data Streams (3): Why do they matter? (2)

Examples of monitoring streams include:

- price movements in financial and commodity markets
- traffic levels in networks
- physical parameters (such as temperature, pressure, etc.) of physical phenomena (such as oceans, the atmosphere, etc.)


Data Streams v. Stored Data

Data Stream v. Data Management Systems (1): Contrasts (1)


Data Stream v. Data Management Systems (2): Contrasts (2)

             DBMSs                  DSMSs
Data         Persistent             Transient
             Tuple set/bag          Tuple sequence
             Bounded cardinality    Unbounded cardinality
Updates      Explicit               Implicit
             Modify in place        Append only
Access       Random                 Sequential
             Multi-pass             One-pass only
Queries      One-off, transient     Continuous, persistent
             Exact answer           Approximate answer
Execution    Pro-active             Reactive
             Pulled data            Pushed data
             Fixed plan             Adaptive plan


Data Stream v. Data Management Systems (3): Challenges Arising

- Operator arguments have unbounded cardinality, so one cannot hope to scan them in their entirety.
- Blocking operators (like sort, join, duplicate removal, etc.) have no defined semantics over unbounded streams.
- Data is pushed onto the system, so the data stream management system (DSMS) must keep state to receive arriving data.
- If arrival rates are higher than the achievable throughput, the DSMS may not be able to keep up.
- Since many queries are registered for continuous (or periodic) evaluation, sharing execution load is possible and beneficial.
- Since queries are long-lived, query execution plans (QEPs) must adapt over their lifetime.


Data Stream v. Data Management Systems (4): Responses

- A window over a stream acts as a stream-to-relation conversion operator: it generates a bounded region in the stream, thereby allowing blocking operators to retain their classical semantics.
- Sliding such a window allows operators to produce/revise answers at each window slide.
- Because of asynchronous arrival, operators interact via queues, which bind operators with one another and with source streams.
- If the DSMS is not able to keep up, tuples must be shed according to some policy.
- Multi-query optimization techniques are used to share work (e.g., when there are several QEPs in the query store, they may share subplans).
- Adaptive query processing techniques (e.g., special, adaptive operators) are used to respond to changing conditions.


Windows on Data Streams

Windows (1): Why? What kinds? (1)

- A window is a mechanism to superimpose a region of definite cardinality over a stream whose overall cardinality is unknown.
- The main kinds of windows are:
  time-based : keep the items that have arrived in the last k time units
  count-based : keep the last k items to have arrived
  punctuated : keep all items from an opening marker k up to a closing marker k′.
- Punctuated windows are particularly useful for streams of un- or semi-structured data, like text or XML documents.
- Time-based windows are particularly useful for structured data in transactional or in monitoring streams.
- Count-based windows are useful irrespective of the degree of structure in the data.


Windows (2): How?

- A window, by default, changes as data arrives and the operators that use it re-evaluate as it does so.
- Time- and count-based windows have a specified scope in terms of a number of units, e.g., time units or tuples.
- It is often convenient, in these cases, to specify a slide, i.e., a number of units (either time units or tuples) that must come to pass (or arrive) to trigger re-evaluation.
- As time passes and more items arrive, some items in the window will be removed because they have fallen out of the window scope.
- Thus, at every change or slide, the conditions that have thus far justified the inclusion of some items in the window may now have become invalid.
- If so, these items are said to have expired and are removed.
- Continuous query evaluation takes into account valid items only (i.e., those that have not expired yet).
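The interplay of scope, slide and expiry described above can be sketched as follows. This is a minimal illustration, not the mechanism of any particular DSMS. The function name `sliding_windows`, the half-open scope (t - range, t], and the choice of slide boundaries at multiples of the slide starting from time 0 are all assumptions made for the example, as is the requirement that the stream arrives ordered by timestamp.

```python
from collections import deque

def sliding_windows(stream, range_, slide):
    """Yield (t, contents) at every slide boundary t, where contents holds the
    values timestamped in the half-open interval (t - range_, t].

    `stream` is an iterable of (timestamp, value) pairs ordered by timestamp.
    """
    window = deque()          # valid (timestamp, value) pairs, oldest first
    next_eval = slide         # time of the next re-evaluation
    for ts, value in stream:
        while ts > next_eval:                      # a slide boundary has passed
            while window and window[0][0] <= next_eval - range_:
                window.popleft()                   # item has expired
            yield next_eval, [v for _, v in window]
            next_eval += slide
        window.append((ts, value))
    # Final evaluation covering the last arrivals.
    while window and window[0][0] <= next_eval - range_:
        window.popleft()
    yield next_eval, [v for _, v in window]

# Scope of 120 time units, re-evaluated every 60 time units.
stream = [(10, "a"), (50, "b"), (70, "c"), (130, "d")]
results = list(sliding_windows(stream, range_=120, slide=60))
```

At the third boundary (t = 180) the items stamped 10 and 50 have expired, so only the later two remain valid.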


Query Syntax and Semantics in DSMSs

Query Syntax in DSMSs (1): Language Extensions (1)

- The simplest language extensions make use of SQL OLAP syntax.
- In the FROM clause, if a name denotes a stream, then it may have a window specification placed upon it.
- Time-based windows are specified as having a given RANGE, i.e., a certain width in time units.
- Count-based windows are specified as holding a given number of ROWS.


Query Syntax in DSMSs (2): Language Extensions (2)

Example

- Every minute, report items with class higher than A whose price has changed in the last two minutes.

    SELECT *
    FROM Prices [RANGE 2 min SLIDE 1 min]
    WHERE class > "a"

- After every price change, report how many items with class higher than A appeared in the last three changes.

    SELECT COUNT(*)
    FROM Prices [ROWS 3 SLIDE 1]
    WHERE class > "a"

- Whenever new information arrives in S1 or in S2, report those items whose prices have coincided within the last three minutes.

    SELECT *
    FROM S1 [RANGE 3 min],
         S2 [RANGE 3 min]
    WHERE S1.price = S2.price


Query Semantics in DSMSs (1): Contrasts

- While the syntax may be SQL-like, the semantics is rather different.
- In a stream QL, tuples have an ordering attribute (e.g., a timestamp, often implicitly), but not in SQL.
- In SQL, the answer is a table; in a stream QL, it is a stream.
- Intuitively, the answer to a stream query is the answer of the corresponding SQL query (i.e., with the window specifications removed) over the current state of the input streams/windows.
- This means that the answer changes over time as the windows slide forward, e.g., whenever a new tuple arrives or an old tuple leaves the window.
- To represent additions and deletions from the result, positive and negative tuples can be generated, or else the answer can be recomputed from scratch on every change.
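As an illustration of the positive/negative tuple idea, the sketch below derives both kinds of tuple by comparing two consecutive answers under bag semantics. The function name `delta` and the sample tuples are hypothetical. A real DSMS would normally produce these tuples incrementally rather than by differencing whole answers.

```python
from collections import Counter

def delta(previous, current):
    """Return (positive, negative): the tuples to add to, and to delete from,
    the previous answer in order to obtain the current one (bag semantics)."""
    prev, curr = Counter(previous), Counter(current)
    positive = list((curr - prev).elements())   # newly qualifying tuples
    negative = list((prev - curr).elements())   # tuples that no longer qualify
    return positive, negative

# ("x", 1) has left the window, ("z", 3) has just arrived.
positive, negative = delta([("x", 1), ("y", 2)], [("y", 2), ("z", 3)])
```

Emitting only `positive` and `negative` lets a consumer patch its copy of the answer instead of receiving the whole answer again.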


Query Semantics in DSMSs (2): Some Examples (1)

Example

    SELECT *
    FROM S [RANGE 2 min SLIDE 1 min]
    WHERE class > "a"

    SLIDE 1 min          Every minute,
    FROM S               take the S stream, consider only those tuples
    RANGE 2 min          that were timestamped in the last two minutes and
    WHERE class > "a"    that have class above "a",
    SELECT *             report them.


Query Semantics in DSMSs (3): Some Examples (2)

Example

    SELECT *
    FROM S1 [RANGE 3 min],
         S2 [RANGE 3 min]
    WHERE S1.price = S2.price

    -                            Whenever data arrives in S1 or S2,
    FROM S1                      take the S1 stream, consider only those tuples
    RANGE 3 min                  that were timestamped in the last three minutes,
    S2                           take the S2 stream, consider only those tuples
    RANGE 3 min                  that were timestamped in the last three minutes,
    -                            form the Cartesian product of the S1 and S2 tuples that are in scope,
    WHERE S1.price = S2.price    keep those that have identical prices,
    SELECT *                     report them.


Query Semantics in DSMSs (4): Two Examples


Query Semantics in DSMSs (5): Selection, Projection

- Selection is non-blocking, so there is no need for windows.
- Likewise, for projection under bag semantics (retaining the ordering attribute).
- Under set semantics, duplicate removal requires that we superimpose a window on the stream.


Query Semantics in DSMSs (6): Joins

- The simplest is a binary join over sliding windows.
- It is inspired by symmetric hash join.
- The informal semantics of a sliding window join between two streams S1 and S2 is as follows.
- When a new tuple arrives in one of the operands, say S1:
  1. Scan the window on S2 to find any matching tuples and propagate the concatenations into the answer.
  2. Insert the new arrival in the window on S1.
  3. Invalidate all the tuples in the window on S1 that have expired as a consequence.
- The process is symmetrical when a new tuple arrives in the other operand.
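The three steps above can be sketched for an equi-join over two time-based windows of equal width. The class name `WindowJoin`, the per-key hash buckets and the (timestamp, tuple) layout are assumptions made for illustration. One detail goes beyond the informal semantics: since a window is only cleaned when a tuple arrives on its own input, the probe in step 1 also skips tuples whose timestamps are already out of scope.

```python
from collections import defaultdict, deque

class WindowJoin:
    """Binary equi-join over two time-based sliding windows of equal width."""

    def __init__(self, width):
        self.width = width
        # One hash table per input: join key -> deque of (timestamp, tuple).
        self.tables = (defaultdict(deque), defaultdict(deque))

    def _expire(self, side, now):
        table = self.tables[side]
        for key in list(table):
            bucket = table[key]
            while bucket and bucket[0][0] <= now - self.width:
                bucket.popleft()
            if not bucket:
                del table[key]

    def arrive(self, side, ts, key, tup):
        """Process a new tuple on input `side` (0 for S1, 1 for S2) and
        return the (S1 tuple, S2 tuple) pairs it contributes to the answer."""
        own, other = self.tables[side], self.tables[1 - side]
        # 1. Probe the other window, skipping tuples already out of scope.
        matches = [(tup, o) if side == 0 else (o, tup)
                   for o_ts, o in other.get(key, ())
                   if o_ts > ts - self.width]
        # 2. Insert the new arrival in its own window.
        own[key].append((ts, tup))
        # 3. Invalidate tuples in the own window that have expired.
        self._expire(side, ts)
        return matches

join = WindowJoin(width=180)                 # three minutes, in seconds
r1 = join.arrive(0, 0, 10.0, "s1-a")         # nothing to match yet
r2 = join.arrive(1, 60, 10.0, "s2-a")        # matches s1-a
r3 = join.arrive(1, 200, 10.0, "s2-b")       # s1-a is out of scope by now
r4 = join.arrive(0, 210, 10.0, "s1-b")       # matches both S2 tuples
```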


Query Semantics in DSMSs (7): A Time-Based Window Join


Query Semantics in DSMSs (8): Aggregation

- Distributive aggregation functions (e.g., COUNT, SUM, MAX, MIN) only require that we hold on to the last answer we emitted and update it at each new arrival before emitting the new answer.
- Algebraic aggregation functions (e.g., AVG) require that we hold on to the terms (e.g., COUNT, SUM) used to compute the last answer we emitted and update them at each new arrival before computing and emitting the new answer.
- Holistic aggregation functions (e.g., MEDIAN, COUNT DISTINCT) need to see all the values and hence require that we superimpose a window on the stream.
- Group-by aggregation can, as usual, be done using hash-based techniques to hold the partitions. In this case, updating them for new arrivals works much as has been described for binary join.
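The difference between the three classes of aggregate shows up in the state each one has to keep. The sketch below is a minimal illustration over an append-only stream. The class names are hypothetical, and the count-based window of three rows used for the median is an arbitrary choice.

```python
import statistics
from collections import deque

class RunningMax:
    """Distributive: the last answer emitted is all the state needed."""
    def __init__(self):
        self.answer = None
    def update(self, x):
        self.answer = x if self.answer is None else max(self.answer, x)
        return self.answer

class RunningAvg:
    """Algebraic: keep the terms (COUNT and SUM), not the values."""
    def __init__(self):
        self.count, self.total = 0, 0.0
    def update(self, x):
        self.count += 1
        self.total += x
        return self.total / self.count

class WindowedMedian:
    """Holistic: all values are needed, so bound them with a window."""
    def __init__(self, rows):
        self.window = deque(maxlen=rows)   # the oldest value is evicted automatically
    def update(self, x):
        self.window.append(x)
        return statistics.median(self.window)

values = [4, 10, 1, 7]
mx, avg, med = RunningMax(), RunningAvg(), WindowedMedian(rows=3)
maxima = [mx.update(v) for v in values]
averages = [avg.update(v) for v in values]
medians = [med.update(v) for v in values]
```

The first two operators use constant space however long the stream runs, whereas the median uses space proportional to the window size.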


Summary: Data Stream Management

- Data stream management is a growth area for the deployment of database technology.
- Many modern organizations have as part of their competitive strategy the ability to respond to external events in a timely manner.
- Data stream management systems are very well placed to perform the kind of complex event processing that such organizations require.
- However, the challenges posed by data streams to classical DBMS technology are unprecedented, ranging from foundational issues, through query semantics and optimization, to adaptive query processing.


Advanced Database Management Systems: Data Stream Query Processing

Alvaro A A Fernandes

School of Computer Science, University of Manchester


Outline

Query Optimization in DSMSs

Example DSMSs


Query Optimization in DSMSs

Query Execution in DSMSs: A Typical Picture

- When a continuous query Qn is registered, a QEP Pn is generated for Qn.
- The new plan is merged with the collection of existing plans P1, ..., Pn−1.
- At any point in time, the registered queries form a graph: individual queries share inputs and outputs.
- The collection of QEPs comprises:
  - Operators
  - Queues, both input and inter-operator ones
  - State (e.g., windows, previous results, etc.)
- A global scheduler oversees which operators evaluate in response to what.


Query Optimization in DSMSs (1): General Framework

- The general idea is still to generate candidate query plans by rewriting (e.g., selections and time-based windows commute, but selections and count-based windows do not).
- While we still want to reduce sizes, since operators keep state, evaluation is mostly in main memory, so disk I/O is not as major a cost concern.
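The commutativity claim can be checked on a small example. The helper names, the sample stream and the predicate below are made up for illustration; windows are evaluated once, at time 4.

```python
def last_rows(items, k):
    """Count-based window: the last k items."""
    return items[-k:]

def last_range(items, now, width):
    """Time-based window: items timestamped in (now - width, now]."""
    return [(ts, v) for ts, v in items if now - width < ts <= now]

def select(items, pred):
    """Selection on the value component of (timestamp, value) pairs."""
    return [(ts, v) for ts, v in items if pred(v)]

stream = [(1, 5), (2, 20), (3, 7), (4, 30)]   # (timestamp, value)
big = lambda v: v > 10

# Time-based window: both orders give the same answer.
time_a = select(last_range(stream, now=4, width=2), big)   # window, then select
time_b = last_range(select(stream, big), now=4, width=2)   # select, then window

# Count-based window: the two orders disagree.
count_a = select(last_rows(stream, 2), big)   # window, then select
count_b = last_rows(select(stream, big), 2)   # select, then window
```

Selecting first changes which tuples count as "the last two", which is why a rewrite that pushes a selection below a count-based window is not semantics-preserving.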


Query Optimization in DSMSs (2): Issues Regarding Multi-Query Execution

- In a DBMS, a query is issued and runs as if in isolation; in a DSMS, many queries are likely to be executing together for potentially long periods at any one time.
- If so, there are opportunities for sharing, e.g.:
  - Same SELECT and WHERE clauses but different window scope in the FROM clauses.
  - Same SELECT and FROM clauses but different predicates in the WHERE clauses.
- It is also possible, e.g., to generate indexes from a list of predicates that are active and, when a new tuple t arrives, find which predicates need to be evaluated over t, thereby allowing the scheduler control over which queries and operators to trigger.
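One possible shape for such a predicate index is sketched below, under the simplifying assumption that every active predicate has the form `value > threshold` on the same attribute. The class name `ThresholdIndex` and the query identifiers are hypothetical. A sorted list of thresholds lets a single binary search find all the queries a new tuple satisfies.

```python
import bisect

class ThresholdIndex:
    """Index over active predicates of the form `value > threshold`."""

    def __init__(self):
        self.thresholds = []   # kept sorted
        self.query_ids = []    # parallel to self.thresholds

    def register(self, query_id, threshold):
        pos = bisect.bisect_right(self.thresholds, threshold)
        self.thresholds.insert(pos, threshold)
        self.query_ids.insert(pos, query_id)

    def matching(self, value):
        """Return the queries whose predicate holds for `value`."""
        # All thresholds strictly below `value` sit to the left of `pos`.
        pos = bisect.bisect_left(self.thresholds, value)
        return self.query_ids[:pos]

index = ThresholdIndex()
index.register("q1", 10)
index.register("q2", 50)
index.register("q3", 30)
hits = index.matching(40)   # queries to trigger for a tuple with value 40
```

The scheduler would then trigger only the queries in `hits` rather than evaluating every registered predicate against the tuple.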


Query Optimization in DSMSs (3): Issues Regarding Operator Scheduling

- Possible overall scheduling strategies include:
  - Many tuples at a time: each operator gets a time-slice and the tuples in its input queue(s).
  - Many operators at a time: each tuple is processed by all the operators in a path in pipelined fashion.
- The choice of scheduling strategy depends upon the optimization goal:
  - If the goal is to minimize end-to-end latency, then a tuple should take the least amount of time possible from its arrival to being reflected in the result.
  - If the goal is to maximize tuple output rate for the query, then, given an arrival rate and two operators that are neighbours in a pipeline and commute, if they have the same selectivity, the one with the faster output rate should execute earlier.


Query Optimization in DSMSs (4): Issues Regarding Adaptivity (1)

- System conditions can change throughout the lifetime of a persistent query, e.g.:
  - The overall workload can change as the QEP collection changes.
  - Stream arrival rates can change, e.g., from fast to slow, from steady to bursty, etc.
- One option is to adapt the plan on the fly, e.g., change from a symmetric to an asymmetric binary join strategy, i.e., one in which a different join algorithm is used for arrivals in one stream than for arrivals in the other.


Query Optimization in DSMSs (5): Issues Regarding Adaptivity (2)

- Another is to use adaptive operators from the start, e.g., collapse a join sequence into a single operator (known as an eddy [Avnur and Hellerstein, 2000]) and thread each tuple through all the joins but decide the route dynamically, in response to the observed output rate in each join.
- Eddies implement dynamically changing join ordering strategies.
- In doing so, they free the query optimizer from having to worry about join ordering (based, e.g., on the greedy algorithm we have studied earlier).
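The routing idea can be illustrated with a deliberately simplified sketch. Two liberties are taken and should be read as assumptions: commutative filters stand in for the joins, and the routing policy (visit next the operator with the lowest observed pass rate) is a simple greedy stand-in, not the policy of the original eddy paper. The class and operator names are hypothetical.

```python
class Eddy:
    """Routes each tuple through all operators, choosing the order per tuple."""

    def __init__(self, operators):
        self.operators = operators                          # name -> predicate
        self.stats = {name: [0, 0] for name in operators}   # name -> [seen, passed]

    def _pass_rate(self, name):
        seen, passed = self.stats[name]
        return passed / seen if seen else 0.5   # no evidence yet: neutral guess

    def route(self, tup):
        """Return True if the tuple survives every operator."""
        pending = set(self.operators)
        while pending:
            # Visit next the operator observed to drop the most tuples so far.
            name = min(pending, key=lambda n: (self._pass_rate(n), n))
            pending.remove(name)
            self.stats[name][0] += 1
            if not self.operators[name](tup):
                return False
            self.stats[name][1] += 1
        return True

eddy = Eddy({
    "even": lambda t: t % 2 == 0,           # passes half of the tuples
    "multiple_of_10": lambda t: t % 10 == 0 # passes one tuple in ten
})
survivors = [t for t in range(100) if eddy.route(t)]
```

After a few tuples the more selective operator is visited first, so most tuples are dropped after a single operator call, without any join (here, filter) order having been fixed in advance.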


Query Optimization in DSMSs (6): Issues Regarding Load Shedding (1)

- When the system is overwhelmed because the scheduler cannot achieve tuple output rates that match or exceed the tuple arrival rates being experienced, there is a need for strategies to shed load.
- These include:
  - Randomly dropping a fraction of arriving tuples: for monitoring streams, if sampling is acceptable, this may be sound.
  - Examining the contents of a tuple before deciding whether or not to drop it, on the assumption that some tuples may have more value than others (e.g., in detection contexts, a single, possibly rare, event is valued more highly than a commonly-occurring event that only confirms normality).
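The two policies can be contrasted in a few lines. The function names, the `is_valuable` predicate and the sample readings are hypothetical. How valuable tuples are recognised is application-specific and is simply passed in here.

```python
import random

def random_shedder(drop_fraction, seed=None):
    """Drop roughly `drop_fraction` of the tuples, regardless of content."""
    rng = random.Random(seed)
    def keep(tup):
        return rng.random() >= drop_fraction
    return keep

def semantic_shedder(is_valuable, drop_fraction, seed=None):
    """Never drop tuples deemed valuable; sample the remainder."""
    rng = random.Random(seed)
    def keep(tup):
        return is_valuable(tup) or rng.random() >= drop_fraction
    return keep

# 1000 routine readings and one rare, alarming one.
readings = [20] * 1000 + [95]
keep = semantic_shedder(lambda r: r > 90, drop_fraction=0.5, seed=1)
kept = [r for r in readings if keep(r)]
```

Under the random policy the rare reading would be lost about half of the time, whereas the content-aware policy always retains it while still shedding routine readings.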


Query Optimization in DSMSs (7): Issues Regarding Load Shedding (2)

- Rather than dropping tuples, we can also:
  - Spill them over to disk and pick them up for processing during quieter times;
  - Narrow the scope of the windows, perhaps progressively.
- One guiding optimization goal in load shedding is to minimize the impact on accuracy or on the approximation error.


Example DSMSs

Data Stream Management Systems (1): Aurora/Borealis

- Aurora [Abadi et al., 2003] is geared towards monitoring applications (streams, triggers, imprecise data, real-time requirements).
- Rather than as declarative queries, Aurora tasks are specified as a connected data flow graph where nodes are operators.
- Optimization is over this data flow graph.
- Aurora supports three query modes: continuous, which is classical for streams; ad-hoc, which allows a query to be placed from now until explicitly terminated; and view, which allows for results to persist.
- Aurora accepts QoS specifications and attempts to optimize QoS for the outputs produced.
- It performs real-time scheduling and load shedding.


Data Stream Management Systems (2): Gigascope

- Gigascope [Cranor et al., 2003] specializes in network applications (a consequence of its origins in AT&T).
- It has a declarative language, GSQL, that is a pure stream query language (i.e., all inputs and outputs are streams).
- It uses ordering attributes to turn blocking operators into non-blocking ones through a merge operator that is an order-preserving union of two streams.
- Rather than interpret QEPs, it generates executables (and pushes computation as low as possible, e.g., into network adapters).
- It provides for foreign functions to allow for escaping the pure stream model (e.g., to perform more complex joins).


Data Stream Management Systems (3): STREAM

- STREAM [Arasu et al., 2004] is a general-purpose data stream management system.
- It has a declarative language, CQL, that uses stream-to-relation and relation-to-stream converters in order to retain the classical semantics of relational-algebraic operators.
- It aggressively shares state and computation among registered queries and carefully considers resource allocation and use through its scheduler.
- It performs continuous self-monitoring and re-optimization.
- It tries to approximate gracefully, if necessary.


Data Stream Management Systems (4): TelegraphCQ

- TelegraphCQ [Chandrasekaran et al., 2003] supports continuous queries over a mixture of relations and streams.
- It allows for both sliding and landmark windows to be defined (the latter has a fixed older end and a newer end that moves forward as new tuples arrive in the stream).
- The language used is SQL-like but window specification is much more expressive than in SQL OLAP.
- TelegraphCQ query execution is focussed on adaptivity and on multi-query optimization opportunities.


Data Stream Management Systems (5): Related Areas

- Publish-subscribe (pub-sub) systems process a very large number of simple conditions against a stream of events, while DSMSs execute more complex queries.
- Sensor network applications are stream systems that, when deployed in isolation from power sources and communication sinks, have to concern themselves with energy-efficient query plans to save battery power, and with in-network processing and storage.
- Approximate query processors compute on-line aggregates in limited space and work by summarizing a stream (e.g., maintaining a sample) and running queries over the summary.
- On-line data stream mining is used for incremental clustering and classification, as well as subsequence matching, among others.


Summary: Data Stream Query Processing

- Stream query processing is motivated by emerging data-intensive applications that monitor an environment as it evolves.
- Many novel problems arise in all of data modelling, query syntax and semantics, query optimization and processing.
- Central to the challenges is the unbounded nature of streams and the data-driven, rather than query-driven, nature of query execution.
- In DBMSs, data persists while queries are transient. In contrast, in stream query processing, data is transient and queries persist.


Advanced Database Management Systems: Sensor Network Data Management

Alvaro A A Fernandes

School of Computer Science, University of Manchester


Outline

Sensor Network Data Management

Sensor Networks as a Distributed Computing Platform

SNDM Desiderata

Sensor Networks as a Hardware/Software Platform


SNDM

How Did We Get Here?

- Sensor network data management (SNDM) is yet another consequence of the ascendancy of distributed computing as the dominant computing paradigm.
- In the database area, SNDM builds not only on previous work on distributed and parallel DBMSs but also on P2P query processing (QP) and on stream QP.
- Like P2PQP engines, sensor network QP (SNQP) engines implement an overlay network, i.e., a logical address and routing space over lower-level ones (say, TCP/IP).
- Like stream QP engines, SNQP engines process data streams.
- In comparison with P2P and stream data management, there are fewer fundamental challenges in SNDM.
- Novel challenges abound but they stem from the extremely constrained nature of the platform.


Sensor Networks (1): What Are They?

- A typical sensor network (SN) comprises 10^1 to 10^2 sensor nodes, often referred to as motes.
- A mote is
  - (typically) small
  - battery-powered
  - endowed with limited computing capabilities
  - capable of sensing the physical environment
  - capable of forming links with other nodes by means of wireless radio communication.


Sensor Networks (2): What Are They For?

- The major application areas so far have been:
  environmental data collection, e.g.:
    - of natural phenomena such as floods, fires, volcanic eruptions, etc.
    - of natural habitats such as bird colonies, forests, glaciers, etc.
    - of civil structures such as bridges, buildings, etc.
  entity tracking, e.g.:
    - of animals in natural environments,
    - of vehicles in built environments,
    - of goods in organizations, etc.
  event detection, e.g.:
    - of hazards such as rising pressure in utility pipes, rising water levels in river basins, etc.
    - of intruders, patients at risk, etc.


Sensor Networks (3): What Do They Do?

- Each sensor in a SN takes time-stamped measurements of physical phenomena, e.g., temperature, light, sound, air pressure, etc.
- Sensed data is annotated at source, e.g., with the id, location, and type of the sensor node that obtained it.
- Sensor nodes go beyond producing data: they are responsible for computing, storage and communication.


SNs as a DC Platform

Sensor Network Data Management (1): Basics

- Each sensor node can be seen as a processing and storage element in a distributed, shared-nothing architecture with a wireless interconnect.
- From a database viewpoint, a SNQP engine allows a SN to be viewed as a distributed database that obtains data by sensing the physical environment and over which we can run declarative continuous queries.


Sensor Network Data Management (2): Contrasts with Existing DBMS Technology (1)

- The network replaces the storage and the buffer manager: data is transferred from node memory as opposed to from data blocks on disks.
- Node memory is limited by cost and energy considerations, unlike disk storage, which is relatively inexpensive.
- As with P2P approaches, the system is highly volatile (nodes may be depleted, links may go down): the system should provide the illusion of a stable environment.
- Unlike stream QP, SNQP engines are said to be acquisitional, insofar as the rate at which data enters the system is typically specified as a quality-of-service (QoS) requirement.


Sensor Network Data Management (3): Contrasts with Existing DBMS Technology (2)

- Nodes typically only have depletable energy stocks, which are often hard to replenish.
- Classical qualities of service, e.g., response time, are comparatively less important.
- SNQP must optimize for low energy consumption in order to maximize longevity.
- Since the energy cost of communication may be up to an order of magnitude larger than that of processing, doing as much in-network processing as possible tends to be advantageous.
- Query processing tends to become highly aware of, and very closely coupled to, the networking layer.
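The benefit of in-network processing can be made concrete by counting radio messages for a MAX query over a routing tree. Everything below is a hypothetical illustration: the tree, the readings and the function names are made up, and one message per hop is a crude proxy for energy spent on communication.

```python
def depth(node, parent):
    """Number of hops from `node` to the root of the routing tree."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d

def messages_centralised(parent):
    """Every raw reading is forwarded hop by hop to the root."""
    return sum(depth(n, parent) for n in parent)

def messages_in_network(parent):
    """Every non-root node sends one partial aggregate to its parent."""
    return sum(1 for n in parent if parent[n] is not None)

def in_network_max(parent, reading):
    """Combine partial maxima bottom-up, deepest nodes first."""
    partial = dict(reading)
    for n in sorted(parent, key=lambda n: depth(n, parent), reverse=True):
        p = parent[n]
        if p is not None:
            partial[p] = max(partial[p], partial[n])
    root = next(n for n in parent if parent[n] is None)
    return partial[root]

# Node 0 is the root (gateway); each entry maps a node to its parent.
parent = {0: None, 1: 0, 2: 1, 3: 2, 4: 2}
reading = {0: 1.0, 1: 2.5, 2: 0.7, 3: 3.2, 4: 1.1}

central = messages_centralised(parent)    # 9 messages
aggregated = messages_in_network(parent)  # 4 messages
answer = in_network_max(parent, reading)  # same answer either way
```

The saving grows with the depth of the tree, since raw readings from deep nodes are otherwise retransmitted at every hop.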


Sensor Network Data Management (4): Contrasts with Existing DBMS Technology (3)

- Limited storage on nodes along with high communication costs prevent offloading, so persistent data must be subject to compression, summarization and deletion policies, typically based on aging, if queries about the past are to be supported.
- Since data is discarded, answers may be approximate.
- Since sensed data consists of measurements from the physical world, errors (e.g., noise) are inevitable, so support for range (instead of exact) and probabilistic answers is important.


SNDM Desiderata

Sensor Network Data Management (5): Desirable Characteristics

persistence : stored data must remain available to queries, despite sensor node failures and changes in the network topology
consistency : a query must be routed correctly to a node where the data is stored
controlled access to data : different update operations must not undo one another’s work, queries must always see a valid state of the DB
scalability : as the number of nodes increases, the total storage capacity should increase, and the communication cost should not grow unduly
balance : storage should not unduly burden any node, nor should a node become a hotspot of communication
topological generality : should work well on a broad range of network topologies


Sensor Network Data Management (6): Example Performance Metrics (1): Network

total network traffic : the sum total of bytes sent, which is an indicator of probable longevity
per-node network traffic : this indicates whether there are hotspots, which when they fail may cause the network to become disconnected before it is depleted of energy


Sensor Network Data Management (7): Example Performance Metrics (2): Storage

available space : some SNQP engines hog persistent memory, leaving less room for measurements to be held
data longevity : the average amount of time a data item is accessible in storage (with variants for its having been summarized, approximated, etc.)
data access time : as with P2P networks, distributed data structures (e.g., geographic hash tables) may or may not deliver performance


Sensor Network Data Management (8): Example Performance Metrics (3): Processing

delivery time : the amount of time taken for the effect of a measurementto be felt in the answer

acquisition rate : the frequency with which measurements are obtained

output rate : the frequency with which answers are produced


SN as a H/S Platform

Sensor Network Platforms (1): A Typical Mote, MICA (by Crossbow, now MEMSIC)

- 2.25 x 1.25 x 0.25 inches (5.7 x 3.18 x 0.64 centimeters), two AA batteries
- 8-bit 4 MHz Atmel ATmega128L (about as much computing power as the original 1982 IBM PC)
- But it only consumes 8 milliamps when running, and 15 microamps when sleeping.
- 512 KB of flash memory
- 10-bit A/D converter for temperature, acceleration, light, sound and magnetic sensors
- 40 Kbps, 10^2 m-range radio, 10 milliamps receiving, 25 milliamps transmitting
- Like most sensor nodes, MICA motes run nesC/TinyOS executables.
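The current figures above explain why duty cycling matters so much for longevity. The following back-of-the-envelope sketch uses the processor currents quoted on the slide. The battery capacity of 2500 mAh is an assumption (a nominal figure for AA cells, not taken from the slide), and radio and sensor consumption are ignored, so real lifetimes would be shorter.

```python
ACTIVE_MA = 8.0         # from the slide: processor current when running
SLEEP_MA = 0.015        # from the slide: 15 microamps when sleeping
CAPACITY_MAH = 2500.0   # ASSUMPTION: nominal capacity of the AA battery pack

def lifetime_days(duty_cycle):
    """Estimated lifetime if the processor is awake a fraction `duty_cycle`
    of the time and asleep otherwise (radio and sensors ignored)."""
    average_ma = duty_cycle * ACTIVE_MA + (1 - duty_cycle) * SLEEP_MA
    return CAPACITY_MAH / average_ma / 24

always_on = lifetime_days(1.0)     # roughly two weeks
one_percent = lifetime_days(0.01)  # on the order of years
```

Under these assumptions, sleeping 99% of the time stretches the lifetime from about 13 days to roughly three years, which is why SNQP engines schedule work so as to keep nodes asleep whenever possible.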


Sensor Network Platforms (2): nesC/TinyOS, the de facto Standard HW/SW Abstraction Layer for SNs

- TinyOS [Hill et al., 2000] is a component-based, event-driven runtime environment designed for wireless SNs.
- nesC [Gay et al., 2003] is a C-based language for writing programs over a library of TinyOS components.
- The figure shows how upper software layers written in nesC generate mote-level executables that rely on several kinds of TinyOS components.


Sensor Network Platforms (3): Upper Software Layers

- There is great diversity in the upper software layers.
- The figure shows a conceptually plausible division of labour between software layers in the case of SNQP engines:
  - The topmost layer implements query execution functionality.
  - It relies on a routing layer that implements the overlay network required to carry the data flows that make up a query.
  - The routing layer relies on the medium-access control (MAC) layer that implements the radio-level protocols required.
  - Scheduling tasks cuts vertically across software layers.


Summary: Sensor Network Data Management

- SNDM has emerged as another form of distributed computing: it shares with P2P the idea of overlay networks and with data streams the goal of processing events, in this case, grounded on physical reality.
- A SN is a distributed computing platform, albeit an extremely resource-constrained one.
- From such constraints there emerge desiderata and performance metrics that make SNDM platforms distinct from any other DBMS technology.
- Fully-functional hardware and software platforms are available, from sensor nodes to system-level programming platforms.

Advanced Database Management Systems
Sensor Network Querying

Alvaro A A Fernandes

School of Computer Science, University of Manchester


Outline

Sensor Network Queries

Sensor Network Querying with TinyDB

SN Queries

Sensor Network Queries (1)

I When a SN is construed as a data management platform, SQL-like declarative queries can be used to retrieve information from it.

I This is a very significant advance on the alternative of programming data retrieval tasks directly, because the very low level at which the hardware/software infrastructure is cast makes the software engineering task extremely difficult and costly.

I It is orders of magnitude more convenient and cost-effective to pose a declarative query and have the SNQP map that to an interpretable/executable program that can retrieve the desired data.

Sensor Network Queries (2)

I Consider SNs that act as flood warning systems.

I Consider the needs of an emergency management agency to monitor the consequences of heavy rainfall in a region (e.g., Hull).

I An example SN query in this context might be:

Every 10 minutes for the next 3 hours, report the maximum rainfall level in stations in Hull, provided that it is greater than 3.0 inches.

select max(rainfall_level), station

from Sensors

where area = 'Hull'

group by station

having max(rainfall_level) > 3.0

duration [now, now + 180 min]

sampling period 10 min
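To make the meaning of the clauses concrete, the following is a minimal Python sketch of what one evaluation of this query computes and of when evaluations happen. It is not how a SNQP engine executes the query; the function names, the dictionary layout of a reading, and the choice to treat the end of the duration interval as exclusive are all assumptions made for illustration.

```python
from collections import defaultdict

def flood_query(readings, area="Hull", threshold=3.0):
    """One evaluation: max rainfall per station in `area`, kept only if above `threshold`."""
    per_station = defaultdict(list)
    for r in readings:
        if r["area"] == area:                       # where area = 'Hull'
            per_station[r["station"]].append(r["rainfall_level"])
    # group by station ... having max(rainfall_level) > 3.0
    return {s: max(v) for s, v in per_station.items() if max(v) > threshold}

def evaluation_times(duration_min=180, period_min=10):
    """Minutes from now at which the query is evaluated (interval end assumed exclusive)."""
    return list(range(0, duration_min, period_min))

# Hypothetical readings acquired in one sampling period.
readings = [
    {"station": "H1", "area": "Hull", "rainfall_level": 3.4},
    {"station": "H1", "area": "Hull", "rainfall_level": 2.9},
    {"station": "H2", "area": "Hull", "rainfall_level": 1.2},
    {"station": "Y1", "area": "York", "rainfall_level": 5.0},
]
report = flood_query(readings)
```

With these readings only station H1 is reported: H2 is below the threshold and Y1 is outside the area.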

Sensor Network Queries (3)

I In the example just used (but not in general), the query was expressed over one table comprising all sensors in the SN, with each sensor corresponding to a column in the table.

I This example assumed (as is usual) that there is metadata describing schemas and the execution environment available at the point of compilation (often referred to as the base station).

Sensor Network Queries (4)

I Monitoring queries are long-running, continuously-evaluated queries.

I In the example, the duration clause stipulates the period during which data is to be collected.

I The sampling period clause, also known as the acquisition interval, stipulates the frequency at which the sensors acquire data (and, by default, at which results are delivered).

I The desired outcome is a stream of notifications of system activity (periodic or triggered by special situations).

Sensor Network Queries (5)

I Some SN queries need to aggregate sensed data over time windows, e.g.,

Every ten minutes, return the average temperature measured over the last ten minutes.

I Others need to correlate data produced simultaneously by different sensor nodes, e.g.,

Report an alert whenever 2 sensor nodes within 10 meters of each other simultaneously detect an abnormally high temperature.

I Many queries contain predicates on the sensor nodes involved (e.g., it is common to refer to geographical locations), as is to be expected since SNs are grounded in the physical world.

Sensor Network Queries (6)

With respect to the time dimension, the major types of SN queries are:

I long-running, continuous queries: report results over a sliding time window, e.g.,

For the next 3 hours, every 10 minutes, retrieve the rainfall level in Hull stations.

I snapshot queries: retrieve sensed data from the network at a given point in time (typically now), e.g.,

Retrieve the current rainfall level in Hull stations.

I historical queries: retrieve past sensed data (and may require nodes to store data persistently), e.g.,

Retrieve the average rainfall level at all sensor nodes for the last 3 months of the previous year.

SN Querying with TinyDB

SN Querying with TinyDB (1)
The TinyDB SNDM System

I TinyDB is the seminal SNDM system: it single-handedly delineated the research topic of SNQP.

I TinyDB is a nesC-coded distributed query processor that runs on MICA motes over TinyOS.

I It has had several successful deployments, mostly for environmental data collection, the largest consisting of around 80 nodes.

I It is no longer actively developed but still constitutes the benchmark for more recent SNQP systems.

I Cougar was another influential SNDM platform but was never as fully developed as TinyDB and its influence has correspondingly diminished recently.

SN Querying with TinyDB (2)
TinyDB Query Cycle

I TinyDB assumes the existence of a base station (that is assumed to be a normal computer, say a PC).

I The base station parses and optimizes a query.

I The resulting QEP is injected into the SN.

I This starts a dissemination process as a result of which a routing tree is formed, with the QEP being installed in the sites that comprise it and then started.

I Results flow back up the routing tree.

SN Querying with TinyDB (3)
Query Language Features

I The TinyDB query language (called TinyQL) is a declarative SQL-like query language supporting selection, (limited kinds of) join, projection, and aggregation.

I It is a continuous QL, so it supports windows.

I It is an acquisitional QL, so it supports sampling rates.

I TinyQL views the entire collection of sensors as a single, unbounded universal relation, with attributes for all the sensing modalities (e.g., temperature, pressure, etc.) for which there is a sensor.

I Each modality is modelled as a distinct attribute in the universal relation.

I Tuples are tagged with metadata, i.e., node id, location, etc.

SN Querying with TinyDB (4)
Example TinyDB Queries: Select/Project

Every second, for 10 seconds, return node id, light and temperature readings provided the temperature is above 10.

select nodeid, light, temp

from Sensors

where temp > 10

sample interval 1s

for 10s

I This query generates a stream at the base station, where it may be logged or output to the user.

I The stream is a sequence of tuples, each tuple including a timestamp.

SN Querying with TinyDB (5)
Example TinyDB Queries: Materialized Views

I In TinyQL, because of the design choice for a universal relation Sensors, windows cannot be specified as in stream QLs.

I Instead, windows are specified as materialized views over streams.

I The following materializes the last eight light readings taken 10s apart:

create storage point recentLight size 8

as ( select nodeid, light

from Sensors

sample interval 10s)

SN Querying with TinyDB (6)
Example TinyDB Queries: Joins

I In TinyDB, joins are only allowed between two storage points on the same node, or between a storage point and the Sensors relation.

I For example, the following is what the TinyDB papers refer to as a landmark query.

Every 10s, from now on, return the number of recent light readings that were brighter than the current reading.

select count(*)

from Sensors s,

recentLight r

where r.nodeId = s.nodeId

and r.light > s.light

sample interval 10s
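The interplay between a storage point and the join above can be sketched in Python as follows. This is only an illustration of the semantics on a single node, not TinyDB code: the class and function names are hypothetical, and readings are reduced to bare light values.

```python
from collections import deque

class StoragePoint:
    """Fixed-size buffer of the most recent readings (the 'size 8' clause)."""
    def __init__(self, size=8):
        self.rows = deque(maxlen=size)   # the oldest reading is evicted automatically

    def append(self, light):
        self.rows.append(light)

def brighter_than_current(recent, current_light):
    """count(*) over recentLight r where r.light > s.light, for one node."""
    return sum(1 for light in recent.rows if light > current_light)

recent = StoragePoint(size=8)
for light in [10, 40, 25, 60, 5, 30, 70, 20, 55]:   # nine samples: the first (10) is evicted
    recent.append(light)

count = brighter_than_current(recent, current_light=35)
```

Because the join is restricted to data held on the same node, each node can answer it from its own buffer without any radio traffic other than delivering the count.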

SN Querying with TinyDB (7)
Example TinyDB Queries: Aggregation

I TinyDB supports aggregations over time intervals using sliding windows.

I For example:

Every 5 seconds, sampling once per second, return the average volume over the last 30 seconds.

select winavg(volume, 30s, 5s)

from Sensors

sample interval 1s
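A minimal Python sketch of what winavg(volume, 30s, 5s) computes, assuming one sample per second as in the query. The function name mirrors the query, but the code is an illustration rather than TinyDB's implementation; in particular, reporting an average over a partially filled window during the first 30 seconds is an assumption.

```python
from collections import deque

def winavg(samples, window=30, slide=5):
    """Return (time, average) pairs: every `slide` samples, average the last `window` samples."""
    buf = deque(maxlen=window)            # keeps only the most recent `window` samples
    out = []
    for t, value in enumerate(samples, start=1):
        buf.append(value)
        if t % slide == 0:                # a result is due every `slide` seconds
            out.append((t, sum(buf) / len(buf)))
    return out

results = winavg(list(range(1, 41)))      # hypothetical volume readings 1, 2, ..., 40
```

The node needs to buffer at most 30 samples, whatever the duration of the query.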

SN Querying with TinyDB (8)
Example TinyDB Queries: Event-Based (1)

I TinyQL allows data collection to be initiated by event occurrences.

I Events are generated explicitly, either by another query or by the operating system.

I Event occurrences have attributes that bind parameters of an event-based query.

SN Querying with TinyDB (9)
Example TinyDB Queries: Event-Based (2)

I The following (rather naïve) query raises a bird-detect event by detecting a high temperature in a nest:

select nodeid,loc

where temp > 5

output action signal bird-detect(loc)

sample period 10s

I Then, the following query responds to bird-detect events raised:

When a bird has been detected in a nest, report the average light and temperature at sensors near the nest.

on event bird-detect(loc):

select avg(light), avg(temp), event.loc

from Sensors s

where dist(s.loc, event.loc) < 10m

sample interval 2s for 30s

SN Querying with TinyDB (10)
Example TinyDB Queries: Lifetime-Based

I Instead of an explicit sample interval clause, users may request a specific query lifetime, i.e., a duration in days, weeks, or months.

For at least 30 days, report light and acceleration by sampling at as fast a rate as possible.

select nodeId, light, accel

from Sensors

lifetime 30 days

Summary
Sensor Network Querying

I Using declarative queries to retrieve data from a SN has significant practical and economic benefits.

I TinyDB exemplifies the functionality that can be supported.

I Complex queries and event detection are expressible, accompanied by quality-of-service expectations regarding lifetime and sampling intervals.

Advanced Database Management Systems
Sensor Network Query Processing

Alvaro A A Fernandes

School of Computer Science, University of Manchester


Outline

Query Processing in TinyDB

Query Processing in SNEE

TinyDB QP

Query Processing in TinyDB (1)
SNQP Engines as Autonomous Systems

I The main goal in SNQP (on battery-powered motes) is to reduce energy consumption.

I Deploying new sensor nodes in the field, or physically replacing or recharging batteries, is time consuming and expensive, since deployment sites of interest tend to be remote, isolated and sometimes hazardous.

I This means that query optimization aims to generate QEPs that allow the SN to perform autonomously, i.e., the QEP controls where, when, and how often data is physically acquired (i.e. sampled), processed and delivered.

I TinyDB is an example of this class of SNQP engine.

Query Processing in TinyDB (2)
Duty Cycling as a Means to Save Energy

I Most motes can transition their hardware components between states with different energy consumption rates.

I Typical states are:

Snoozing: the processor and radio are idle, waiting for either a timer or an external event to wake the device.

Processing: the processor is doing local processing.

Transmitting: the radio is delivering results (either locally-obtained or relayed) to a neighbour.

Receiving: the radio is receiving results from a neighbour.

I Duty cycling is vital for longevity, and, therefore, the ability to spend time in lower-energy states is an important performance metric for SNQP engines.
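The effect of duty cycling on longevity can be estimated with a back-of-the-envelope calculation. The Python sketch below uses the MICA mote figures quoted earlier in these notes (8 milliamps when running, 15 microamps when sleeping); the battery capacity of 2000 mAh is an assumed value, and radio consumption is ignored, so the numbers are indicative only.

```python
RUNNING_MA = 8.0       # current drawn when running (from the mote figures in these notes)
SLEEPING_MA = 0.015    # 15 microamps when sleeping

def average_current_ma(duty_cycle):
    """Average current for a node that is awake for the fraction `duty_cycle` of the time."""
    return duty_cycle * RUNNING_MA + (1.0 - duty_cycle) * SLEEPING_MA

def lifetime_days(duty_cycle, capacity_mah=2000.0):
    """Estimated lifetime in days; `capacity_mah` is an assumed battery capacity."""
    return capacity_mah / average_current_ma(duty_cycle) / 24.0

always_on = lifetime_days(1.0)     # never sleeps
one_percent = lifetime_days(0.01)  # awake 1% of the time
```

Under these assumptions a node that never sleeps lasts on the order of days, whereas a node that is awake 1% of the time lasts on the order of years.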

Query Processing in TinyDB (3)
Networking: Short Ranges, Multi-Hops, Relay

I The current range for low-power wireless radios is no greater, in practice, than 30-50m, even in the absence of obstacles.

I Such short ranges imply the need for multi-hop communication where intermediate nodes act as relays (either purely or in combination with their sensing and processing duties).

I Relays help bridge longer distances with less expenditure of energy and also allow routes to bypass obstacles.

I It is desirable that SNs be low maintenance and easy to deploy from a network management viewpoint.

Query Processing in TinyDB (4)
Networking: Time Synchronization (1)

I Clock drift is the name given to the process by which clocks that started with the same time reading gradually and increasingly diverge on their readings, leading to a lack of synchrony.

I Clock drift is likely in the limited hardware used for mote-level SNs, leading to synchronization issues as to whether the target is in a receiving state when the source is in a transmitting one.

I There are different time synchronization protocols.

Query Processing in TinyDB (5)
Networking: Time Synchronization (2)

I The protocol used by TinyDB is simple and (seems to be) effective in practice:

I All messages are sent with a 5-byte timestamp indicating node time in milliseconds: system time := node time.

I When a node receives a message, it sets its node time to the timestamp received: system time := timestamp received.

I All nodes agree that the waking period begins when system time mod epoch = 0, where epoch is the period between the start of each sampling activity and the end of the processing cycle.
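The protocol is simple enough to sketch in a few lines of Python. The class below is a hypothetical illustration, not TinyDB code: clocks are plain integers in milliseconds, messages are dictionaries, and propagation delay is ignored.

```python
class Node:
    def __init__(self, clock_ms):
        self.system_time = clock_ms                  # local clock, in milliseconds

    def send(self):
        return {"timestamp": self.system_time}       # every message carries the sender's time

    def receive(self, message):
        self.system_time = message["timestamp"]      # adopt the time of the sender

    def is_waking(self, epoch_ms):
        return self.system_time % epoch_ms == 0      # waking period starts on epoch boundaries

parent = Node(20_000)
child = Node(19_870)                                 # the child's clock has drifted by 130 ms
drifted_awake = child.is_waking(1_000)               # before synchronization
child.receive(parent.send())
synced_awake = child.is_waking(1_000)                # after synchronization
```

Before the message arrives the two nodes disagree on whether a waking period has begun; afterwards they agree, which is what allows a child to be transmitting exactly when its parent is receiving.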

Query Processing in TinyDB (6)
Networking: Network Formation (1)

I TinyDB does not assume the communication topology to be known.

I Instead, it instruments nodes to form it in an ad-hoc manner.

I It uses a flooding algorithm as follows:

1. The root broadcasts a request.

2. All nodes that hear this request process it, and forward it on to their children, and so on, until the entire network has heard the request.

3. This establishes a network topology (with undirected edges).

4. A communication topology (i.e., one with directed edges) can then be chosen: nodes pick a parent node (with the most reliable connection to the root, i.e. highest link quality).

5. This parent is then responsible for forwarding the node’s (and its children’s) messages to the base station.

Query Processing in TinyDB (7)
Networking: Network Formation (2)

I In the example network topology, vertices denote nodes named by the corresponding label, with B denoting the base station.

I Edges denote that the nodes involved have a communication link between them (i.e., are within communication range of one another).

I Edge labels denote the quality (the higher, the better) of the communication link.

Query Processing in TinyDB (8)
Networking: Network Formation (3)

I Given a network topology such as described, the selection of a communication topology is as follows:

1. Given nodes N and N′, if N transmits and N′ hears with quality Q, then a candidate routing edge N′ →Q N is proposed iff there is no already existing proposal of a candidate routing edge N →Q N′.

2. If there is more than one candidate routing edge outgoing from the same node, i.e., if there are edges N′ →Q1 N1, ..., N′ →Qn Nn, then the one with the highest Qi is chosen (or one is chosen arbitrarily if there is a tie).

Query Processing in TinyDB (9)
Networking: Network Formation (4)

I Given the previous network topology, the following are the derivation steps which compute the communication topology (candidate edges marked as discarded are those rejected because the reverse edge had already been proposed):

1. a →5 B

2. c →3 B

3. B →3 c (discarded)

4. a →8 c

5. B →5 a (discarded)

6. c →8 a (discarded)

7. d →4 a

8. a →4 d (discarded)

9. e →5 d

10. d →5 e (discarded)

I There is one case of more than one candidate routing edge outgoing from the same node, viz., {a →5 B, a →8 c}, in which case we choose a →8 c because it has the highest quality.
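The two selection rules can be sketched in Python and checked against the derivation above. The link qualities are read off the derivation steps (the figure itself is not reproduced here), the order in which nodes transmit is inferred from the order of the steps, and the function name is hypothetical.

```python
def communication_topology(links, transmit_order):
    """links: {frozenset({u, v}): quality}. Returns {node: (parent, quality)}."""
    proposed = {}                                    # (child, parent) -> quality
    for sender in transmit_order:
        for link, quality in links.items():
            if sender not in link:
                continue
            (hearer,) = link - {sender}              # the other endpoint hears the sender
            # Rule 1: propose hearer -> sender unless sender -> hearer was already proposed.
            if (sender, hearer) not in proposed:
                proposed[(hearer, sender)] = quality
    parents = {}
    for (child, parent), quality in proposed.items():
        # Rule 2: keep the highest-quality outgoing edge (on a tie, the first proposal wins).
        if child not in parents or quality > parents[child][1]:
            parents[child] = (parent, quality)
    return parents

links = {frozenset("Ba"): 5, frozenset("Bc"): 3, frozenset("ac"): 8,
         frozenset("ad"): 4, frozenset("de"): 5}
tree = communication_topology(links, transmit_order=["B", "c", "a", "d", "e"])
```

The resulting parents are a → c, c → B, d → a and e → d. Note that the outcome depends on the transmission order: if a had transmitted before c, the edge c →8 a would have been proposed first and a →8 c discarded.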

Query Processing in TinyDB (10)
Query-Specific Routing

I Over a given communication topology, we can select the paths along it that data flows will follow in a QEP.

I When TinyDB disseminates the QEP (i.e., sends it to be installed at nodes) it computes what it calls a semantic routing tree (SRT), by which is meant that it takes into account the predicates used in the query to determine which nodes need to participate in the computation.

I This means that TinyDB also establishes the route for data flows (from leaves to root) as it decides (from root to leaves) which nodes will have the QEP installed and executing.

Query Processing in TinyDB (11)
Query-Specific Routing

I An SRT is especially useful for a query in which the predicates in the WHERE clause define a geographical extent through an attribute A (e.g., the x-coordinate of a node).

I In a TinyDB SRT, each node stores a single unidimensional interval denoting the range of A values corresponding to its descendants.

I Then, the decision as to whether a node n must be involved in processing a QEP q with a predicate over A is taken as follows:

1. When a QEP q with a predicate over A arrives in node n, if q applies locally to n, then n participates in the execution of q, i.e., n starts executing q.

2. If the A-range of any child n′ of n overlaps with the A-range in q (which condition n can verify from the A-range it holds), then n prepares to receive results from any such n′ and forwards q to them.

3. If there is no overlap, then q is not forwarded from n.

Query Processing in TinyDB (12)
An Example TinyDB SRT

I Let the query be

SELECT light

FROM Sensors

WHERE x > 3 AND x < 7

I If so, N1 knows it can exclude N2, and N3 knows it can exclude N5.

I In this way, only the solid-line nodes in the figure, i.e., those in the desired x-range, receive and execute the QEP.
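The forwarding decision can be sketched in Python. Since the figure is not reproduced here, the tree shape and the x-coordinates below are hypothetical, chosen only so that N1 excludes N2 and N3 excludes N5 as stated; the class and function names are likewise illustrative. For brevity, each node's interval also covers its own x-value, which is a simplification of the per-child ranges described in these notes.

```python
class SrtNode:
    def __init__(self, name, x, children=()):
        self.name, self.x, self.children = name, x, list(children)
        xs = [x] + [c.lo for c in self.children] + [c.hi for c in self.children]
        self.lo, self.hi = min(xs), max(xs)      # x-range of this node and its descendants

def disseminate(node, lo, hi, executing):
    """Install the query 'lo < x < hi' at `node`, forwarding it only where ranges overlap."""
    if lo < node.x < hi:
        executing.append(node.name)              # the query applies locally
    for child in node.children:
        if child.lo < hi and child.hi > lo:      # the child's range overlaps the query range
            disseminate(child, lo, hi, executing)
    return executing

n4, n5 = SrtNode("N4", x=6), SrtNode("N5", x=9)
n2, n3 = SrtNode("N2", x=1), SrtNode("N3", x=5, children=[n4, n5])
n1 = SrtNode("N1", x=4, children=[n2, n3])

executing = disseminate(n1, lo=3, hi=7, executing=[])   # WHERE x > 3 AND x < 7
```

The saving is that neither N2 nor N5 ever receives the QEP, so they spend no energy on receiving, installing or executing it.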

Query Processing in TinyDB (13)
Event Influence in TinyDB QP

I Events allow the nodes to snooze until some external condition occurs, instead of continually polling or blocking on an iterator waiting for some data to arrive.

I The benefit is a significant reduction in energy consumption.

I When a query is issued, it is assigned an id that can be used to stop the query via a stop query id command.

I Queries can be limited to run for a specified lifetime via a FOR clause, or include a stopping condition that is an event occurrence.

I TinyDB can perform lifetime estimation if it is not stipulated: it uses a cost model to relate sampling and transmission rate to energy consumption.

Query Processing in TinyDB (14)
Energy-Aware Optimization in TinyDB (1)

I Consider the following metadata about sensor hardware:

Sensor          Power   Sampling time   Sampling energy
Light, Temp     0.9     0.1             90
Magnetometer    15      0.1             1500
Accelerometer   1.8     0.1             180

I The table shows that sampling is energy-expensive and that the cost varies between different modalities, e.g., the magnetometer consumes an order of magnitude more energy than other sensors.

Query Processing in TinyDB (15)
Energy-Aware Optimization in TinyDB (2)

I Note that a sample from a sensor s must be taken before one can evaluate any predicate over the attribute Sensors.s.

I If a predicate discards a tuple of the Sensors table, then subsequent predicates need not examine the tuple, and the expense of sampling any attributes in those predicates can be avoided.

I Thus, ordering the predicates in such a way that those whose attributes consume less energy to sample are evaluated first is often a good strategy.

I Now, consider the following example query Q:

select accel, mag

from Sensors

where accel > 5

and mag > 10

sample interval 1s

Query Processing in TinyDB (16)
Energy-Aware Optimization in TinyDB (3)

I There are three possible strategies to evaluate Q:

1. the magnetometer and the accelerometer are sampled before either predicate is evaluated;

2. the magnetometer is sampled, the predicate on it is evaluated, then the accelerometer is sampled and the predicate on it is evaluated;

3. the same as the previous but with the sampling order reversed.

I The first is always at least as energy-expensive as the latter two.

I The third is likely to be better than the second given that sampling the accelerometer is cheaper (unless mag > 10 is much more selective than accel > 5).
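The comparison between the three strategies can be made concrete with a small expected-cost calculation in Python. The sampling energies are the ones in the sensor metadata table earlier in these notes; the selectivities (the fraction of tuples that pass each predicate) are hypothetical, and the predicates are assumed to be independent.

```python
ENERGY = {"accel": 180, "mag": 1500}     # sampling energy per sample, from the metadata table

def expected_energy(order, selectivity):
    """Expected sampling energy per epoch when predicates are evaluated in `order`,
    sampling each attribute only if all earlier predicates were satisfied."""
    total, p_reached = 0.0, 1.0
    for attr in order:
        total += p_reached * ENERGY[attr]
        p_reached *= selectivity[attr]   # fraction of tuples that pass this predicate
    return total

sel = {"accel": 0.5, "mag": 0.5}                        # hypothetical selectivities
sample_both = ENERGY["accel"] + ENERGY["mag"]           # strategy 1
mag_first = expected_energy(["mag", "accel"], sel)      # strategy 2
accel_first = expected_energy(["accel", "mag"], sel)    # strategy 3
```

With these selectivities the costs are 1680, 1590 and 930 respectively. Strategy 2 only becomes preferable when mag > 10 discards almost every tuple, e.g., with a selectivity of 0.01 for mag and 0.9 for accel.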

Query Processing in TinyDB (17)
Processing TinyDB QEPs

I Once a query has been optimized and disseminated, the query processor executes it.

I Roughly (i.e., ignoring communication), the node:

1. sleeps, then

2. wakes up, then

3. samples the sensor, then

4. processes both the data just obtained and that received from children, then

5. puts the result in a queue for delivery to its parent.

Query Processing in TinyDB (18)
Load Shedding in TinyDB

I When there is no contention, the queue can be drained faster than results arrive in it.

I When the opposite is the case, prioritizing data delivery is necessary.

I TinyDB uses three simple prioritization schemes:

1. naïve: a tuple is dropped if the queue cannot accept it.

2. winavg: the first two results are averaged into one to make room at the tail.

3. delta: each tuple is marked with a score indicating how different it is from the last transmitted result, so that when there is a need to make room in the queue, the least different tuple is dropped.
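The three schemes can be sketched in Python as policies applied when a result arrives at a full queue. This is an illustration under simplifying assumptions, not TinyDB code: results are single numbers, the queue is a list with its head at index 0, and the function and parameter names are hypothetical.

```python
def enqueue(queue, value, capacity, policy, last_sent=None):
    """Add `value` to a bounded result queue under the given load-shedding policy."""
    if len(queue) < capacity:
        queue.append(value)                            # no contention: just enqueue
    elif policy == "naive":
        pass                                           # queue full: the new tuple is dropped
    elif policy == "winavg":
        queue[:2] = [(queue[0] + queue[1]) / 2]        # average the two head results into one
        queue.append(value)                            # which makes room at the tail
    elif policy == "delta":
        queue.append(value)
        # drop the tuple that differs least from the last transmitted result
        queue.remove(min(queue, key=lambda v: abs(v - last_sent)))
    return queue

naive = enqueue([1, 2, 3], 9, capacity=3, policy="naive")
averaged = enqueue([10, 20, 30], 40, capacity=3, policy="winavg")
delta = enqueue([10, 50, 12], 30, capacity=3, policy="delta", last_sent=10)
```

The naïve scheme loses the newest information, winavg trades precision for coverage, and delta keeps the results that tell the base station most about how the readings have changed.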

Tree-Staged Aggregation (1)
Making the Most of En-Route Computational Possibilities

I The fact that the overlay network is multi-hop means that even as intermediate nodes are doing the routing towards the destination, they can do some computation and thereby help reduce the bandwidth used.

I Note, firstly, that, conceptually, the many data items will all travel towards the base station, i.e., the root of the communication tree.

I Along the route, they must converge on certain nodes, with the result that the overall form of the paths traversed will be that of a tree.

I At least at every confluence point (if not at each node), one can take the opportunity to do a partial aggregation and send that result forward.
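The saving from partial aggregation can be sketched in Python by counting messages with and without it for a SUM query. The routing tree and readings below are hypothetical (they are not the ones in the figures of these notes), and each partial result or relayed reading is assumed to fit in one message.

```python
def sum_up_tree(children, readings, node):
    """Return (partial SUM for the subtree rooted at `node`, messages sent within it)."""
    total, messages = readings.get(node, 0), 0
    for child in children.get(node, []):
        child_total, child_messages = sum_up_tree(children, readings, child)
        total += child_total                 # merge the child's partial aggregate
        messages += child_messages + 1       # one message carries it over the child's link
    return total, messages

def messages_without_aggregation(children, node, depth=0):
    """Every reading is relayed hop by hop, so a node at depth d costs d messages."""
    return depth + sum(messages_without_aggregation(children, c, depth + 1)
                       for c in children.get(node, []))

children = {"B": ["a", "b"], "a": ["c", "d"], "c": ["e"]}   # B is the base station
readings = {"a": 3, "b": 1, "c": 4, "d": 1, "e": 5}

total, staged = sum_up_tree(children, readings, "B")
relayed = messages_without_aggregation(children, "B")
```

In this tree the staged plan sends 5 messages (one per non-root node) against 9 when every reading is relayed individually, and the gap widens as the tree gets deeper.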

Tree-Staged Aggregation (2)
An Example

I In the figure, the values obtained in, or flowing through, each node are enclosed in square brackets.

I Arrows denote paths in the routing.

I Arrow labels show intermediate results for SUM.

Tree-Staged Aggregation (3)
Tree-Staged Aggregation in TinyDB

I TinyDB makes use of tree-staged aggregation.

I The reduction in bandwidth is important because of the energy cost of radio communication.

I For example, consider the 3-hop routing tree in the figure and a COUNT query.

I If all data is sent to the base station, 16 messages and 32 bytes are transmitted.

I If sites perform partial aggregation on the way, 6 messages and 6 bytes are transmitted.

SNEE QP

Query Processing in SNEE (1)
SNQP as Distributed QP

I TinyDB sends the same QEP to all participating nodes and expects a query engine to be running in every node.

I This can be seen as not being economical with memory.

I TinyDB also sends every tuple it produces as soon as it is produced.

I This can be seen as not being careful to pack bytes when transmitting and receiving, and TinyDB may therefore find it harder to amortize the fixed per-message cost.

I It could be argued that construing a SN as a distributed computing platform in a strict sense can overcome this and other shortcomings.

I SNEE is a SNQP engine developed in Manchester that takes this approach.

I For more detail on the remainder of these notes, see the assigned reading [Galpin et al., 2009].
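The point about amortizing the fixed per-message cost can be illustrated with a small calculation in Python. The cost figures are arbitrary units chosen for illustration, not measurements from TinyDB or SNEE, and the function name is hypothetical.

```python
import math

def radio_cost(num_tuples, tuples_per_message, per_message_cost=10.0, per_tuple_cost=1.0):
    """Cost of sending `num_tuples` when a message carries up to `tuples_per_message` tuples."""
    messages = math.ceil(num_tuples / tuples_per_message)
    return messages * per_message_cost + num_tuples * per_tuple_cost

eager = radio_cost(60, tuples_per_message=1)      # every tuple is sent as soon as produced
buffered = radio_cost(60, tuples_per_message=6)   # tuples are buffered and packed six at a time
```

With these assumed costs, sending eagerly costs 660 units against 160 when buffering, because the fixed cost is paid 60 times instead of 10. The price of buffering is memory and delivery delay.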

Query Processing in SNEE (2)
SNEE v. TinyDB

I SNEE comes short of TinyDB in not supporting:

I specification of event-based queries

I materialization of results

I SNEE matches TinyDB in:

I allowing the user to stipulate how often data is acquired and processed

I performing in-network, tree-staged aggregation

I allowing selection and projection

I correlating data across time

I SNEE goes beyond TinyDB in:

I supporting application-specific relations

I allowing windows on the past

I supporting joins without materialization

I allowing the specification of which sites and which sensed data to include in a query

I correlating data across sites and times

Query Processing in SNEE (3)
SNQP as Distributed QP (1)

I SNEE uses the well-known two-phase optimization approach of parallel/distributed DBMSs.

I In the figure, the three first stages are classical (except that SNEE does not perform much logical rewriting).

I SNEE introduces the notion of routing which is required in the case of SNs (because of wireless, ad-hoc networking) but not in classical distributed DBMSs (because there is no need to specify an overlay network).

Query Processing in SNEE (4)
SNQP as Distributed QP

I The partitioning/scheduling phase is broken down in SNEE into what is referred to as where scheduling (deciding where QEP fragments run) and when scheduling (deciding when the QEP fragments run).

I Finally, unlike TinyDB, SNEE generates not interpretable QEPs, but nesC/TinyOS source code, which when compiled and deployed in sensor nodes constitutes an executing QEP.

Query Processing in SNEE (5)
SNQP as Distributed QP

I The first example query shows that SNEEql (the SNEE query language), inspired by STREAM’s CQL, allows the definition of logical extents (e.g., Inflow, Outflow) in the FROM clause.

I The second shows that SNEEql allows windows on streams and supports aggregation.

I The third shows that SNEEql can specify windows on the past (e.g., data seen one minute ago) and express joins without materialization points.

Q1 == SELECT *

FROM Outflow

WHERE pressure > 24

Q2 == SELECT AVG(pressure)

FROM Outflow

[FROM NOW-10 secs TO NOW-5 secs]

WHERE temperature < 10

Q3 ==

SELECT Outflow.time,

Inflow.pressure,

Outflow.pressure

FROM Outflow [NOW],

Inflow [AT NOW-1 Min]

WHERE Inflow.pressure > 500

AND Outflow.pressure > Inflow.pressure

Query Processing in SNEE (6)
Some QoS and Some Metadata Used by SNEE

I Users can specify some QoS expectations, e.g.:

I acquisition rate

I maximum delivery time

I The SNEE compiler/optimizer expects metadata such as:

I the schema of each sensed stream

I which sensor nodes sense which attributes

I the network connectivity graph (NCG)

Query Processing in SNEE (7)
Some SNEE Optimization Techniques

I Recall that TinyDB does not presume to know the NCG and hence first derives it by flooding and then computes a routing tree that is data-sensitive.

I SNEE assumes the NCG to have been asserted in the metadata and computes the routing tree as an approximation of a minimum spanning tree that aims to minimize the total energy cost of routing.

Query Processing in SNEE (8)
Cost Models in SNEE

I SNEE uses cost models more extensively than TinyDB and hence is able to statically compute worst-case bounds on space and time for all QEP fragments.

I This allows it to derive a strict agenda for execution.

I Timers fire (to wake up, acquire/receive, process, transmit/deliver data at specific time slots) and go on firing with a periodicity that is determined by a buffering factor.

I The use of buffering makes SNEE QEPs more energy-efficient than TinyDB QEPs.

I The amount of buffering is computed from the QoS expectations regarding acquisition rate and maximum delivery time, and from the memory available/required.

Query Processing in SNEE (9)
Routing Tree and Distributed Algebraic Form for Example SNEE Query Q3 (1)

I Assume that Q3 (above) is posed with the following QoS expectations:

Delivery time: Within 2.5 seconds

Acquisition rate: Every 2 seconds

Lifetime: 3 months

I Assume further that the metadata describing where data is to be acquired and delivered is as follows:

Outflow at sites: 1, 2, 4

Inflow at sites: 3, 4

Destination at site: 7

I The next figure shows the routing tree and the distributed algebraic form computed for Q3.

Query Processing in SNEE (10)
Routing Tree and Distributed Algebraic Form for Example SNEE Query Q3 (2)

Query Processing in SNEE (11)
Execution Agenda for Example SNEE Query Q3

I The agenda computed for Q3 given the QoS expectations and the metadata above is given in the figure.

I Rows are identified by relative time, columns by sensor nodes/sites in the routing tree for the query.

I A cell indicates which action is performed:

I Fn indicates that the denoted fragment executes at that time in that node.

I txn indicates that that node at that time transmits data to node n, while rxn indicates that that node at that time receives data from node n.

I The acquisition rate is reflected in the buffering factor (e.g., there is time to acquire twice and buffer before the first transmission).

I The delivery time can be read as the span of time it takes for data to be transmitted to the delivery node.

Summary
Sensor Network Query Processing

I Query optimization in SNDM is quite distinct both in more clearly having multiple objectives and in the prominence of energy cost (as a prerequisite for longevity).

I TinyDB, the seminal first-generation SNQP engine, pioneered many ideas and insights on query optimization and execution in SNs.

I SNEE, a second-generation SNQP engine, differs from TinyDB in many respects, most fundamentally in viewing a SN as a distributed computing platform with all the implications this viewpoint suggests.

I The SNDM field is still being actively developed and the landscape is likely to change at a fast rate in the next few years.

AAAF (School of CS, Manchester) Advanced DBMSs 2012-2013 110 / 115

SNEE QP

Acknowledgements

The material presented mixes original material by the author with material adapted from tutorials and presentations by M. Tamer Ozsu, Nick Koudas, Divesh Srivastava, Jennifer Widom, Lina Al-Jadir, Qiong Luo, Hejun Wu, Wei Hong, and Samuel Madden. The author gratefully acknowledges the work of the authors cited while assuming complete responsibility for any mistake introduced in the adaptation of the material.


References (1)

Abadi, D. J., Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S., Stonebraker, M., Tatbul, N., and Zdonik, S. B. (2003). Aurora: a new model and architecture for data stream management. VLDB J., 12(2):120–139. http://dx.doi.org/10.1007/s00778-003-0095-z.

Arasu, A., Babcock, B., Babu, S., Cieslewicz, J., Datar, M., Ito, K., Motwani, R., Srivastava, U., and Widom, J. (2004). STREAM: The Stanford data stream management system. http://dbpubs.stanford.edu/pub/2004-20.


References (2)

Avnur, R. and Hellerstein, J. M. (2000). Eddies: Continuously adaptive query processing. In Chen, W., Naughton, J. F., and Bernstein, P. A., editors, SIGMOD Conference, pages 261–272. ACM. http://doi.acm.org/10.1145/342009.335420.

Chandrasekaran, S., Cooper, O., Deshpande, A., Franklin, M. J., Hellerstein, J. M., Hong, W., Krishnamurthy, S., Madden, S., Raman, V., Reiss, F., and Shah, M. A. (2003). TelegraphCQ: Continuous dataflow processing for an uncertain world. In CIDR. http://www-db.cs.wisc.edu/cidr/cidr2003/program/p24.pdf.


References (3)

Cranor, C. D., Johnson, T., Spatscheck, O., and Shkapenyuk, V. (2003). Gigascope: A stream database for network applications. In Halevy, A. Y., Ives, Z. G., and Doan, A., editors, SIGMOD Conference, pages 647–651. ACM. http://doi.acm.org/10.1145/872757.872838, http://www.acm.org/sigmod/sigmod03/eproceedings/papers/ind03.pdf.

Gay, D., Levis, P., von Behren, J. R., Welsh, M., Brewer, E. A., and Culler, D. E. (2003). The nesC language: A holistic approach to networked embedded systems. In Cytron, R. and Gupta, R., editors, PLDI, pages 1–11. ACM. http://doi.acm.org/781131.781133.


References (4)

Hill, J. L., Szewczyk, R., Woo, A., Hollar, S., Culler, D. E., and Pister, K. S. J. (2000). System architecture directions for networked sensors. In Rudolph, L. and Gupta, A., editors, ASPLOS, pages 93–104. ACM Press. http://doi.acm.org/10.1145/356989.356998.
