Data Stream Warehousing
Lukasz Golab lgolab@uwaterloo.ca University of Waterloo
Theodore Johnson johnsont@research.att.com
AT&T Labs - Research
Big Data
• Every 2 days we create as much information as we did up to 2003 (Eric Schmidt)
• Becoming easier to produce/collect – Sensors, Web, cheap bandwidth
• Becoming easier/cheaper to store – Cheap hard disks, commodity hardware
Traditional Big Data Workflow • Wait for data to arrive • Prepare and load data
– Into HDFS, key-value store, … – or into a database, then index
• Compute result • Start over
But • Many interesting data sets are “streaming”
– Monitoring (IP networks, infrastructure, smart transportation systems and power grids, RFID, system logs, manufacturing)
– Transactions (stock tickers, credit card purchases) – User behaviour logs (Web, social media)
Stream Data Workflow
• For each item or batch of items – Do some processing – Compute/update results
• Now feasible due to cheap RAM, multi-cores, etc.
“Fast Data” Systems • Data Stream Management Systems (DSMS)
– Borealis, StreamBase, Gigascope – Simple queries over fast append-only data – Results streamed out, usually not stored
• Key-value stores have fast transactional response, but analytics are difficult – Put/get interface makes correlation difficult – Analytics are inefficient on distributed stores
In This Tutorial • Big Data Management
– Focus on scalability and deep analytics, but high latency
• Fast Data Management – Low latency, but limited capability and no persistent storage
• Can we do both? – Data Stream Warehousing
Five “V”s of Big Data • Volume • Velocity • Variety
– Data integration • Verification
– Data cleaning • Value
– Data mining
Outline • Why? • What? • Detailed example • How?
– Common elements – System architectures – Performance optimizations – Data stream quality
• Open Problems
Why? • Could have 2 separate systems, but
– Not clear where to divide the systems – Overhead of moving data from one system to the other
– Harder to develop applications • Different SQL dialects, etc.
– Historical data provides context for real-time data – Even traditional analytics/reporting is becoming more real-time
• Reduce time from ingest to insight
What? • Load data from a multitude of streaming sources – Wide variation in data latencies
• Provide transparent access to both real-time and historical data
• Gracefully handle late-arriving data • Schedule queries and updates to materialized views in spite of highly variable workloads – Load shedding by dropping data is not an option
Multitude of streaming sources • Data becomes most useful when you can correlate results from many sources – Hundreds to thousands of distinct data feeds
• Network monitoring – Correlate Twitter feeds, active monitoring streams, and link utilizations to identify trouble spots
• Smart Grid – Correlate smart meter readings, line temperature measurements, and phasor measurement units to proactively react to overloads and avoid blackouts
[Chart: number of windows vs. time (seconds)]
Late-arriving data • Late-arriving data is a common problem for streaming systems.
• DSMS: data arrives minutes late
• Stream Warehouse: data can arrive days late
• Load all data and propagate results in spite of lateness.
• Alerting, troubleshooting, and real-time data mining all depend on access to real-time and historical data
• Hard to draw a boundary between new and old
Transparent Access
Scheduling • Ensure that the most time-critical applications/views get priority service.
• Ensure that no application is starved • In spite of temporary overload
Network Monitoring • Darkstar project at AT&T Labs • Motivating application for the Data Depot stream warehouse system
• Data collected: – Passive and active probe measurements, route monitoring, system logs, configuration data, customer service tickets and notes
• For: – Networking research, data mining, alerting, troubleshooting
Darkstar: Mining Vast Amounts of Data
[Diagram: data sources feeding Darkstar across network layers (IP Backhaul, Enterprise IP, VPNs, Ethernet Access, IPTV, Layer one, Mobility): route monitors (OSPFmon, BGPmon), device service monitoring (CIQ, MTANet, STREAM), active service and connectivity monitoring, Syslog, Config, SNMP polling (router, link), Netflow, Deep Packet Inspection (DPI), alarms, tickets, authentication/logging (tacacs), customer feedback (IVR, tickets, MTS)]
ARGUS: Detecting Service Issues… • Goal: detect and isolate actionable anomaly events using comprehensive end-to-end performance measurements (e.g. GS tool) • Sophisticated anomaly detection and heuristics • Spatial localization • Accurately accounts for service performance that varies considerably by time-of-day and location
• Impact: • Reduced detection time from days to approx. 15 mins for detecting data service issues
• Operational nation-wide monitoring of data service performance for 3G and LTE (TCP retransmission, RTT, throughput from GS Tool)
Approach: Mobility Localization Hierarchy
[Diagram: collect end-to-end performance data and localize along the mobility hierarchy: Market, Sub-Market, SGSN, RNC, SITE, with GGSNs at the core]
Case Example: Silent CE Overload Condition • ARGUS detected event: 2 Columbia 3G Ericsson SGSNs impacting RNCs in West Virginia, Norfolk, and Richmond • No other indication of issue • Topology highlighted CE used by only the impacted SGSNs
• RCA: “6148 48 port 1gig card is limited to a shared 1 gig bus for each set of 8 gig ports”
ARGUS alarm: clmamdorpn2 (TCP retransmissions) CE utilization flattening
ARGUS As a General Capability… Spike in call drop rate on MSC hrndvacxca1 RTT anomalies (SGSN level)
[Timeline: outage start 5:30 GMT; first anomaly 5:40 GMT; CTS ticket created 08:21 GMT; social media (Twitter) reports of NY and LA outages; node metrics, active measurements (CBB, IPAG WIPM delay); Mobility customer tickets (Boston market – PE isolation)]
• 1. At-a-glance view of network topology and state
• Visualization to summarize important information on network health • Color-coded
• Complementary to ticketing system – reporting issues below “alarming” status
http://ptolemy.research.att.com/
Use network visualization and convenient data exploration to help network operators with network health monitoring and service problem troubleshooting
Ptolemy
http://ptolemy.research.att.com/mobility
Assess damage, identify remaining capacity
Loss of many links out of Japan. What’s left?
Example 1: Japan Earthquake, March 11th 2011
Identify traffic shifts, no congestion
Increase in link load as traffic re-routed
[Chart: link load over time]
Example 1: Japan Earthquake, March 11th 2011
Recap • Load data from multiple diverse sources • Transparent access to real-time and historical data
• Schedule queries/updates – And materialized views
• Handle late/out-of-order data • Could have two separate systems, but …
Architectures • DSMS-based • DBMS-based • Hadoop-based
DSMS-based • Add ability to store data (e.g., Aurora/Borealis)
[Diagram: DSMS reading from a connection point backed by a “static” data set, producing an output stream]
DSMS-based
• Example 2: Moirae: history-enhanced monitoring
[Diagram: Borealis DSMS coupled with Postgres; SQL queries over exported data]
DSMS-based • Example 3: Dejavu: pattern matching over live and historical streams – Actually DBMS-based (MySQL)
[Diagram: pattern matching engine coupled with MySQL; pattern match queries over exported data]
DSMS-based • Pros
– Enables real-time processing with context
• Cons – Does not enable complex analytics
• Must keep up with live data
– Stores limited history
DBMS-based • Use the query processing and storage engine of a DBMS
• Add layers for additional services – Fast data load – Temporal partitioning – Update propagation – Scheduling
• Add stream warehouse-specific features and optimizations
DBMS-based
• Design decisions: – Row store (Data Depot/Daytona, Truviso/Postgres) vs. column store (DataCell/MonetDB, SAP HANA, Vertica)
– Disk (Data Depot, Truviso) vs. main memory (DBToaster, SAP HANA)
DBMS-based • Pros:
– Leverage SQL, query optimization, data storage
• Cons: – Not quite real-time
Hadoop / Map-Reduce based • HOP (Hadoop Online Prototype) • Idea: instead of waiting for all mappers to finish, send output incrementally from mappers to reducers – periodically invoke reducers on the available data
Hadoop / Map-Reduce based
• MapUpdate/Muppet (Walmart Labs), similar ideas in: Incoop, SCALLA – Reduce: for each key, process all values and return a single output value
– Update: given a new (k,v) pair, return an updated output value using the new pair and state of k
• And update the state
Hadoop / Map-Reduce based • Nova (Yahoo)
– “Pipelining” between jobs in a workflow (in large batches)
– Pass a “delta” to the next job in a workflow
Hadoop / Map-Reduce based • Pros:
– Leverage scale-out and fault tolerance • Cons:
– Again, not quite real-time
How? • Common elements in a stream warehouse – Temporal partitioning – Update propagation / workflow – Temporal dimension tables – Temporal consistency management
Temporal Partitioning
• The primary partitioning field is the record timestamp • Stream data is mostly sorted • Most new data loads into a new partition
– Avoid rebuilding indices • Simplified data expiration – roll off the oldest partitions (see the sketch below)
[Diagram: a table partitioned by time, each partition with its own index; new data appended as a new partition]
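A minimal sketch of the idea in Python, under invented names (Partition, TimePartitionedTable): records land in the partition covering their timestamp, which for mostly sorted streams is usually the newest one, and expiration simply drops whole old partitions.

```python
import bisect
from dataclasses import dataclass, field

@dataclass
class Partition:
    start: int          # inclusive lower bound of the partition's time range (seconds)
    length: int         # width of the partition's time range
    rows: list = field(default_factory=list)

class TimePartitionedTable:
    """Toy time-partitioned table: appends mostly hit the newest partition,
    and expiration rolls off the oldest partitions without touching the rest."""

    def __init__(self, partition_length=60):
        self.partition_length = partition_length
        self.partitions = []            # kept sorted by start time

    def _partition_for(self, ts):
        start = ts - (ts % self.partition_length)
        starts = [p.start for p in self.partitions]
        i = bisect.bisect_left(starts, start)
        if i < len(self.partitions) and self.partitions[i].start == start:
            return self.partitions[i]
        p = Partition(start, self.partition_length)
        self.partitions.insert(i, p)    # late data may re-open an old time range
        return p

    def append(self, record):
        # record = (timestamp, payload); mostly-sorted input means this usually
        # touches only the newest partition, so older indexes are not rebuilt
        self._partition_for(record[0]).rows.append(record)

    def roll_off(self, horizon_ts):
        # data expiration: drop whole partitions older than the horizon
        self.partitions = [p for p in self.partitions
                           if p.start + p.length > horizon_ts]
```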
Update Propagation / Workflow • Streaming analytics – maintain a system of complex materialized views
• Push new data through base tables to all dependent tables – Create new partitions – Update existing partitions as needed (propagation sketch after the diagram below)
[Diagram: view hierarchy. Base tables (Twitter feeds, active measurements, link utilization, customer complaints) feed service alerts and sentiment analysis, which feed hourly and daily aggregates]
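A sketch of update propagation through such a view hierarchy, in Python; the dependency graph, table names, and the refresh callback are invented for illustration. A newly loaded time range is pushed from a base table to every dependent view, each refreshed once, in dependency order.

```python
from collections import defaultdict, deque

# Hypothetical view DAG: each view lists the tables it is derived from.
DEPENDS_ON = {
    "service_alerts":     ["twitter_feeds", "active_measurements", "link_util"],
    "sentiment_analysis": ["twitter_feeds", "customer_complaints"],
    "hourly_aggregate":   ["service_alerts", "sentiment_analysis"],
    "daily_aggregate":    ["hourly_aggregate"],
}

# Invert the edges: source table -> views that must be refreshed after it.
DEPENDENTS = defaultdict(list)
for view, sources in DEPENDS_ON.items():
    for s in sources:
        DEPENDENTS[s].append(view)

def refresh(view, time_range):
    # Placeholder for the real work: create or update the partitions of
    # `view` that overlap `time_range` (e.g., run its defining query).
    print(f"refreshing {view} for {time_range}")

def propagate(base_table, time_range):
    """Refresh every view that (transitively) depends on base_table,
    each at most once, in topological order."""
    # 1. find all affected views
    affected, frontier = set(), deque([base_table])
    while frontier:
        for v in DEPENDENTS[frontier.popleft()]:
            if v not in affected:
                affected.add(v)
                frontier.append(v)
    # 2. refresh in dependency order: a view runs only after all of its
    #    affected source views have been refreshed
    indeg = {v: sum(s in affected for s in DEPENDS_ON[v]) for v in affected}
    ready = deque(v for v in affected if indeg[v] == 0)
    while ready:
        view = ready.popleft()
        refresh(view, time_range)
        for w in DEPENDENTS[view]:
            if w in affected:
                indeg[w] -= 1
                if indeg[w] == 0:
                    ready.append(w)

propagate("twitter_feeds", ("9:30", "9:45"))
```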
Temporal Dimension Tables • Most streaming data describes events
– Occurs at a point in time, or is a measurement over a well-defined interval
• Some streaming data defines conditions – Properties of an entity that endure for a time interval – Temporal dimension tables – the timestamp is a valid-time interval (example and join sketch below)
• Pervasive use – You can’t evaluate an event without knowing about the environment
– Link speeds, cell tower locations, power grid organization • Snapshot tables don’t work
– Late-arriving data, recomputation, new long-term analyses.
Temporal Dimension Table Example
SNMP_BytesTransferred
Ip_address  Timestamp  Bytes_xfered
4.3.2.1     1:05       1,000,000
4.3.2.1     1:10       1,200,000
4.3.2.1     1:15       2,200,000
LinkSpeed
Ip_address  Tlo    Thi   Speed
4.3.2.1     12:15  1:15  1,000,000 B/min
4.3.2.1     1:15   -     2,000,000 B/min
LinkUtilization
Ip_address  Timestamp  Utilization
4.3.2.1     1:05       .2
4.3.2.1     1:10       .24
4.3.2.1     1:15       .22
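A sketch of the join that produces LinkUtilization above, in Python; the helper names and the fixed 5-minute polling interval are assumptions for illustration. Each SNMP measurement is matched to the LinkSpeed row whose valid-time interval [Tlo, Thi) contains its timestamp.

```python
# Temporal dimension lookup: a fact row joins to the dimension row whose
# valid-time interval [tlo, thi) contains the fact's timestamp.
# Times below are minutes since midnight (12:15 = 735, 1:05 PM = 785, ...).

snmp_bytes = [            # (ip, timestamp, bytes_xfered); 5-minute SNMP polls
    ("4.3.2.1", 785, 1_000_000),
    ("4.3.2.1", 790, 1_200_000),
    ("4.3.2.1", 795, 2_200_000),
]
link_speed = [            # (ip, tlo, thi, speed in bytes/min); None = still valid
    ("4.3.2.1", 735, 795, 1_000_000),
    ("4.3.2.1", 795, None, 2_000_000),
]
POLL_MINUTES = 5          # assumed polling interval

def speed_at(ip, ts):
    """Return the speed whose valid-time interval [tlo, thi) covers ts."""
    for s_ip, tlo, thi, speed in link_speed:
        if s_ip == ip and tlo <= ts and (thi is None or ts < thi):
            return speed
    raise LookupError(f"no LinkSpeed row covers {ip} at {ts}")

# Derive LinkUtilization = bytes transferred / (poll interval * current speed)
link_utilization = [
    (ip, ts, round(b / (POLL_MINUTES * speed_at(ip, ts)), 2))
    for ip, ts, b in snmp_bytes
]
print(link_utilization)   # [(.., 785, 0.2), (.., 790, 0.24), (.., 795, 0.22)]
```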
Temporal Dimension Tables • Updates
– Snapshots of current status, deltas. • Snapshot windows in StreamInsight • Compute from the stream
– Frames – based on a condition of records in a stream
– Interval punctuation
Optimizations
• DBToaster • Multi-Version Concurrency Control • Partition Restructuring • Partition Revisions • Temporal Consistency Management • Scheduling
DBToaster • Maintain complex aggregate views over streaming data.
• In-memory architecture: all storage is via hash tables. – 1TB main memory servers are inexpensive
• Uses a novel recursive-delta technique to accelerate maintenance – Collection of support views that can significantly reduce update time (toy sketch after the diagram below)
[Diagram: support-view hierarchy for maintaining Join(R,S,T): auxiliary views Join(S,T), Join(R,T), Join(R,S) over base relations R, S, T]
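A toy illustration of the support-view idea (not the DBToaster algorithm itself), with invented names: to keep COUNT(*) of R JOIN S on a key up to date, materialize per-key counts of each input so that a single insertion updates the join count in O(1).

```python
from collections import defaultdict

class JoinCountView:
    """Incrementally maintains |R JOIN S on key| using two support views:
    per-key row counts of R and of S. Each insert is O(1) instead of a rescan."""

    def __init__(self):
        self.count_r = defaultdict(int)   # support view: rows of R per key
        self.count_s = defaultdict(int)   # support view: rows of S per key
        self.join_count = 0               # the maintained aggregate view

    def insert_r(self, key):
        # the new R row pairs with every existing S row sharing the key
        self.join_count += self.count_s[key]
        self.count_r[key] += 1

    def insert_s(self, key):
        self.join_count += self.count_r[key]
        self.count_s[key] += 1

v = JoinCountView()
for k in ["a", "a", "b"]:
    v.insert_r(k)
for k in ["a", "b", "b"]:
    v.insert_s(k)
print(v.join_count)   # 4: key 'a' contributes 2*1, key 'b' contributes 1*2
```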
Multi-version Concurrency Control • MVCC allows queries and updates to proceed concurrently – Read isolation – Long analytic queries do not block real-time updates
• Single-updater MVCC is cheap and easy – Use a directory-swap algorithm (sketch below)
• Encourages use of cloud-friendly write-once files.
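A minimal sketch of a single-updater directory swap in Python; the file layout and names are assumptions, not the DataDepot implementation. The updater writes a new version of a partition into a fresh directory of write-once files, then atomically repoints a symlink; readers resolve the symlink and see either the old or the new version, never a partial one.

```python
import os, tempfile

def read_current(table_dir):
    """Readers resolve the 'current' symlink once and read a consistent version."""
    version_dir = os.path.realpath(os.path.join(table_dir, "current"))
    return [os.path.join(version_dir, f) for f in sorted(os.listdir(version_dir))]

def publish_new_version(table_dir, write_files):
    """Single updater: materialize a new version, then swap the symlink atomically."""
    new_dir = tempfile.mkdtemp(prefix="v_", dir=table_dir)   # write-once files
    write_files(new_dir)                                     # caller fills the dir
    tmp_link = os.path.join(table_dir, "current.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_dir, tmp_link)
    os.replace(tmp_link, os.path.join(table_dir, "current"))  # atomic rename
    # old version directories can be garbage-collected once no reader uses them
```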
Partition Restructuring • As data ages, its best representation changes
– Most recent data: optimize for fast ingest – Stable data: optimize for queries – Historical data: minimize storage cost
• Restructure partitions as the data ages – MVCC allows data maintenance to occur as a non-interfering background task
Partition Size
• New partitions should match the update increment
• Problem: partition explosion – 1-minute partitions, 1440 per day, 525,600 per year
• Merge partitions as they age (sketch after the diagram below)
[Diagram: partitions merged into larger ones as data ages; indexes optional on the oldest partitions]
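A sketch of age-based partition merging in Python, reusing the toy Partition class from the earlier partitioning sketch; the policy (merge into hourly partitions once data is a day old) is an invented example.

```python
def merge_old_partitions(partitions, now, min_age=24 * 3600, merged_length=3600):
    """Coalesce partitions older than `min_age` seconds into `merged_length`-second
    partitions; recent partitions are left small so ingest stays cheap."""
    recent, buckets = [], {}
    for p in sorted(partitions, key=lambda p: p.start):
        if now - p.start < min_age:
            recent.append(p)
            continue
        start = p.start - (p.start % merged_length)
        bucket = buckets.setdefault(start, Partition(start, merged_length))
        bucket.rows.extend(p.rows)          # done as a background task under MVCC,
                                            # so readers keep seeing the old layout
    return sorted(buckets.values(), key=lambda p: p.start) + recent
```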
Data Layout • Write-optimized data
– Row-oriented, lightly indexed, uncompressed • Read-optimized data
– Highly indexed, lightly compressed, column storage if beneficial
• Transform as a background task when the data becomes stable – Combine with partition merging
• Aggressive compression for archival data • Implementations in SAP HANA and Vertica
Partition Revisions
• Some data always arrives late • Problem: need to recompute existing partitions – Disk prefers sequential access – Write-once files: need to recompute the entire partition
• Solution: chain updates to the partition – Value of the partition is the sum of the primary (anchor) contents plus the updates (revisions).
Partition Revisions
• Problem: Don’t change old partitions, but what if data arrives out of order?
• Solution: Overflow chains (Truviso); see the sketch after the example below
[Diagram: a time partition stored as an anchor plus a chain of revisions (Packet_Stream, Packets)]
• Works with “raw” and derived/aggregated data
• E.g., packet counts:
Data Layout
[Figure: packet-count example: raw Packet_Stream rolls up to Packets and per-bucket Packet_counts (1000, 1200, 1150, 1400) over time; a late revision (25) adjusts the affected bucket]
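A sketch of how a query could reconstruct a partition’s value from its anchor plus revision chain, in Python; the layout (per-bucket packet counts, with late data appended as revision deltas rather than rewriting the anchor) is invented for illustration.

```python
# A partition is an immutable anchor plus an append-only chain of revisions.
# Each piece maps a time bucket to a packet-count delta; the anchor holds the
# counts computed when the partition was first loaded.

anchor    = {"9:30": 1000, "9:45": 1200, "10:00": 1150, "10:15": 1400}
revisions = [{"9:45": 25}]          # late-arriving packets for the 9:45 bucket

def read_partition(anchor, revisions):
    """The logical value of the partition = anchor + all revision deltas."""
    counts = dict(anchor)
    for rev in revisions:
        for bucket, delta in rev.items():
            counts[bucket] = counts.get(bucket, 0) + delta
    return counts

def apply_late_batch(revisions, late_record_buckets):
    """Loading late data appends a new revision; the anchor is never rewritten."""
    delta = {}
    for bucket in late_record_buckets:
        delta[bucket] = delta.get(bucket, 0) + 1
    revisions.append(delta)

print(read_partition(anchor, revisions))   # {'9:30': 1000, '9:45': 1225, ...}
```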
Temporal Consistency Management
• Traditional notion of consistency: a snapshot of the system.
• Doesn’t apply in a stream warehouse – Late-arriving data is common – Different data sources have different time lags and different likelihoods of late data
• Instead, label data by its degree of completeness
[Chart: number of windows per package vs. time (seconds)]
Query Stability • How do I know when the data is stable enough to query?
• What is stable enough? – Data will never change – Data won’t change much. – I’ll take whatever is there.
Consistency Levels • Punctuations on partitions that indicate completeness.
• Example (simple) collection of consistency levels – Open: The partition should have some data in it. – Closed: The partition will not change. – Complete: The partition will not change, and all data has been received.
• Closed is a guess – WeaklyClosed, StronglyClosed
• Infer at base tables, propagate inferences to materialized views (sketch below).
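A sketch of inferring and propagating such consistency markers, in Python; the level ordering and the rule that a derived partition is only as consistent as its weakest source partition are simplifying assumptions for illustration.

```python
# Consistency levels ordered from weakest to strongest guarantee.
LEVELS = ["open", "closed", "complete"]
RANK = {lvl: i for i, lvl in enumerate(LEVELS)}

def infer_base_level(partition_end, now, max_lateness, all_feeds_reported):
    """Heuristic inference at a base table: a partition is 'closed' once the
    maximum expected lateness has passed, and 'complete' only if every feed
    that contributes to it has reported in."""
    if now < partition_end + max_lateness:
        return "open"
    return "complete" if all_feeds_reported else "closed"

def derived_level(source_levels):
    """A derived partition can only promise the weakest level among the
    source partitions it was computed from."""
    return min(source_levels, key=lambda lvl: RANK[lvl])

# A daily aggregate built from 24 hourly partitions, one of which is still open:
print(derived_level(["complete"] * 23 + ["open"]))   # -> 'open'
```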
Workflow Scheduling • Need to limit resource use to avoid thrashing.
– Hundreds of tables to update, limited (CPU, memory, cache, network) resources.
– Exclusive resources: non-preemptive scheduling. • Ensure that high-priority jobs can execute
– Real-time scheduling • Measures of lateness:
– Staleness: difference between current time and most recent data.
– Tardiness: the difference between a task deadline and task completion.
Workflow Scheduling • Staleness function: difference between current time and most recent data loaded
• Hierarchies of views with highly varying execution times.
[Diagram: view hierarchy on a timeline (9:30, 9:45, 10:00, 10:15): base tables (Twitter feeds, active measurements, link utilization, customer complaints) are fast and updated frequently; derived views (service alerts, sentiment analysis, hourly and daily aggregates) are slow and updated infrequently]
Bounded Tardiness Scheduling • Bound on the maximum tardiness of any task in a task set.
• If update jobs are scheduled regularly, bounded tardiness => bounded staleness
• Most real-time scheduling algorithms have bounded tardiness – EDF, minimum slack, etc. – There can be differences in the tardiness bounds
• Pick a heuristic that works well – E.g. pick the task that provides the largest marginal reduction in staleness (sketch below).
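A sketch of that heuristic in Python; the task fields and estimates are invented, and normalizing the staleness reduction by execution time is one plausible reading of “marginal”. At each decision point, among the ready update jobs, run the one whose completion buys the most freshness for the work it costs.

```python
from dataclasses import dataclass

@dataclass
class UpdateJob:
    table: str
    loaded_until: float      # timestamp of the freshest data already loaded
    available_until: float   # timestamp of the freshest data waiting to be loaded
    exec_time: float         # estimated running time of the update
    priority: float = 1.0    # optional per-table weight

def staleness_reduction(job):
    """How much this table's staleness would drop if its update ran now."""
    return job.priority * max(0.0, job.available_until - job.loaded_until)

def pick_next(ready_jobs):
    """Greedy heuristic: run the job with the largest marginal staleness
    reduction per unit of execution time (non-preemptive)."""
    return max(ready_jobs, key=lambda j: staleness_reduction(j) / j.exec_time)

jobs = [
    UpdateJob("link_util", loaded_until=100, available_until=160, exec_time=2),
    UpdateJob("daily_aggregate", loaded_until=0, available_until=160, exec_time=50),
]
print(pick_next(jobs).table)   # 'link_util': big staleness drop for little work
```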
Track Scheduling • Complication: Large differences in task execution time – Update a base table with 1 minute of data vs. compute a daily aggregate.
• Tardiness bounds depend on the largest task execution times. – Long tasks block short critical tasks.
• Track Scheduling: – partition tasks by execution time. – Restrict the number of long tasks that can execute concurrently
– Reserve resources for short critical tasks
Transient Overload • Common source of overload: catch-up processing. – A feed breaks for a day, then is restored. – The source schema changes, requiring a pause in processing to change update procedures.
– New tables load a long history • Update Chopping
– Break a (temporally) long update into short segments (sketch below). • Update period adjustment
– Decrease the period of backlogged tables to use up (but not oversubscribe) available resources.
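A sketch of update chopping in Python; the segment length and the refresh callback are assumed for illustration. A day-long backlog is processed as a sequence of short updates, so short critical jobs can be interleaved between segments instead of waiting behind one long task.

```python
def chop_update(table, backlog_start, backlog_end, segment_seconds, refresh):
    """Split a (temporally) long catch-up update into short segments.
    `refresh(table, lo, hi)` loads/recomputes the table for the range [lo, hi)."""
    segments = []
    lo = backlog_start
    while lo < backlog_end:
        hi = min(lo + segment_seconds, backlog_end)
        segments.append((lo, hi))
        lo = hi
    for lo, hi in segments:
        # each segment is a short, separately schedulable task; the scheduler
        # may run other (short, critical) updates between segments
        refresh(table, lo, hi)

chop_update("link_util", backlog_start=0, backlog_end=86_400,
            segment_seconds=3_600,
            refresh=lambda t, lo, hi: print(f"{t}: [{lo}, {hi})"))
```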
Data Stream Quality
• New data quality problems – Systematic errors in machine-generated streams – Correlated glitches – Missing/delayed data
• New semantics – Relational data: keys, FDs, CFDs – Streaming/temporal data: order, arrival frequency (sequential dependencies), conservation laws (see the sketch below)
Open Problems • Hybrid system architectures and cross-system optimizations
• Big and fast analytics as a cloud service • Big/fast data mining • Data stream quality/profiling • Complexity management and administration of a big/fast data management system
Bibliography • Applications
– Smart Grid • http://energy.gov/oe/technology-development/smart-grid
– Semiconductor Manufacturing • http://www.appliedmaterials.com/technologies/library/techedge-prizm
• http://www.extremetech.com/extreme/155588-applied-materials-designs-tools-to-leverage-big-data-and-build-better-chips
• Networking Applications – C. Kalmanek et al.: Darkstar: Using Exploratory Data Mining to Raise the Bar on Network Reliability and Performance. DRCN 2009
– H. Yan, A. Flavel, Z. Ge, A. Gerber, D. Massey, C. Papadopoulos, H. Shah, J. Yates: Argus: End-to-end service anomaly detection and localization from an ISP's point of view. INFOCOM 2012: 2756-2760
Bibliography • DSMS-based systems
– D. J. Abadi, D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, S. B. Zdonik: Aurora: a new model and architecture for data stream management. VLDB J. 12(2): 120-139 (2003)
– M. Balazinska, Y. C. Kwon, N. Kuchta, D. Lee: Moirae: History-Enhanced Monitoring. CIDR 2007: 375-386
– N. Dindar, P. M. Fischer, M. Soner, N. Tatbul: Efficiently correlating complex events over live and archived data streams. DEBS 2011: 243-254
Bibliography
• DBMS-based systems – Truviso: S. Krishnamurthy, M. J. Franklin, J. Davis, D. Farina, P. Golovko, A. Li, N. Thombre: Continuous analytics over discontinuous streams. SIGMOD 2010: 1081-1092
– DataCell: E. Liarou, R. Goncalves, S. Idreos: Exploiting the power of relational databases for efficient stream processing. EDBT 2009: 323-334
– L. Golab, T. Johnson, J. S. Seidel, V. Shkapenyuk: Stream warehousing with DataDepot. SIGMOD Conference 2009: 847-854
Bibliography • Hadoop / Map-Reduce Based Systems
– T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, R. Sears: MapReduce Online. NSDI 2010: 313-328
– W. Lam, L. Liu, S. T. S. Prasad, A. Rajaraman, Z. Vacheri, A. Doan: Muppet: MapReduce-Style Processing of Fast Data. PVLDB 5(12): 1814-1825 (2012)
– C. Olston, G. Chiou, L. Chitnis, F. Liu, Y. Han, M. Larsson, A. Neumann, V. B. N. Rao, V. Sankarasubramanian, S. Seth, C. Tian, T. ZiCornell, X. Wang: Nova: continuous Pig/Hadoop workflows. SIGMOD Conference 2011: 1081-1090
– B. Li, E. Mazur, Y. Diao, A. McGregor, P. J. Shenoy: SCALLA: A Platform for Scalable One-Pass Analytics Using MapReduce. ACM Trans. Database Syst. 37(4): 27 (2012)
– P. Bhatotia, A. Wieder, R. Rodrigues, U. A. Acar, R. Pasquin: Incoop: MapReduce for incremental computations. SoCC 2011: 7
Bibliography
• Late Arriving Data – S. Krishnamurthy et al.: Continuous analytics over discontinuous streams. SIGMOD 2010: 1081-1092
– J. Li, K. Tufte, V. Shkapenyuk, V. Papadimos, T. Johnson, D. Maier: Out-of-order processing: a new architecture for high-performance stream systems. PVLDB 1(1): 274-288 (2008)
– L. Golab, T. Johnson: Consistency in a Stream Warehouse. CIDR 2011: 114-122
Bibliography • Update Propagation / Workflow
– T. Johnson, V. Shkapenyuk: Update Propagation in a Streaming Warehouse. SSDBM 2011: 129-149
– C. Olston et al.: Nova: continuous Pig/Hadoop workflows. SIGMOD Conference 2011: 1081-1090
• Temporal Dimension Tables – M. Li, M. Mani, E. A. Rundensteiner, D. Wang, T. Lin: Interval Event Stream Processing. DEBS 2008
– D. Maier, M. Grossniklaus, S. Moorthy, K. Tufte: Capturing episodes: may the frame be with you. DEBS 2012: 1-11
– Snapshot windows: http://msdn.microsoft.com/en-us/library/ff518550.aspx
Bibliography • Multi-Version Concurrency Control
– D. Quass, J. Widom: On-Line Warehouse View Maintenance. SIGMOD Conference 1997: 393-404
– V. Sikka, F. Färber, W. Lehner, S. K. Cha, T. Peh, C. Bornhövd: Efficient transaction processing in SAP HANA database: the end of a column store myth. SIGMOD Conference 2012: 731-742
• Data Partition Transformations – V. Sikka, F. Färber, W. Lehner, S. K. Cha, T. Peh, C. Bornhövd: Efficient transaction processing in SAP HANA database: the end of a column store myth. SIGMOD Conference 2012: 731-742
– A. Lamb, M. Fuller, R. Varadarajan, N. Tran, B. Vandier, L. Doshi, C. Bear: The Vertica Analytic Database: C-Store 7 Years Later. PVLDB 5(12): 1790-1801 (2012)
Bibliography • DBToaster
– Y. Ahmad, O. Kennedy, C. Koch, M. Nikolic: DBToaster: Higher-Order Delta Processing for Dynamic, Frequently Fresh Views. PVLDB 2012
• Partition Revisions – S. Krishnamurthy, M. J. Franklin, J. Davis, D. Farina, P. Golovko, A. Li, N. Thombre: Continuous analytics over discontinuous streams. SIGMOD 2010: 1081-1092
• Temporal Consistency Management – L. Golab, T. Johnson: Consistency in a Stream Warehouse. CIDR 2011: 114-122
• Bounded Tardiness Scheduling – H. Leontyev, J. H. Anderson: Generalized tardiness bounds for global multiprocessor scheduling. Real-Time Systems 44(1-3): 26-71 (2010)
Bibliography • Stream Warehouse Scheduling
– L. Golab, T. Johnson, V. Shkapenyuk: Scalable Scheduling of Updates in Streaming Data Warehouses. IEEE Trans. Knowl. Data Eng. 24(6): 1092-1105 (2012)
– S. Guirguis, M. A. Sharaf, P. K. Chrysanthis, A. Labrinidis, K. Pruhs: Adaptive Scheduling of Web Transactions. ICDE 2009
Bibliography • Data stream quality
– L. Golab, H. J. Karloff, F. Korn, A. Saha, D. Srivastava: Sequential Dependencies. PVLDB 2(1): 574-585 (2009)
– L. Golab, H. J. Karloff, F. Korn, B. Saha, D. Srivastava: Discovering Conservation Rules. ICDE 2012: 738-749
– T. Dasu, J. M. Loh: Statistical Distortion: Consequences of Data Cleaning. PVLDB 5(11): 1674-1683 (2012)
– L. Golab: Data Warehouse Quality: Summary and Outlook. In: S. Sadiq (ed.), Handbook of Data Quality - Research and Practice, Springer-Verlag Berlin Heidelberg, 2013