Date post: 31-Mar-2015
Big Data Management – Challenges and Opportunities – an Incomplete Survey Jiaheng Lu Renmin University of China Joint work with Yu Liu Tutorial on HotDB
Transcript
Page 1: Big Data Management – Challenges and Opportunities – an Incomplete Survey Jiaheng Lu Renmin University of China Joint work with Yu Liu Tutorial on HotDB.

Big Data Management – Challenges and Opportunities –

an Incomplete Survey

Jiaheng Lu, Renmin University of China

Joint work with Yu Liu

Tutorial on HotDB

Page 2

Tutorial objectives

• Big data challenges
• New principles of big data management
• Big data management research
  – Indexes
  – Transaction
  – Architecture
  – Application
  – Benchmark

Page 3

Big data challenge

• Big data
  – Science data
  – Finance data
  – Streaming data
  – Internet data

Page 4

Big data management challenge

The growth in database transactions and volumes has a large impact on response times.

Source: http://www.codefutures.com/database-sharding/

Page 5

Many techniques have evolved:

• Master/Slave

• Cluster Computing

• Table Partitioning

• Federated Tables

Page 6

Four new principles in big data management

Page 7

New principle in big data management (1)

• Partition everything and use key-value storage

• Partition all things in order to manage them (切分万物以治之)

• 1st normal form cannot be satisfied
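As a rough illustration of "partition everything" with key-value storage, the sketch below routes each key to one of several partitions using a stable hash. The class name `PartitionedKVStore`, the CRC32 hash, and the sample record are assumptions made for this example; the slides do not prescribe a scheme. The stored value is deliberately nested, which is one way a key-value record can fall outside first normal form.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (CRC32, chosen for illustration)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionedKVStore:
    """Toy in-memory store: one dict per partition (hypothetical, not from the slides)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [dict() for _ in range(num_partitions)]

    def put(self, key: str, value) -> None:
        self.partitions[partition_for(key, len(self.partitions))][key] = value

    def get(self, key: str):
        return self.partitions[partition_for(key, len(self.partitions))].get(key)


store = PartitionedKVStore(num_partitions=4)
# The value holds a list, so the record is not in first normal form.
store.put("user:42", {"name": "Ada", "tags": ["admin", "dev"]})
print(store.get("user:42"))
```

Because the partition is a pure function of the key, any client can locate a record without consulting a central directory.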

Page 8

New principle in big data management (2)

• Embrace inconsistency

• Accommodating differences achieves great unity (容不同乃成大同)

• ACID properties are not satisfied

Page 9

New principle in big data management (3)

• Back up everything with three copies

• A crafty rabbit with three burrows can rest easy (狡兔三窟方高枕)

• Guarantee 99.999999% safety
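The slide does not say how the safety figure is derived. One hedged way to reason about it: assuming each copy is lost independently with probability p over some period, data is lost only if all three copies are lost, which happens with probability p cubed. The value of `p` below is a hypothetical number chosen for illustration, not a figure from the slides.

```python
def survival_probability(p_copy_loss: float, copies: int = 3) -> float:
    """Probability that at least one copy survives, assuming independent losses."""
    return 1.0 - p_copy_loss ** copies


p = 0.002  # hypothetical per-copy loss probability
print(f"{survival_probability(p):.10%}")
```

Under this independence assumption, a per-copy loss probability of about 0.2% is already enough for three copies to exceed eight nines; correlated failures would weaken that conclusion.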

Page 10

New principle in big data management (4)

• Scalable and high performance

• Plan at the scale of the ocean, with room to hold everything (运筹沧海量兼容)

Page 11

Big data management

• Partition everything (切分万物以治之)
• Embrace inconsistency (容不同乃成大同)
• Back up data with three copies (狡兔三窟方高枕)
• Scalable and high performance (运筹沧海量兼容)

Page 12

Big Data Management

Indexes on Big Data

Transaction on Big Data

Processing Architecture on Big Data

Applications in MapReduce Parallel Processing

Benchmark of Big Data Management System

Page 13

Related Papers

[Figure: chart of the number of related papers (axis 0 to 14) per year, 2009 to 2011, for SIGMOD, VLDB, and ICDE]

Page 14

Related Papers

[Figure: chart of the number of related papers (axis 0 to 4.5) by topic (Index on Big Data, Transaction on Big Data, Architecture, Applications, Benchmark) for 2009, 2010, and 2011]

Page 15

Big data papers (incomplete data)

• Indexes on Big Data: ~4 papers
• Transaction on Big Data: 4~5 papers
• Processing Architecture on Big Data: 6~7 papers
• Applications in MapReduce Parallel Processing: 6~7 papers
• Benchmark of Big Data Management System: 3~4 papers

Page 16

Big Data Management

Indexes on Big Data

Transaction on Big Data

Processing Architecture on Big Data

Applications in MapReduce Parallel Processing

Benchmark of Big Data Management System

Page 17

Indexes on Big Data

Construct indexes which can be maintained in an incremental way.

Avoid bottlenecks in tree-like structures so that read and write operations can proceed concurrently.

Page 18

Distributed B-Tree

Goal: perform consistent concurrent updates while allowing high read concurrency

M. K. Aguilera, W. Golab, et al. A Practical Scalable Distributed B-Tree. VLDB 2008

Indexes on Big Data

Page 19

Distributed B-Tree

3 techniques:
• Transactions with optimistic concurrency control
• Lazy replication of version numbers at clients
• Eager replication of version numbers at servers

M. K. Aguilera, W. Golab, et al. A Practical Scalable Distributed B-Tree. VLDB 2008
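The first technique, optimistic concurrency control with version numbers, can be sketched in a few lines. This is a single-process toy under stated assumptions, not the paper's protocol: `VersionedStore`, `insert_key`, and the node id `"leaf-1"` are hypothetical names, and replication and distribution are left out entirely.

```python
class VersionedStore:
    """Toy stand-in for servers holding versioned B-tree nodes (hypothetical)."""

    def __init__(self):
        self._data = {}  # node_id -> (version, value)

    def read(self, node_id):
        """Return (version, value); unknown nodes read as version 0."""
        return self._data.get(node_id, (0, None))

    def commit(self, read_versions, writes):
        """Validate, then apply: abort if any node read has changed since it was read."""
        for node_id, seen_version in read_versions.items():
            if self.read(node_id)[0] != seen_version:
                return False  # conflict detected, caller should retry
        for node_id, value in writes.items():
            self._data[node_id] = (self.read(node_id)[0] + 1, value)
        return True


def insert_key(store, node_id, key, max_retries=10):
    """Read without locking, compute the new node, and retry if validation fails."""
    for _ in range(max_retries):
        version, keys = store.read(node_id)
        new_keys = sorted(set(keys or []) | {key})
        if store.commit({node_id: version}, {node_id: new_keys}):
            return True
    return False


store = VersionedStore()
insert_key(store, "leaf-1", 7)
insert_key(store, "leaf-1", 3)
# A writer that still holds version 1 is rejected at commit time.
stale_commit_ok = store.commit({"leaf-1": 1}, {"leaf-1": [99]})
```

Nothing is locked while a transaction runs; the version check at commit is what detects a concurrent update.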

Indexes on Big Data

Page 20

• Use a BATON overlay to support range queries
• Local B+-tree index & Cloud Global (CG) index
• Publish only a few local index nodes to the global index to get high throughput and concurrency

Sai Wu, Dawei Jiang, et al. Efficient B-tree Based Indexing for Cloud Data Processing. VLDB 2010

Indexes on Big Data

Page 21

BATON overlay

Steps to retrieve data:
1. Search in the BATON tree (lookup());
2. For all overlapping nodes in the global index, find the corresponding nodes (and local indexes);
3. Search in the local B+-tree index to retrieve data.

Sai Wu, Dawei Jiang, et al. Efficient B-tree Based Indexing for Cloud Data Processing. VLDB 2010
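The three retrieval steps can be sketched as a two-level lookup. This is a simplified illustration under assumptions: the global index here is a flat list of key ranges rather than a BATON overlay, each local B+-tree is replaced by a sorted list, and `LocalIndex` and `range_query` are hypothetical names.

```python
from bisect import bisect_left, bisect_right


class LocalIndex:
    """Sorted-list stand-in for one node's local B+-tree."""

    def __init__(self, items):
        self.items = sorted(items)  # list of (key, value) pairs
        self.keys = [k for k, _ in self.items]

    def key_range(self):
        return self.keys[0], self.keys[-1]

    def range_search(self, lo, hi):
        """Return all (key, value) pairs with lo <= key <= hi."""
        return self.items[bisect_left(self.keys, lo):bisect_right(self.keys, hi)]


def range_query(global_index, nodes, lo, hi):
    results = []
    # Steps 1 and 2: find the published ranges that overlap the query range.
    for (node_lo, node_hi), node_id in global_index:
        if node_lo <= hi and lo <= node_hi:
            # Step 3: search the local index on that node.
            results.extend(nodes[node_id].range_search(lo, hi))
    return sorted(results)


nodes = {
    "n1": LocalIndex([(1, "a"), (5, "b"), (9, "c")]),
    "n2": LocalIndex([(12, "d"), (20, "e")]),
}
global_index = [(idx.key_range(), node_id) for node_id, idx in nodes.items()]
print(range_query(global_index, nodes, 5, 12))  # [(5, 'b'), (9, 'c'), (12, 'd')]
```

Only the key range of each node is published globally in this sketch, which mirrors the idea of keeping the global index small while the bulk of the index stays local.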

Indexes on Big Data

Page 22

Big Data Management

Indexes on Big Data

Transaction on Big Data

Processing Architecture on Big Data

Applications in MapReduce Parallel Processing

Benchmark of Big Data Management System

Page 23

The CAP Theorem

Consistency

Partition tolerance

Availability

Page 24

The CAP Theorem

Once a writer has written, all readers will see that write

Consistency

Partition tolerance

Availability

Page 25

The CAP Theorem

System is available during software and hardware upgrades and node failures.

Consistency

Partition tolerance

Availability

Page 26

The CAP Theorem

A system can continue to operate in the presence of network partitions.

Consistency

Partition tolerance

Availability

Page 27

The CAP Theorem

Theorem: You can have at most two of these properties for any shared-data system

Consistency

Partition tolerance

Availability

Page 28

Consistency

• Two kinds of consistency:
  – strong consistency – ACID (Atomicity, Consistency, Isolation, Durability)
  – weak consistency – BASE (Basically Available, Soft-state, Eventual consistency)

Page 29

A tailor

[Figure labels: 3NF, TRANSACTION, LOCK, ACID, SAFETY, RDBMS]

Page 30

“Not all data need to be treated at the same level of consistency.”

Consistency Rationing
• Goal: minimize the overall cost of operations in the cloud
• Define consistency guarantees on the data instead of at the transaction level
• Switch consistency guarantees at runtime, automatically
• 3 categories

T. Kraska, M. Hentschel, et al. Consistency Rationing in the Cloud: Pay only when it matters. VLDB 2009

Transaction on Big Data

Page 31

Transaction on Big Data

Category C: Session Consistency
• (Temporary) inconsistency is acceptable
• Read-your-own-writes monotonicity
• Converges to eventual consistency at some interval

Category A: Serializable
• A consistency violation results in large penalty costs

Category B: Adaptive
• Trade-off between cost per operation & consistency level
• Switches between session consistency and serializability at runtime

T. Kraska, M. Hentschel, et al. Consistency Rationing in the Cloud: Pay only when it matters. VLDB 2009

Page 32

Category B: trade-off between cost per operation & consistency level

General Policy
“A higher consistency level needs to be provided when the rate of conflicts (updates) is high.”

Time Policy
When the “deadline” approaches, more commits.

Fixed Threshold Policy (for numeric types)

Dynamic Policy (for numeric types)

Y: sum of update values

T. Kraska, M. Hentschel, et al. Consistency Rationing in the Cloud: Pay only when it matters. VLDB 2009
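A minimal sketch of how the three categories might be dispatched at runtime is shown below. The slides do not give the exact form of the policies, so the rule used for category B (treat a numeric value at or below a threshold as needing serializability) is an assumption for illustration, and `choose_consistency` and the stock example are hypothetical.

```python
from enum import Enum


class Consistency(Enum):
    SERIALIZABLE = "serializable"
    SESSION = "session"


def choose_consistency(category: str, value: float = 0.0, threshold: float = 0.0) -> Consistency:
    """Pick a consistency level per data category (A, B, or C)."""
    if category == "A":
        return Consistency.SERIALIZABLE  # violations are too costly
    if category == "C":
        return Consistency.SESSION  # temporary inconsistency is acceptable
    if category == "B":
        # Assumed fixed-threshold-style rule: values near the limit are risky,
        # so pay for the stronger guarantee only in that case.
        return Consistency.SERIALIZABLE if value <= threshold else Consistency.SESSION
    raise ValueError(f"unknown category: {category!r}")


# Hypothetical example: stock level of an item with a threshold of 10 units.
print(choose_consistency("B", value=500, threshold=10))  # Consistency.SESSION
print(choose_consistency("B", value=4, threshold=10))    # Consistency.SERIALIZABLE
```

The point of the design is that the stronger, more expensive guarantee is paid for only when an inconsistency would actually matter.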

Transaction on Big Data

Page 33

• Datalog and coordination complexity: theoretical results from a PODS perspective

(PODS keynote 2011, Joseph M. Hellerstein, UC Berkeley)

Page 34

Datalog
• Main expressive advantage: recursive queries.
• More convenient for analysis: papers look better.
• Without recursion but with negation, it is equivalent in power to relational algebra.
• Has affected real practice (e.g., recursion in SQL3, magic sets transformations).

Page 35

Datalog
• Example Datalog program:

parent(bill,mary).
parent(mary,john).

ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).

?- ancestor(bill,X)
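To make the recursion concrete, the example program can be evaluated bottom-up by applying the two rules until no new facts appear. The sketch below is a naive fixpoint evaluation in Python, written only for this example; the function name `ancestors` is an assumption and this is not how a real Datalog engine is implemented.

```python
def ancestors(parent_facts):
    """Naive bottom-up evaluation of the two ancestor rules to a fixpoint."""
    # ancestor(X,Y) :- parent(X,Y).
    ancestor = set(parent_facts)
    while True:
        # ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
        derived = {
            (x, y)
            for (x, z) in parent_facts
            for (z2, y) in ancestor
            if z == z2
        }
        if derived <= ancestor:  # nothing new: fixpoint reached
            return ancestor
        ancestor |= derived


parent = {("bill", "mary"), ("mary", "john")}
# ?- ancestor(bill,X)
answer = sorted(y for (x, y) in ancestors(parent) if x == "bill")
print(answer)  # ['john', 'mary']
```

The loop terminates because the set of facts only grows and the number of possible pairs is finite.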

Page 36

Joseph’s Conjecture (1)
• CONJECTURE 1. Consistency And Logical Monotonicity (CALM).
• A program has an eventually consistent, coordination-free execution strategy if and only if it is expressible in (monotonic) Datalog.
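The intuition behind CALM can be illustrated with a small experiment, offered here as a sketch rather than a proof. A monotonic query (reachability) emits answers eagerly as messages arrive and ends with the same output for every delivery order. A non-monotonic question ("c is NOT reachable from a") answered on partial input produces a claim that later input contradicts, so it would need coordination to know the input is complete. All names below are hypothetical.

```python
from itertools import permutations


def transitive_closure(edges):
    """Monotonic query: adding edges can only add reachable pairs."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def run_without_coordination(messages):
    """Process edges one at a time and emit outputs eagerly."""
    seen, reachable_out, unreachable_out = set(), set(), set()
    for edge in messages:
        seen.add(edge)
        closure = transitive_closure(seen)
        reachable_out |= closure  # monotonic output: never retracted
        if ("a", "c") not in closure:
            # Non-monotonic output decided on partial input.
            unreachable_out.add(("a", "c"))
    return reachable_out, unreachable_out


edges = [("a", "b"), ("b", "c")]
runs = [run_without_coordination(order) for order in permutations(edges)]
for reachable_out, unreachable_out in runs:
    print(sorted(reachable_out), sorted(unreachable_out))
```

In every ordering the eager reachability output converges to the same set, while the eager negative claim about ("a", "c") turns out to be wrong once both edges have arrived.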

Page 37

Joseph’s Conjecture (2)
• CONJECTURE 2. Causality Required Only for Non-monotonicity (CRON).
• Program semantics require causal message ordering if and only if the messages participate in non-monotonic derivations.

Page 38

Joseph’s Conjecture (3)
• CONJECTURE 3. The minimum number of Dedalus timesteps required to evaluate a program on a given input data set is equivalent to the program’s Coordination Complexity.

Page 39

Joseph’s Conjecture (4)
• CONJECTURE 4. Any Dedalus program P can be rewritten into an equivalent temporally-minimized program P’ such that each inductive or asynchronous rule of P’ is necessary: converting that rule to a deductive rule would result in a program with no unique minimal model.

Page 40

Circumstance has presented a rare opportunity—call it an imperative—for the database community to take its place in the sun, and help create a new environment for parallel and distributed computation to flourish.

------Joseph M. Hellerstein (UC Berkeley)

Page 41

Big Data Management

Indexes on Big Data

Transaction on Big Data

Processing Architecture on Big Data

Applications in MapReduce Parallel Processing

Benchmark of Big Data Management System

Page 42

Processing Architecture on Big Data

Make MapReduce more powerful, especially on complicated analysis

Merge cloud computing systems and PDBMSs

Page 43

MapReduce online testing platform

• cloudcomputing.ruc.edu.cn

• Automatic evaluation of Hadoop MapReduce code

• Theoretical questions

Page 44

Open MapReduce testing platform: cloudcomputing.ruc.edu.cn

Page 45

“Sort-merge implementation in Hadoop poses fundamental barrier to incremental one-pass analysis”

New Hash-Based Platform

Processing Architecture on Big Data

B. Li, E. Mazur, et al. A Platform for Scalable One-Pass Analytics using MapReduce. SIGMOD 2011
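The contrast between sort-merge grouping and a hash-based approach can be sketched with a simple per-key count. This is a toy illustration of the general idea only, not the platform described in the paper; both function names and the sample input are assumptions.

```python
from collections import defaultdict


def sort_merge_count(pairs):
    """Blocking: no group can be finalized until all input has been seen and sorted."""
    counts = []
    for key, _ in sorted(pairs):
        if counts and counts[-1][0] == key:
            counts[-1][1] += 1
        else:
            counts.append([key, 1])
    return {key: count for key, count in counts}


def hash_incremental_count(pairs):
    """One pass: keep per-key state in a hash table and yield a running answer per input."""
    state = defaultdict(int)
    for key, _ in pairs:
        state[key] += 1
        yield key, state[key]


clicks = [("b", 1), ("a", 1), ("b", 1)]
print(sort_merge_count(clicks))              # {'a': 1, 'b': 2}
print(list(hash_incremental_count(clicks)))  # [('b', 1), ('a', 1), ('b', 2)]
```

The hash-based version can report an up-to-date count after every record, which is what makes incremental, one-pass analysis possible.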

Page 46

Fast Join Processing in Data Warehouse
• Partitioning data into vertical groups dynamically

Y. Lin, D. Agrawal, et al. Llama: Leveraging Columnar Storage for Scalable Join Processing in the MapReduce Framework. SIGMOD 2011

Processing Architecture on Big Data

Page 47

Fast Join Processing in Data Warehouse
• Partitioning data into vertical groups dynamically
• Concurrent join
  – More map-side joins
  – Basic patterns: star pattern & chain pattern

Processing Architecture on Big Data

Page 48

Processing Architecture on Big Data

Make MapReduce more powerful, especially on complicated analysis

Merge cloud computing systems and PDBMSs

Page 49

HadoopDB
• Combination of parallel DBMS (performance) and MapReduce (scalability, fault tolerance)
• Communication layer: MapReduce
• Nodes: single-node DBMS instances
• SMS Planner: SQL → MapReduce job → SQL

Processing Architecture on Big Data

Page 50

Big Data Management

Indexes on Big Data

Transaction on Big Data

Processing Architecture on Big Data

Applications in MapReduce Parallel Processing

Benchmark of Big Data Management System

Page 51

A. Okcan, M. Riedewald. Processing Theta-Joins using MapReduce. SIGMOD 2011

Discusses algorithms for theta-joins (inequality joins)
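One way a theta-join can be mapped onto MapReduce is to view the cross product of S and T as a matrix, cut it into a grid of regions, and give each region to one reducer. The sketch below simulates that idea in a single process with random row and column assignment. It is a simplified illustration under assumptions (grid size, seed, and the function name `theta_join` are hypothetical), not a reproduction of the paper's algorithms.

```python
import random
from collections import defaultdict


def theta_join(S, T, theta, rows=2, cols=2, seed=0):
    """Simulate a grid-partitioned theta-join; each (s, t) pair meets in exactly one region."""
    rng = random.Random(seed)
    regions = defaultdict(lambda: ([], []))  # (row, col) -> (S tuples, T tuples)

    # Map phase: an S tuple picks a random row and is sent to every region in that row;
    # a T tuple picks a random column and is sent to every region in that column.
    for s in S:
        r = rng.randrange(rows)
        for c in range(cols):
            regions[(r, c)][0].append(s)
    for t in T:
        c = rng.randrange(cols)
        for r in range(rows):
            regions[(r, c)][1].append(t)

    # Reduce phase: each region joins only the tuples it received.
    out = []
    for s_part, t_part in regions.values():
        out.extend((s, t) for s in s_part for t in t_part if theta(s, t))
    return sorted(out)


S, T = [1, 5, 9], [2, 6]
print(theta_join(S, T, lambda s, t: s < t))  # [(1, 2), (1, 6), (5, 6)]
```

Because a pair (s, t) is compared only in the region at the row of s and the column of t, the result contains no duplicates and works for any join condition, at the cost of replicating each input tuple across a row or column.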

Applications in MapReduce Parallel Processing

Page 52

R. Vernica, M. J. Carey, et al. Efficient Set-Similarity Joins Using MapReduce. SIGMOD 2010

Use the MapReduce framework to perform set-similarity joins: given two files (or one), find all pairs of records (a, b) such that a and b are similar, i.e. sim(a, b) > t.

Gives algorithms that cope with large amounts of data, along with an experimental evaluation.
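A naive version of a set-similarity self-join can be sketched in map and reduce style: group record ids by token to generate candidate pairs, then verify each candidate against the threshold. Jaccard similarity is assumed here as sim, a threshold t >= 0 is assumed, and the function names and sample records are hypothetical; the paper's algorithms use more selective filtering than this sketch.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two non-empty token sets."""
    return len(a & b) / len(a | b)


def set_similarity_self_join(records, t):
    # "Map": emit (token, record id) so records sharing a token meet in one group.
    by_token = defaultdict(list)
    for rid, tokens in records.items():
        for tok in tokens:
            by_token[tok].append(rid)

    # "Reduce": every pair within a token group is a candidate.
    candidates = set()
    for rids in by_token.values():
        candidates.update(combinations(sorted(rids), 2))

    # Verification: keep only pairs with sim(a, b) > t.
    return sorted(
        (a, b) for a, b in candidates if jaccard(records[a], records[b]) > t
    )


records = {
    "r1": {"big", "data", "index"},
    "r2": {"big", "data", "join"},
    "r3": {"cloud", "storage"},
}
print(set_similarity_self_join(records, t=0.4))  # [('r1', 'r2')]
```

Pairs with no token in common have similarity 0 and are never generated, which is the basic saving over comparing all pairs.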

Applications in MapReduce Parallel Processing

Page 53

Big Data Management

Indexes on Big Data

Transaction on Big Data

Processing Architecture on Big Data

Applications in MapReduce Parallel Processing

Benchmark of Big Data Management System

Page 54

Benchmark of Big Data Management System

Comparison of the performance between MapReduce paradigm and parallel DBMSs

PERFORMANCE PDBMSs >> MR systems (except data loading)

Comparison:
• Schema support
• Indexing
• Programming model
• Data distribution
• Execution strategy
• Flexibility
• Fault tolerance

A. Pavlo, E. Paulson, et al. A Comparison of Approaches to Large-Scale Data Analysis. SIGMOD 2009

Page 56

How do architectures affect the performance of database applications in the cloud, especially for OLTP?

D. Kossmann, T. Kraska, et al. An Evaluation of Alternative Architectures for Transaction Processing in the Cloud. SIGMOD 2010

Benchmark of Big Data Management System

Page 59

Conclusion
• Big Data Management: HOT DB topic
• Research topics: indexing, transaction, join, architecture, application, benchmark

Page 60

References
• Sai Wu, Dawei Jiang, et al. Efficient B-tree Based Indexing for Cloud Data Processing. VLDB 2010
• David Chiu, A. Shetty, et al. Evaluating and Optimizing Indexing Schemes for a Cloud-based Elastic Key-Value Store. 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2011
• J. Wang, S. Wu, et al. Indexing Multi-dimensional Data in a Cloud System. SIGMOD 2010
• D. Kossmann, T. Kraska, et al. An Evaluation of Alternative Architectures for Transaction Processing in the Cloud. SIGMOD 2010
• T. Kraska, M. Hentschel, et al. Consistency Rationing in the Cloud: Pay only when it matters. VLDB 2009
• H. T. Vo, C. Chen, et al. Towards Elastic Transactional Cloud Storage with Range Query Support. VLDB 2010
• H. Kllapi, E. Sitaridi, et al. Schedule Optimization for Data Processing Flows on the Cloud. SIGMOD 2011
• M. K. Aguilera, W. Golab, et al. A Practical Scalable Distributed B-Tree. VLDB 2008

Page 61

References
• E. Friedman, P. Pawlowski, et al. SQL/MapReduce: A Practical approach to self-describing, polymorphic, and parallelizable user-defined functions. VLDB 2009
• R. Vernica, M. J. Carey, et al. Efficient Set-Similarity Joins Using MapReduce. SIGMOD 2010
• S. Blanas, J. M. Patel, et al. A Comparison of Join Algorithms for Log Processing in MapReduce. SIGMOD 2010
• D. Logothetis, K. Yocum. Ad-Hoc Data Processing in the Cloud. VLDB 2008
• B. Panda, J. S. Herbach, et al. PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce. VLDB 2009
• A. Okcan, M. Riedewald. Processing Theta-Joins using MapReduce. SIGMOD 2011
• K. Morton, M. Balazinska, et al. ParaTimer: A Progress Indicator for MapReduce DAGs. SIGMOD 2010
• Y. Cao, C. Chen, et al. ES2: A Cloud Data Storage System for Supporting Both OLTP and OLAP. ICDE 2011
• K. Morton, A. Friesen, et al. Estimating the Progress of MapReduce Pipelines. ICDE 2010

Page 62

References
• W. Lang, J. M. Patel. Energy Management for MapReduce Clusters. VLDB 2010
• T. Nykiel, M. Potamias, et al. MRShare: Sharing Across Multiple Queries in MapReduce. VLDB 2010
• C. Olston, G. Chiou, et al. Nova: Continuous Pig/Hadoop Workflows. SIGMOD 2011
• Y. Lin, D. Agrawal, et al. Llama: Leveraging Columnar Storage for Scalable Join Processing in the MapReduce Framework. SIGMOD 2011
• B. Li, E. Mazur, et al. A Platform for Scalable One-Pass Analytics using MapReduce. SIGMOD 2011
• D. G. Campbell, G. Kakivaya, et al. Extreme Scale with Full SQL Language Support in Microsoft SQL Azure. SIGMOD 2010
• A. Abouzeid, K. B-Pawlikowski, et al. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. VLDB 2009
• Y. Xu, P. Kostamaa, et al. Integrating Hadoop and Parallel DBMS. SIGMOD 2010
• J. A. Q-Ruiz, C. Pinkel, et al. RAFT at Work: Speeding-Up MapReduce Applications under Task and Node Failures. SIGMOD 2011
• A. Pavlo, E. Paulson, et al. A Comparison of Approaches to Large-Scale Data Analysis. SIGMOD 2009

