8/4/2019 Hacigumus Slides
CloudDB:
A Data Store for all Sizes in the Cloud
Hakan Hacigumus
Data Management Research
NEC Laboratories America
http://www.nec-labs.com/dm
www.nec-labs.com
NEC Labs Data Management Research
What I will try to cover
Historical perspective and motivation
(Preliminary) Technical Approach
Current Status
Food for Thought
Why Data Management Research?
Many Data Management Technologies and Products have been around
Data Centers have evolved over time
Data Center hosting became a business
Database Community was successful in creating technologies and business
Why Data Management (Again)?
Amount of Data: Amount of business data doubles every 12-18 months
New Data Types: Relational databases only manage 10-15% of the available data
New Data Sources: Individual users via Web 2.0 applications, social sites, collaboration, mobile devices, sensors, etc.
New Usage Patterns: Around the clock, around the world, highly interconnected
Large Number of Users: Unprecedented increase and fluctuations
New Type of Apps: Highly integrated, extremely data intensive
(Good Old) Database
Cloud Computing
A paradigm shift in how and where a workload is generated and executed
Cloud service provider, Cloud service consumer
Market Size
- Data Management Market ~$20B
- IT Cloud Service ~$42B (by 2012) (IDC)
Cloud Provider API
Animoto on Amazon EC2
Rapid growth in three days:
- Number of users increased from 25k to 250k
- Number of servers increased from 50 to 3500
Assume $500 per machine: $1.75M!
Instead, they used Amazon EC2
A no-infrastructure startup
Biggest piece of hardware: a (fancy) espresso machine!
Problem: It is not trivial to distribute users' accesses to the data just by scaling out cloud computing nodes
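The hardware estimate on this slide can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted on the slide (25k to 250k users, 50 to 3500 servers, an assumed $500 per machine); the variable names are illustrative.

```python
# Back-of-the-envelope check of the Animoto figures quoted on the slide.
users_before, users_after = 25_000, 250_000
servers_before, servers_after = 50, 3500
cost_per_machine_usd = 500  # the slide's assumed purchase price per machine

# Cost of buying the peak fleet outright instead of renting it
peak_fleet_cost = servers_after * cost_per_machine_usd

# Growth factors over the three days
user_growth = users_after / users_before
server_growth = servers_after / servers_before

print(f"peak fleet cost: ${peak_fleet_cost:,}")
print(f"users grew {user_growth:.0f}x, servers grew {server_growth:.0f}x")
```

With these numbers the server count grows faster than the user count, which fits the slide's closing point that scaling out compute nodes alone does not settle how data accesses are distributed.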
Database-as-a-Service?
ICDE 2002!
Reaction: Cool, but
- Technology
- Regulations
- Psychological Acceptance
- Business Model
Data Management in Cloud
Cloud computing model may provide a platform to address new challenges
But the problem is:
Data Management Systems were not designed and implemented with cloud computing model in mind
So the question is:
What are the data management challenges we need to address before the full potential of cloud computing can be realized?
Need for New Solutions
Massive scalability to handle
- Very large amounts of data
- Very large numbers of diverse users/requests
Elasticity to
- handle varying demand
- optimize operating costs
Flexibility to handle different data and processing models
Massive multi-tenancy to achieve economies of scale
More intelligent system monitoring and management
Cloud Data Management Challenges
Figure: application classes plotted by # of queries / sec and # of records / query
Small apps. Key challenge: scalable multi-tenant hosting
Large transactional apps (OLTP). Key challenge: scalable read/write
Large analytic apps (OLAP). Key challenge: scalable scan and aggregation
Ultimate goal (CloudDB). Key challenge: seamless data management
Dimensions: query scalability, data scalability, multi-tenancy
Buy All Sizes?
OLTP, OLAP
? NO!
Buy One Size?
OLTP
OLAP
Let Someone Else Do All That
OLTP, OLAP
Access and Management
Leveraging very specialized database technologies
Easier integration with applications
Easier adoption by developers (dominant force for adoption of cloud!)
Easier and more flexible deployment options in the middleware
Wish Lists
Clients
- Standard language API (e.g., SQL)
- Identifiable and verifiable Service Level Agreements
- Common DBMS maintenance tasks (e.g., backup, versioning, patching, etc.)
- Availability of value-add services, such as business analytics, information sharing, collaboration, etc.
Service Provider
- Satisfying clients' SLAs to sustain revenue
- Great cost efficiency via a high level of automation and resource sharing to ensure profitability
- Maintaining an extendable platform for value-add services
(Some) Storage Models
Store Type | Main Purpose | Pro | Con
Relational | Transaction processing | Standardization; higher performance on Online Transaction Processing (OLTP); ACID properties | Scalability
Key/Value | Scalable data storage; read/write intensive workload | Scalability | Standardization; performance issues; complex query capability; ACID properties (?)
Column-Oriented | Analytics processing; read optimized, throughput oriented | Higher performance on Online Analytical Processing (OLAP); more flexible schema evolution (?) | Standardization; complex query capability
Application Scenario
Application v1: Personal Profile Management (Address, Phone, Notes, Contacts, Calendars, Reminders)
Data: Profile Data, User 1 Data, User 2 Data
Application v2: Information Portal (Online Shopping, Catalogs, Product Reviews, Subscriptions)
Data: Portal Data, Products, Reviews, ..., External Sources
Stores: Relational Database, Key/Value Store
Very difficult migration for
- Application developers (skills, time)
- Architects (redesign)
- Company (investment)
Data Model Decisions
Problem: Users are forced to make a decision on the data model based on the current needs of the applications
- Is it possible to make the right decision all the time?
Problem: The developer (client) has to re-architect their application in order to take advantage of different data models
- How easy is it to change the architecture and the implementation?
Figure: as the workload evolves (# of queries / sec) over application versions 1.0 to 4.0, the data tier moves through single RDBMS, clustering, sharding, and key-value store
Remember Data Independence?
1968
1970
Data Independence
Decouple application logic from data processing
Let them be optimized and managed independently
Enabled decades of innovation and improvement in databases
Data Independence
The application should not have to be aware of the physical organization of the data (and how it can be accessed)
All it needs is a logical (declarative) specification
CloudDB makes decisions based on application context, workload characteristics, etc.
Figure: the Application talks to a SQL API; beneath it, "CloudDB: A layer for data independence" handles data load and query/update against a Relational Store, a Key/Value Store, and an Analytics Store as # of queries / sec grows
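To make the idea of a data independence layer concrete, here is a minimal sketch of a router that picks a store from workload characteristics while the application only ever submits SQL. The `WorkloadProfile` fields, the thresholds, and the store names are assumptions made up for illustration; the slides do not say how CloudDB actually makes this decision.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    queries_per_sec: float    # observed request rate
    records_per_query: float  # average number of records one query touches


def choose_store(profile: WorkloadProfile) -> str:
    """Pick a backing store for a workload (thresholds are made-up assumptions)."""
    if profile.records_per_query >= 100_000:
        return "analytics"   # scan- and aggregation-heavy, OLAP-like
    if profile.queries_per_sec >= 10_000:
        return "key_value"   # very high rate of small reads and writes
    return "relational"      # default for modest workloads


def execute(sql: str, profile: WorkloadProfile) -> str:
    """The application passes only SQL; the layer decides where it runs."""
    return f"[{choose_store(profile)}] {sql}"


print(execute("SELECT * FROM users WHERE id = 1", WorkloadProfile(50, 10)))
```

The point of the sketch is the shape of the interface: the SQL text stays the same when the workload profile changes, and only the routing decision differs.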
Language?
New Breed Databases: CouchDB, Project Voldemort (Dynamo), Cassandra, BigTable, Tokyo Cabinet, MongoDB, SimpleDB, ...
MapReduce/Hadoop
Some Reminders about SQL
By far the most widely used data access language
It has nothing to do with
- How the data is stored
- How the queries are executed
- How the transactions are handled
Very large number of skilled programmers
Huge amount of existing applications and tools
SQL is actually good?
Hive: SQL API on top of MapReduce
Google BigQuery: SQL over data stored in non-relational databases
...
CloudDB - Guiding Principles
Embrace heterogeneity: one size does not fit all
Leverage specialized technologies
Maintain and restore declarative nature of data processing
Understand and define dimensions of scalability
CloudDB Middleware: Opaque vs. Transparent
System Independence?
The middleware would be responsible for making all the decisions regarding the choice of data stores, processing the queries, and end-to-end system optimization
While the middleware can abstract away the underlying storage systems, it should explicitly express certain essential aspects of the system, such as consistency levels and scalability of transactions
Figure: Applications send SQL queries to the CloudDB Middleware (API/Language Support (SQL), Distributed Query Processor) and receive results; the data stores sit beneath the middleware
Opaque: data stores
Transparent: transaction patterns, consistency / scalability
CloudDB Platform
Figure: CloudDB platform architecture, with these components
(External) Applications: send SQL queries, receive results
API/Language Support (JDBC, SQL)
Distributed Query Processor
Intelligent Cloud Database Coordinator (ICDC)
Workload Analysis, Design Optimizer, System Monitor, Database Cluster Controller, Capacity Planner
Client SLAs
SLA Aware Dispatcher and three Schedulers
Multi Tenancy Manager (MTM)
CloudDB Store: Relational Store, Analytics Store, Key-Value Store, each with Internal Query Processing
Auto Sharding, Auto Replication, Auto Partitioning
Data Migration
CloudDB Platform Key Points
One Unified, Standard API
Intelligent Analysis and Decision Making
Specialized Stores for Specific Needs
Our Data Management Platform
Key Research Areas
Intelligent Management
Workload Management
Data Stores
Specialized Stores for Specific Needs
Intelligent Analysis and Decision Making
One Unified, Standard API
CloudDB System Architecture: Microsharding is a part of CloudDB
Microsharding
SQL over Key-Value Stores
Microsharding to enable SQL over key-value stores
Figure: Applications issue SQL to query execution nodes (relational middleware, a pool of servers), which use key access to reach storage nodes (storage cloud: a key-value store on a pool of servers)
Key challenge: limited access capabilities (only key-based put/get)
Microsharding
Key-Value stores are good at scaling write-intensive workloads
But they don't leverage a large body of technologies developed in databases over the decades, such as:
- Relationships
- Transactions
- Advanced query functions, etc.
These are hand-coded by developers
Microsharding aims at bringing those capabilities into key-value stores in a principled way
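As an illustration of what "hand-coded by developers" can look like, the sketch below maintains a users-to-reviews relationship by hand on top of a store that offers only key-based put/get. The `KeyValueStore` class is a toy in-memory stand-in, and the key naming scheme is an assumption for the example, not the API of Voldemort or any other real store.

```python
class KeyValueStore:
    """Toy in-memory stand-in for a key-value store: key-based put/get only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def add_review(store, review_id, user_id, text):
    # Store the review under its own key.
    store.put(f"review:{review_id}", {"id": review_id, "user_id": user_id, "text": text})
    # Hand-maintained relationship: the list of review ids for this user.
    index_key = f"user_reviews:{user_id}"
    review_ids = store.get(index_key) or []
    store.put(index_key, review_ids + [review_id])


def reviews_of_user(store, user_id):
    # Hand-coded "join": one get for the id list, then one get per review.
    review_ids = store.get(f"user_reviews:{user_id}") or []
    return [store.get(f"review:{rid}") for rid in review_ids]


store = KeyValueStore()
add_review(store, 10, 1, "great")
add_review(store, 11, 1, "so-so")
print(reviews_of_user(store, 1))
```

Note that `add_review` issues two independent puts with no transaction around them, so a failure in between would leave the relationship inconsistent. That is the kind of burden the slide says falls on developers.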
Key Technical Questions Addressed
How can we map relational schemas to key-value store data models?
How can we map relational tuples to key-value objects?
Once we have those mappings, how can we define transaction classes that can be supported in a scalable way in key-value stores?
What are the system implementation issues with such a middleware?
Query and Data Transformation
Physical design: mapping between relational data and K/V data
Schema (+data):
TABLE users (
  id: primary key
)
TABLE reviews (
  id: primary key
  user_id: foreign key to users
)
Query (template):
SELECT * FROM users, reviews
WHERE users.id = reviews.user_id
AND users.id = ?
Physical design:
NEST reviews BY user_id
Transformed data (KV data): each users tuple is stored together with its nested reviews as one microshard, User[Review]
Query plan: GET, then UNNEST
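The transformation on this slide can be sketched in a few lines: nest each user's reviews under the user's key, then answer the query template with a single key lookup followed by an unnest. This is a minimal sketch that uses a plain Python dict as a stand-in for the key-value store; the function names `nest` and `get_unnest` and the row layout are assumptions for illustration, not the middleware's actual interface.

```python
from collections import defaultdict


def nest(users, reviews):
    """NEST reviews BY user_id: build one microshard per users tuple."""
    by_user = defaultdict(list)
    for review in reviews:
        by_user[review["user_id"]].append(review)
    # Key of the microshard = key of the root relation (users.id).
    return {u["id"]: {"user": u, "reviews": by_user[u["id"]]} for u in users}


def get_unnest(kv, user_id):
    """Plan for the query template: one GET by key, then UNNEST into joined rows."""
    shard = kv.get(user_id)  # the only access the store has to support
    if shard is None:
        return []
    user_cols = {f"users.{k}": v for k, v in shard["user"].items()}
    return [
        {**user_cols, **{f"reviews.{k}": v for k, v in review.items()}}
        for review in shard["reviews"]
    ]


users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
reviews = [
    {"id": 10, "user_id": 1, "text": "great"},
    {"id": 11, "user_id": 1, "text": "so-so"},
    {"id": 12, "user_id": 2, "text": "ok"},
]
kv = nest(users, reviews)
for row in get_unnest(kv, 1):
    print(row)
```

Because the join partner rows already live inside the value fetched by the GET, the join in the query template needs no second lookup and no scan.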
Microsharding
A microshard
- is a logical unit of data
- is a principled way to shard a database into small fragments
- is a unit of transactional data access
- is accessed by its key, the key of the root relation
Figure: microshards with Key = 1, Key = 2, Key = 3, ..., Key = N; two transactions on Users key = 1 target the same microshard, while the transactions on Users key = 2 and Users key = 3 each target their own
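One way to picture "a unit of transactional data access" is a store in which every read-modify-write is atomic with respect to a single microshard key and nothing is coordinated across keys. The sketch below does this with one in-process lock per key. The lock-based mechanism and the `MicroshardStore` name are assumptions for illustration; the slides do not describe how the middleware implements concurrency control.

```python
import threading
from collections import defaultdict


class MicroshardStore:
    """Toy store: transactions are atomic per microshard key, never across keys."""

    def __init__(self):
        self._shards = {}
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks[key]

    def transact(self, key, update):
        """Apply update(shard) -> new shard while holding only this key's lock."""
        with self._lock_for(key):
            shard = self._shards.get(key, {"reviews": []})
            self._shards[key] = update(shard)
            return self._shards[key]


store = MicroshardStore()


def worker(user_key):
    for _ in range(100):
        store.transact(user_key, lambda s: {"reviews": s["reviews"] + ["r"]})


# Two transactions streams on key 1 serialize; key 2 proceeds independently.
threads = [threading.Thread(target=worker, args=(k,)) for k in (1, 1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(store.transact(1, lambda s: s)["reviews"]))
```

Transactions on the same key wait for one another, while transactions on different keys never do, which is what lets microshards be spread across nodes without cross-node coordination.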
Isolation Levels
No consistency guarantee on read/write outside of a microshard
Figure: transactions (T) are collected into transaction groups, one per microshard; transaction groups are distributed on query execution nodes, and microshards are distributed on the key-value store
Scale Independence
Experiment Setup
RUBiS benchmark (eBay type auction application)
Read/Write workload (transition matrix)
Short think time to saturate the system
Voldemort (Dynamo) key-value store
Figure: throughput (1000 sessions/sec, axis from 0 to 1.6) versus number of emulated concurrent clients (thousands, axis from 0 to 20), with one curve each for 3, 4, 5, and 6 Voldemort nodes
Message: Ability to automatically scale to more concurrent sessions (throughput) simply by increasing the number of key-value nodes
Directions/Questions
Support for Specifying Relaxed Consistency
- Tooling to relax consistency just to the degree that there exists a feasible solution (physical design and query plans) for the specification
Scalable Data Organization over heterogeneous data stores
- Physical design over heterogeneous stores such that the service level specifications are met
Scalability vs. Consistency
The Cast
NEC Labs Researchers
Hakan Hacigumus
Yun Chi
Wang-Pin Hsiung
Hojjat Jafarpour
Hyun J. Moon
Oliver Po
Junichi Tatemura
Jagan Sankaranarayanan
Advisors/Collaborators
Michael Carey (U. of California, Irvine)
Hector Garcia-Molina (Stanford)
Jeff Naughton (U. of Wisconsin, Madison)
CloudDB would be
A unified data management platform that provides capabilities to transparently and efficiently support heterogeneous workloads by leveraging specialized storage models with SLA-conscious profit optimization in the cloud.
Thank You!