Date post: 27-Feb-2021
Michael Schaarschmidt, Felix Gessert, Norbert Ritter: Towards Automated Polyglot Persistence
Transcript
Page 1: Towards Automated Polyglot Persistence · 2020. 6. 24. · Polyglot Persistence Current best practice Application Layer Billing Data Nested Application Data Session data Search Index

Michael Schaarschmidt, Felix Gessert, Norbert Ritter

Towards Automated Polyglot Persistence

Page 2

Polyglot Persistence: Current best practice

[Figure: an Application Layer on top of separate data stores. Labels: Billing Data, Nested Application Data, Session data, Search Index, Files, Amazon Elastic MapReduce, Google Cloud Storage, Friend network, Cached data & metrics, Recommendation Engine]

Page 3

Polyglot Persistence: Current best practice
Research Question:

Can we automate the mapping problem (data → database)?

Page 4

Vision: Schemas can be annotated with requirements

- Write Throughput > 10,000 RPS
- Read Availability > 99.9999%
- Scans = true
- Full-Text-Search = true
- Monotonic Read = true

Schema: DBs, Tables, Fields

Page 5

Vision: The Polyglot Persistence Mediator chooses the database

[Figure: the Application sends data and operations to the Polyglot Persistence Mediator, which holds the annotated schema (e.g. Latency < 30ms) and database metrics and sits in front of db1, db2, db3]

Page 6

Towards Automated Polyglot Persistence: Necessary steps

Goal:
◦ Extend classic workload management to polyglot persistence
◦ Leverage heterogeneous (NoSQL) databases

1. Requirements: Tenant specifies requirements as Service-Level-Agreements
2. Resolution: Find or provision a suitable combination of databases
3. Mediation: Mediate data and database operations

Page 7

Service Level Agreements: Expressing application requirements

Functional Service Level Objectives
◦ Guarantee a "feature"
◦ Determined by database system
◦ Examples: transactions, join

Non-Functional Service Level Objectives
◦ Guarantee a certain quality of service (QoS)
◦ Determined by database system and service provider
◦ Examples:
  Continuous: response time (latency), throughput
  Binary: Elasticity, Read-your-writes

Page 8

Service Level Agreements: Refining the utility of each SLO

Utility expresses the "value" of a continuous non-functional requirement:

$f_{\text{utility}}(\text{metric}) \to [0,1]$
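A minimal sketch of what such a utility function could look like for a latency SLO. The slides only fix the range [0,1]; the piecewise-linear shape, the name `latency_utility`, and both thresholds are assumptions made for illustration.

```python
def latency_utility(latency_ms: float,
                    target_ms: float = 10.0,
                    limit_ms: float = 60.0) -> float:
    """Map a measured latency to a utility in [0, 1].

    Hypothetical shape: full utility at or below target_ms, zero utility
    at or above limit_ms, linear interpolation in between.
    """
    if latency_ms <= target_ms:
        return 1.0
    if latency_ms >= limit_ms:
        return 0.0
    return (limit_ms - latency_ms) / (limit_ms - target_ms)


print(latency_utility(5.0))   # at or below the target
print(latency_utility(35.0))  # halfway between target and limit
print(latency_utility(80.0))  # beyond the limit
```

A tenant who cares less about small latency differences would choose a wider gap between the two thresholds; a step function (target equal to limit) degenerates into a binary requirement.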

Page 9

SLA Example for MongoDB

Functional Requirements
◦ Scan Queries
◦ Conditional Updates
◦ Transactions
◦ Query by Example
◦ Joins
◦ Analytics

Non-Functional Requirements
◦ Elasticity
◦ Consistency
◦ Read-Latency
◦ Write-Latency
◦ Write-Throughput
◦ Scalability of Data Volume
◦ Read Scalability
◦ Write Scalability
◦ Read-Availability
◦ Write-Availability
◦ Durability

Page 10

Step I - Requirements: Expressing the application's needs

1. Define schema
Tenant defines the schema: Database → Table → Field, Field, Field

2. Annotate
Tenant annotates the schema with their requirements

Annotations
◦ Continuous non-functional, e.g. write latency < 15ms
◦ Binary functional, e.g. Atomic updates
◦ Binary non-functional, e.g. Read-your-writes

[Figure: an annotated Table; its Field inherits continuous annotations]
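The annotation step can be sketched as a small tree of schema elements. This is an illustration only: the class `SchemaElement`, the annotation keys, and the rule that a child's own value overrides an inherited one are assumptions, not part of the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SchemaElement:
    """A database, table, or field with optional requirement annotations."""
    name: str
    continuous: Dict[str, float] = field(default_factory=dict)  # e.g. {"write_latency_ms": 15}
    binary: List[str] = field(default_factory=list)             # e.g. ["atomic_updates"]
    parent: Optional["SchemaElement"] = None

    def effective_continuous(self) -> Dict[str, float]:
        """Own continuous annotations plus those inherited from ancestors.

        Assumption: a value set on the element itself wins over an inherited one.
        """
        inherited = self.parent.effective_continuous() if self.parent else {}
        return {**inherited, **self.continuous}


database = SchemaElement("ECommerceDB")
customers = SchemaElement("Customers",
                          continuous={"write_latency_ms": 15},
                          binary=["read_your_writes"],
                          parent=database)
user_name = SchemaElement("UserName", parent=customers)

# The field carries no annotation of its own but inherits the table's.
print(user_name.effective_continuous())
```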

Page 12

Step II - Resolution: Finding the best database

The Provider resolves the requirements
RANK: scores available database systems
Routing Model: defines the optimal mapping from schema elements to databases

Provider (knows the capabilities of the available DBs)
1. Find optimal RANK(schema_root, DBs) through recursive descent, using annotated schema and metrics
2a. If unsatisfiable, either refuse or provision a new DB
2b. Generates routing model

Routing Model
◦ Route schema_element → db
◦ Transform db-independent to db-specific operations

Page 13

Step II - Resolution: Ranking algorithm by example

Schema
ECommerceDB (database)
  Customers (Table)
    ShoppingBasket (List<String>)
    UserName (String)

Annotations: Linearizability, Availability, Read latency

RANK Algorithm
No annotation: recursive descent to child

DBs = { MongoDB, Riak, Cassandra, CouchDB, Redis, MySQL, S3, HBase }

Page 14

Step II - Resolution: Ranking algorithm by example

RANK Algorithm
Binary requirement
1. Exclude DBs that do not support it
2. Recursive descent

DBs = { MongoDB, Riak, Cassandra, CouchDB, Redis, MySQL, S3, HBase }

Page 15

Step II - Resolution: Ranking algorithm by example

RANK Algorithm
Continuous requirement: ∀ databases calculate

$db \to f_{\text{utility}}(db.\text{availability})$

Database  Availability  Utility
MongoDB   99%           0.8
Redis     95%           0.05
MySQL     94%           0.04
HBase     99.9%         0.9

Page 16

Step II - Resolution: Ranking algorithm by example

RANK Algorithm
Continuous requirement: ∀ databases calculate

$db \to f_{\text{utility}}(db.\text{latency})$

Database  Availability  Utility  Latency  Utility
MongoDB   99%           0.8      10ms     1
Redis     95%           0.05     1ms      1
MySQL     94%           0.04     40ms     0.2
HBase     99.9%         0.9      50ms     0.1

Page 17

Step II - Resolution: Ranking algorithm by example

RANK Algorithm
Binary requirement
1. Exclude DBs that do not support it
2. Recursive descent
3. Pick DB with best total score and add it to routing model

DB       Score
MongoDB  0.9
Redis    0.525
MySQL    0.12
HBase    0.5
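The total scores are consistent with an unweighted mean of the availability and latency utilities from the preceding slides. The slides do not state the aggregation rule, so the mean is an inference from the numbers; the variable names are illustrative.

```python
# Utilities as listed in the availability and latency tables of the example.
availability_u = {"MongoDB": 0.8, "Redis": 0.05, "MySQL": 0.04, "HBase": 0.9}
latency_u = {"MongoDB": 1.0, "Redis": 1.0, "MySQL": 0.2, "HBase": 0.1}

# Assumption: total score = unweighted mean of the per-requirement utilities.
scores = {
    db: round((availability_u[db] + latency_u[db]) / 2, 3)
    for db in availability_u
}
best = max(scores, key=scores.get)

print(scores)  # {'MongoDB': 0.9, 'Redis': 0.525, 'MySQL': 0.12, 'HBase': 0.5}
print(best)    # MongoDB
```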

Page 18

Step II - Resolution: Ranking algorithm by example

Routing Model: Customers → MongoDB
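The ranking steps of this example can be sketched as a recursive descent over the schema tree. This is a simplified illustration, not the authors' algorithm: the types `Database` and `Node`, the helper `collect`, the mean as aggregation rule, the placeholder databases db1 to db3 with their capabilities and metrics, and the placement of annotations on schema elements are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Database:
    name: str
    features: Set[str]           # supported binary requirements
    metrics: Dict[str, float]    # measured metrics


@dataclass
class Node:
    name: str
    binary: Set[str] = field(default_factory=set)
    # metric name -> utility function mapping the metric value to [0, 1]
    continuous: Dict[str, Callable[[float], float]] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def collect(node: Node, dbs: List[Database],
            utilities: Dict[str, List[float]]) -> List[Database]:
    """Descend a subtree, dropping unsupported DBs and gathering utilities."""
    dbs = [db for db in dbs if node.binary <= db.features]   # binary: exclude
    for metric, utility in node.continuous.items():          # continuous: score
        for db in dbs:
            utilities.setdefault(db.name, []).append(utility(db.metrics[metric]))
    for child in node.children:                              # recursive descent
        dbs = collect(child, dbs, utilities)
    return dbs


def rank(node: Node, dbs: List[Database], routing: Dict[str, str]) -> None:
    """Route the first annotated element on each path to its best-scoring DB."""
    if not node.binary and not node.continuous:
        for child in node.children:      # no annotation: descend to child
            rank(child, dbs, routing)
        return
    utilities: Dict[str, List[float]] = {}
    candidates = collect(node, dbs, utilities)
    if not candidates:                   # a provider could provision a new DB here
        raise ValueError(f"requirements of {node.name} are unsatisfiable")

    def score(db: Database) -> float:
        values = utilities.get(db.name, [])
        return sum(values) / len(values) if values else 0.0

    routing[node.name] = max(candidates, key=score).name


# Hypothetical capabilities and metrics for three placeholder databases.
dbs = [
    Database("db1", {"linearizability"}, {"availability": 0.99, "latency_ms": 10}),
    Database("db2", set(), {"availability": 0.95, "latency_ms": 1}),
    Database("db3", {"linearizability"}, {"availability": 0.999, "latency_ms": 50}),
]
schema = Node("ECommerceDB", children=[
    Node("Customers", binary={"linearizability"}, children=[
        Node("ShoppingBasket", continuous={"availability": lambda a: a}),
        Node("UserName", continuous={"latency_ms": lambda ms: max(0.0, 1 - ms / 100)}),
    ]),
])
routing: Dict[str, str] = {}
rank(schema, dbs, routing)
print(routing)  # {'Customers': 'db1'}
```

In this sketch db2 is excluded by the binary requirement even though it has the lowest latency, which mirrors the "exclude first, then score" order of the steps.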

Page 19

Step III - Mediation: Routing data and operations

The PPM routes data
Operation Rewriting: translates from abstract to database-specific operations
Runtime Metrics: Latency, availability, etc. are reported to the resolver
Primary Database Option: All data periodically gets materialized to a designated database

Polyglot Persistence Mediator
◦ Uses Routing Model
◦ Triggers periodic materialization
◦ Reports metrics

1. The Application issues CRUD, queries, transactions, etc.
2. The Mediator routes them to db1, db2, db3
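A minimal sketch of the mediation step, assuming in-memory dictionaries as stand-ins for two backends. The class `Mediator`, the `put` operation, the rewriter functions, and the schema element "Sessions" are hypothetical; a real mediator would talk to actual database drivers and report metrics to the resolver instead of a local list.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory stand-ins for two backends.
document_db: Dict[str, dict] = {}
key_value_db: Dict[str, str] = {}


def put_document(key: str, value: dict) -> None:
    """Database-specific write for a document store: keep the nested value."""
    document_db[key] = dict(value)


def put_key_value(key: str, value: dict) -> None:
    """Database-specific write for a key-value store: flatten to a string."""
    key_value_db[key] = repr(sorted(value.items()))


class Mediator:
    def __init__(self, routing_model: Dict[str, str],
                 rewriters: Dict[str, Callable[[str, dict], None]]) -> None:
        self.routing_model = routing_model
        self.rewriters = rewriters
        self.reported: List[Tuple[str, float]] = []   # (db, latency in seconds)

    def put(self, schema_element: str, key: str, value: dict) -> None:
        db = self.routing_model[schema_element]        # route via routing model
        start = time.perf_counter()
        self.rewriters[db](key, value)                 # rewrite the abstract put
        self.reported.append((db, time.perf_counter() - start))  # runtime metric


mediator = Mediator(
    routing_model={"Customers": "db1", "Sessions": "db2"},
    rewriters={"db1": put_document, "db2": put_key_value},
)
mediator.put("Customers", "42", {"UserName": "alice"})
mediator.put("Sessions", "s1", {"user": "42"})
print(document_db, key_value_db)
```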

Page 20

Evaluation: News Article
Prototype built on ORESTES

Scenario: news articles with impression counts
Objectives: low-latency top-k queries, high-throughput counts, article-queries

[Figure: schema with Article and Counter]

Page 21

Evaluation: News Article

[Figure: Mediator]

Page 22

Evaluation: News Article

[Figure: Mediator] Counter updates kill performance

Page 23

Evaluation: News Article

[Figure: Mediator]

Page 24

Evaluation: News Article

[Figure: Mediator] No powerful queries

Page 25

Evaluation: News Article
Prototype built on ORESTES

Found Resolution

[Figure: Article fields (ID, Title, …) are stored as a Document; impressions (Imp., ID) are stored in a Sorted Set]
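The found resolution can be illustrated with in-memory stand-ins: a dictionary plays the document store and a `Counter` plays the sorted set. The function names and sample data are hypothetical, and the sketch says nothing about which concrete systems the prototype used.

```python
from collections import Counter
from typing import Dict, List

articles: Dict[str, dict] = {}      # document store stand-in: id -> article document
impressions: Counter = Counter()    # sorted-set stand-in: id -> impression count


def add_article(article_id: str, title: str) -> None:
    """Article-queries go to the document store."""
    articles[article_id] = {"id": article_id, "title": title}


def record_impression(article_id: str) -> None:
    """High-throughput counter updates touch only the sorted set."""
    impressions[article_id] += 1


def top_k(k: int) -> List[str]:
    """Top-k query: rank ids in the sorted set, then look up the documents."""
    return [articles[article_id]["title"]
            for article_id, _ in impressions.most_common(k)]


add_article("a", "Alpha")
add_article("b", "Beta")
add_article("c", "Gamma")
for article_id in ["a", "c", "a", "b", "c", "a"]:
    record_impression(article_id)

print(top_k(2))  # ['Alpha', 'Gamma']
```

Splitting the two access patterns this way keeps counter updates away from the article documents, which is the point of routing the impression field to a different store than the rest of the article.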

Page 26

Challenges & Future Work

Workload Management: during mediation, actively schedule requests based on requirements

Ranking: Predict future metrics from historic ones (time-series analysis) or from performance models

Database selection: minimize $P(\text{SLA violation}) \cdot \text{penalty}$ (e.g. through reinforcement learning)
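The selection criterion can be made concrete with a small calculation. The numbers, the SLO names, and the choice to sum the expected penalty over several SLOs are assumptions for illustration; the slides name only the product of violation probability and penalty.

```python
from typing import Dict

# Hypothetical penalty per violated SLO (in arbitrary cost units).
penalties: Dict[str, float] = {"read_latency": 100.0, "availability": 1000.0}

# Hypothetical violation probabilities per database and SLO.
p_violation: Dict[str, Dict[str, float]] = {
    "db1": {"read_latency": 0.10, "availability": 0.001},
    "db2": {"read_latency": 0.01, "availability": 0.02},
}


def expected_penalty(db: str) -> float:
    """Sum of P(SLA violation) * penalty over all SLOs (summation is assumed)."""
    return sum(p_violation[db][slo] * penalties[slo] for slo in penalties)


best = min(p_violation, key=expected_penalty)
print({db: expected_penalty(db) for db in p_violation})
print(best)  # db1
```

Here db2 has the better latency behaviour, yet db1 wins because availability violations carry the larger penalty.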

Page 27

Challenges & Future Work

Meta-DBaaS: Mediate over DBaaS systems and factor in their SLAs

Live Migration: Enable requirement changes

Requirements: collect a library of common ones

Utility: Provide intuitive, visual "knobs" for developers

Page 28

Summary

(Manual) Polyglot Persistence is a reality, but difficult and error-prone

Polyglot Persistence Mediator: SLA-driven, fine-grained selection of database systems

1. Requirements: Let the tenant define his requirements
2. Resolution: Choose or provision a database based on that
3. Mediation: Route data and operations according to that mapping

Page 29

Thank you.
