Mongodb Sharding & Map-Reduce - Universitas...

Date post: 12-Aug-2020
Page 1: Mongodb Sharding & Map-Reduce - Universitas Indonesia wcw.cs.ui.ac.id/teaching/imgs/bahan/tbdl/sharding.pdf · Sharding, or horizontal scaling, by contrast, divides the data set and

MongoDB Sharding & Map-Reduce

Wahyu Catur Wibowo

What is Sharding

Purpose

Database systems with large data sets and high-throughput applications can challenge the capacity of a single server. High query rates can exhaust the CPU capacity of the server. Larger data sets exceed the storage capacity of a single machine. Finally, working set sizes larger than the system’s RAM stress the I/O capacity of disk drives.

To address these issues of scale, database systems have two basic approaches: vertical scaling and sharding.

Purpose

Vertical scaling adds more CPU and storage resources to increase capacity. Scaling by adding capacity has limitations: high-performance systems with large numbers of CPUs and large amounts of RAM are disproportionately more expensive than smaller systems. Additionally, cloud-based providers may only allow users to provision smaller instances. As a result, there is a practical maximum capability for vertical scaling.

Sharding, or horizontal scaling, by contrast, divides the data set and distributes the data over multiple servers, or shards. Each shard is an independent database, and collectively, the shards make up a single logical database.

Sharding

Sharding addresses the challenge of scaling to support high throughput and large data sets:

Sharding reduces the number of operations each shard handles. Each shard processes fewer operations as the cluster grows. As a result, a cluster can increase capacity and throughput horizontally.

For example, to insert data, the application only needs to access the shard responsible for that record.

Sharding reduces the amount of data that each server needs to store. Each shard stores less data as the cluster grows.

For example, if a database has a 1 terabyte data set, and there are 4 shards, then each shard might hold only 256 GB of data. If there are 40 shards, then each shard might hold only about 25 GB of data.
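The arithmetic in this example can be checked with a short sketch. It assumes the data is spread perfectly evenly across shards and that 1 terabyte is counted as 1024 GB, which is how the 256 GB figure comes out. The function name `gb_per_shard` is hypothetical and not part of MongoDB.

```python
def gb_per_shard(total_gb: float, shard_count: int) -> float:
    """Return the data held by each shard, assuming a perfectly even split."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    return total_gb / shard_count


TOTAL_GB = 1024  # assumption: 1 terabyte counted as 1024 GB

print(gb_per_shard(TOTAL_GB, 4))   # 256.0
print(gb_per_shard(TOTAL_GB, 40))  # 25.6, i.e. roughly 25 GB
```

In a real cluster the split is rarely this even, so these figures are best read as averages.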

Sharding in MongoDB

Range-based Sharding

Hash-based Sharding

Part of Range-based Sharding

Hash key value = MD5(key value)
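
The formula above can be illustrated with a small sketch: the shard key value is hashed with MD5, and the hash is then assigned to a shard by range, which is why hash-based sharding can be seen as part of range-based sharding. Everything below is an assumption for illustration: the four shard names, the use of the first 8 bytes of the digest as a 64-bit integer, and the equal-width ranges. MongoDB's own hashed-key encoding and chunk layout differ in detail.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

SHARDS = ["shard0", "shard1", "shard2", "shard3"]  # hypothetical shard names
HASH_SPACE = 2 ** 64  # assumption: hashes are treated as 64-bit integers

# Split points that divide the hash space into equal-width ranges, one per shard.
BOUNDARIES = [i * HASH_SPACE // len(SHARDS) for i in range(1, len(SHARDS))]


def hashed_key(key_value) -> int:
    """Hash a key value with MD5 and keep the first 8 bytes as an integer."""
    digest = hashlib.md5(str(key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(key_value) -> str:
    """Pick the shard whose hash range contains the hashed key."""
    return SHARDS[bisect_right(BOUNDARIES, hashed_key(key_value))]


# Sequential keys end up scattered across shards instead of clustering on one.
distribution = Counter(shard_for(user_id) for user_id in range(1000))
print(distribution)
```

The trade-off in this sketch is that neighbouring key values land on unrelated shards, so a range query over the original keys has to consult every shard.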

Tag-aware Sharding

Applying Sharding

Map-Reduce

Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results. For map-reduce operations, MongoDB provides the mapReduce database command.
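
The paradigm itself can be sketched in plain Python, without a database. This is not the mapReduce command, which takes JavaScript map and reduce functions and runs on the server; it only mirrors the three steps (map, group by key, reduce). The sample documents and the field names `cust_id` and `amount` are hypothetical.

```python
from collections import defaultdict

# Hypothetical order documents, standing in for a collection.
orders = [
    {"cust_id": "A", "amount": 250},
    {"cust_id": "B", "amount": 200},
    {"cust_id": "A", "amount": 300},
]


def map_order(doc):
    """Map step: emit one (key, value) pair per document."""
    yield doc["cust_id"], doc["amount"]


def reduce_amounts(key, values):
    """Reduce step: condense all values emitted for one key into one result."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    """Run map over every document, group emitted values by key, then reduce."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


totals = map_reduce(orders, map_order, reduce_amounts)
print(totals)  # {'A': 550, 'B': 200}
```

Three documents are condensed into one total per customer, which is the "useful aggregated results" the definition refers to.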
