Thesis final presentation

Post on 28-Nov-2014


description

In recent years, the need for distributed data storage has driven the design of new systems for large-scale environments. The growth of unbounded data streams, together with the need to store and analyze them in real time, reliably, scalably, and fast, explains the appearance of such systems in the financial sector, and at the stock exchange Nasdaq OMX in particular. Furthermore, an internally designed, totally ordered, reliable message bus is used in Nasdaq OMX by almost all internal subsystems. Reliable totally ordered multicast has been studied extensively in academia, both theoretically and practically, and has been shown to serve as a fundamental building block for distributed fault-tolerant applications. In this work, we leverage the NOMX low-latency reliable totally ordered message bus, with a capacity of at least 2 million messages per second, to build a high-performance distributed data store. Consistency of data operations follows readily from the message bus, since it forwards all messages in reliable total order. Moreover, relying on reliable totally ordered messaging, active in-memory replication is integrated to support fault tolerance and load balancing. A prototype was developed against production environment requirements to demonstrate feasibility. Experimental results show great scalability and performance: around 400,000 insert operations per second over 6 data nodes, served with 100 microseconds latency. Latency for single-record read operations is bounded below half a millisecond, while data ranges are retrieved with sub-100 Mbps capacity from one node. Moreover, performance improvements with a greater number of data store nodes are shown for both writes and reads. It is concluded that uniform totally ordered sequenced input data can be used in real time for large-scale distributed data storage to maintain strong consistency, fault tolerance, and high performance.

transcript

GDS: Genium Data Store
Real Time, Low Latency, Reliable

Iuliia Proskurnia, EMDC, KTH, 2013


3900 companies, 39 countries, over 1500 corporate products

USE CASE

• Write events
• Retrieve ranges of records

Fault-Tolerant?

Consistent?

Fast?

Scalable?

Approaches

• Consensus based
• ...
• Total Order Multicast
• Symmetric
• Token Site

Uniform Reliable Total Order

◦ Validity
◦ Uniform Integrity
◦ Uniform Agreement
◦ Uniform Total Order
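
The four properties listed above can be made concrete with a small checker over per-process delivery logs. The sketch below is illustrative only: it assumes each process records the message identifiers it delivered, in order, and the function names `respects_total_order` and `respects_integrity` are hypothetical, not taken from the thesis. It checks the ordering property (any two processes deliver their shared messages in the same relative order) and the integrity property (no duplicates, nothing that was never broadcast).

```python
from itertools import combinations


def respects_total_order(logs):
    """True if every pair of logs delivers shared messages in the same relative order.

    Assumes each log has no duplicate entries (see respects_integrity).
    """
    for a, b in combinations(logs, 2):
        pos_b = {msg: i for i, msg in enumerate(b)}
        # Positions in b of the shared messages, listed in a's delivery order.
        positions = [pos_b[msg] for msg in a if msg in pos_b]
        # The order agrees only if those positions are strictly increasing.
        if any(x >= y for x, y in zip(positions, positions[1:])):
            return False
    return True


def respects_integrity(log, broadcast):
    """True if each message is delivered at most once and only if it was broadcast."""
    return len(log) == len(set(log)) and set(log) <= set(broadcast)
```

A log that misses a message (for example a process that crashed early) does not violate total order by itself; agreement and validity are liveness-style properties and are not covered by this finite check.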


Genium INET Message Bus: Uniform Reliable Total Order Multicast

• Similar to Amoeba protocol
• However... Fault Tolerant!
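
As a rough illustration of sequencer-based total order, the general idea behind Amoeba-style protocols, the sketch below has one sequencer stamp every message with a consecutive number, while each receiver buffers out-of-order arrivals and delivers strictly by sequence number. This is a minimal single-process model under stated assumptions, not the Genium INET implementation: the class names `Sequencer` and `Receiver` and their methods are hypothetical, and networking, retransmission, and sequencer failover are left out.

```python
class Sequencer:
    """Assigns consecutive sequence numbers to incoming messages."""

    def __init__(self):
        self.next_seq = 0

    def stamp(self, payload):
        msg = (self.next_seq, payload)
        self.next_seq += 1
        return msg


class Receiver:
    """Delivers messages in sequence-number order, buffering any gaps."""

    def __init__(self):
        self.expected = 0      # next sequence number to deliver
        self.pending = {}      # seq -> payload, received but not yet deliverable
        self.delivered = []    # payloads in delivery order

    def receive(self, msg):
        seq, payload = msg
        if seq < self.expected or seq in self.pending:
            return  # duplicate, ignore
        self.pending[seq] = payload
        # Deliver as long as the next expected message is available.
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1
```

Two receivers that see the same stamped messages in different arrival orders end up with identical `delivered` lists, which is the property the data store relies on.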

GDS: Genium Data Store

• Uses Genium INET Message Bus abstraction
• Clients, Sequencer, Data store
• Rewinders and sequencer replication
• Active replication

[Figure: Client, Data store node, Data store node]
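
Active replication on top of a totally ordered stream can be sketched as a deterministic state machine: every data store node applies the same operations in the same sequence order, so all replicas hold the same state without coordinating with each other. The code below is a minimal sketch under that assumption; the `Replica` class, its `apply` method, and the key/value operation shape are hypothetical and do not come from the thesis.

```python
class Replica:
    """A data store node that applies sequenced writes deterministically."""

    def __init__(self, name):
        self.name = name
        self.state = {}     # key -> value
        self.applied = 0    # next sequence number this replica expects

    def apply(self, seq, key, value):
        # Applying out of order would let replicas diverge, so reject it.
        if seq != self.applied:
            raise ValueError(f"{self.name}: expected seq {self.applied}, got {seq}")
        self.state[key] = value
        self.applied += 1


def broadcast(ordered_ops, replicas):
    """Feed the same totally ordered operations to every replica."""
    for seq, (key, value) in enumerate(ordered_ops):
        for replica in replicas:
            replica.apply(seq, key, value)
```

Because every replica ends in the same state, a read can be served by any of them, which is what makes replication usable for load balancing as well as fault tolerance.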

GDS high level abstraction

LEDS

LEDS

• Column based
• BLOBS
• Appends
• Range Queries
• Not Distributed
• Not fault-tolerant
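
The combination of appends and range queries listed above can be illustrated with a small append-only store. This is only a sketch of the access pattern, not the LEDS API, which the slides do not show: the class `AppendOnlyStore`, its methods, and the assumption that records arrive in non-decreasing key order are all hypothetical. Keys and BLOB payloads are kept in two parallel columns, and a range query is a binary search over the key column.

```python
from bisect import bisect_left, bisect_right


class AppendOnlyStore:
    """Append-only store with two parallel columns: keys and opaque blobs."""

    def __init__(self):
        self.keys = []    # non-decreasing keys
        self.blobs = []   # payload bytes, same index as keys

    def append(self, key, blob):
        # Assumption: records are appended in key order (e.g. sequence numbers).
        if self.keys and key < self.keys[-1]:
            raise ValueError("keys must be appended in non-decreasing order")
        self.keys.append(key)
        self.blobs.append(blob)

    def range(self, lo, hi):
        """Return (key, blob) pairs with lo <= key <= hi."""
        start = bisect_left(self.keys, lo)
        end = bisect_right(self.keys, hi)
        return list(zip(self.keys[start:end], self.blobs[start:end]))
```

Keeping the key column sorted by construction means a range lookup costs two binary searches plus the size of the result, with no separate index to maintain.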

Properties

• Consistent
• Failure Resilient
• Replication
• Rewinders
• Site Replication

Total Order

Possible Failure Scenarios

Client Failure

Sequencer Failure

[Figure: message-flow diagrams with participants User, Seq, SeqStandBy, and DS; labels include Primary, Work, Current Message, and Rewind Message: [n]]
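
The role of a rewinder in recovering from missed messages can be sketched as follows. The exact recovery protocol is not recoverable from the slides, so this is an assumption-laden illustration: a hypothetical `Rewinder` keeps every sequenced message, and a hypothetical `Node` that notices a gap in sequence numbers asks the rewinder for the missing range before delivering the current message.

```python
class Rewinder:
    """Keeps the full sequenced message log and replays ranges on request."""

    def __init__(self):
        self.log = []  # payloads, indexed by sequence number

    def record(self, seq, payload):
        if seq != len(self.log):
            raise ValueError("rewinder must see every sequence number in order")
        self.log.append(payload)

    def rewind(self, start, stop):
        """Return payloads with sequence numbers in [start, stop)."""
        return self.log[start:stop]


class Node:
    """Delivers in order, fetching any gap from the rewinder."""

    def __init__(self, rewinder):
        self.rewinder = rewinder
        self.next_seq = 0
        self.delivered = []

    def on_message(self, seq, payload):
        if seq < self.next_seq:
            return  # already delivered
        if seq > self.next_seq:
            missed = self.rewinder.rewind(self.next_seq, seq)
            if len(missed) != seq - self.next_seq:
                raise RuntimeError("rewinder cannot fill the gap")
            self.delivered.extend(missed)
        self.delivered.append(payload)
        self.next_seq = seq + 1
```

The sketch assumes the rewinder itself has seen every sequenced message; how the real system guarantees that, and how a standby sequencer takes over, is outside what this example models.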

Scalability

• Natural Load Balancing
• Partitioning (manual)
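
One way to picture manual partitioning combined with load balancing across replicas is a static routing table. The sketch below is an assumption about how such routing could look, not the scheme used in GDS: the class `PartitionMap`, the split-point representation, and the round-robin read policy are all hypothetical. Writes for a key go to every replica of the owning partition, while reads rotate among those replicas.

```python
from bisect import bisect_right
from itertools import cycle


class PartitionMap:
    """Static, manually configured mapping from key ranges to replica groups."""

    def __init__(self, boundaries, groups):
        # boundaries: sorted split points; groups: one replica list per range.
        if len(groups) != len(boundaries) + 1:
            raise ValueError("need exactly one group per key range")
        self.boundaries = boundaries
        self.groups = groups
        self._readers = [cycle(group) for group in groups]

    def partition_of(self, key):
        # Keys below the first boundary map to 0; a key equal to a boundary
        # belongs to the partition on its right.
        return bisect_right(self.boundaries, key)

    def write_targets(self, key):
        """All replicas of the owning partition receive the write."""
        return self.groups[self.partition_of(key)]

    def read_target(self, key):
        """Reads rotate round-robin over the owning partition's replicas."""
        return next(self._readers[self.partition_of(key)])
```

For example, with `PartitionMap(["m"], [["a1", "a2"], ["b1", "b2"]])`, the key `"apple"` is written to both `a1` and `a2`, and successive reads of it alternate between them.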

Evaluation

• Inserts (throughput/latency)
• Range Queries (throughput)
• Range transmission failure

Set Up

Writes: Throughput

Writes Limits

Writes: Latency

Range Queries: Throughput

Range Queries: Scalability

8 Concurrent Users

Range Queries: Link Failure

Summary

• uniform reliable total order multicast
• scales fine
• low latency
• consistent, fault-tolerant

Future Work

• Generality
• Send compressed chunks
• Automated partitioning
• Long-running tests

Comments? Questions?

Thesis Writing Process

Single record read: without load

Single record read: with load (10 000 inserts)

Single record read: scalability

Discussion