Date posted: 07-Nov-2014
Category: Technology
Uploaded by: rakuten-inc
Open the New Door
Yosuke Hara Oct 26, 2013 (rev 2.2)
The Lion of Storage Systems
Started as an OSS project on July 4, 2012 (www.leofs.org).
LeoFS is "Unstructured Big Data Storage for the Web": a highly available, distributed, eventually consistent storage system.
Organizations can use LeoFS to store lots of data efficiently, safely and inexpensively.
Motivation
Expensive Storage Problems:
1. High costs (initial costs, running costs)
2. Possibility of a "SPOF" (single point of failure)
3. Not easy to scale: storage expansion is difficult during periods of increasing data
As of 2010: get away from using expensive hardware-based storage.
3 Vs in 3 HIGHs
HIGHs: HIGH Availability, HIGH Cost Performance Ratio, HIGH Scalability
Vs:
- Velocity: low latency, minimum resources
- Volume: petabyte / exabyte
- Variety: photo, movie, unstructured data
LeoFS Non Stop
Interfaces: REST-API / AWS S3-API
Overview
Figure: LeoFS Overview. Components: Load Balancer, LeoFS-Gateway, LeoFS-Storage (Storage Engine/Router, Metadata, Object Store), LeoFS-Manager (Monitor, GUI Console). Requests from web applications / browsers use the REST-API / S3-API over HTTP (80/443); internal communication uses RPC (4369); other ports shown: 4000, 4010, 4020, 10020, 10021. No master, no SPOF.
Gateway
LeoFS Overview - Gateway
Stateless proxy + object cache:
- Handles HTTP requests and responses from clients (REST-API / S3-API)
- Built-in "Object Cache Mechanism" [ Memory Cache, Disk Cache ]
- Uses consistent hashing to decide the primary node in the storage cluster
Figure labels: Clients, Gateway(s), Storage Cluster.
Choosing replica target node(s) in the Storage Cluster:
- RING: 2^128 (MD5)
- # of replicas = 3
- KEY = "bucket/leofs.key", Hash = md5(Filename)
- Replicas are placed on the Primary Node, Secondary-1 and Secondary-2
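The replica selection above can be sketched as a consistent-hashing ring in Python. This is a minimal illustration, not LeoFS's implementation: it assumes the ring positions are MD5 values in [0, 2^128), that each physical node appears at several virtual positions, and that the primary is the first node found walking clockwise from md5(key), with the next distinct nodes as Secondary-1 and Secondary-2. The `Ring` class, the `vnodes_per_node=64` default and the node names are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 128  # MD5 digests are 128 bits, so positions fall in [0, 2^128)


def md5_position(text: str) -> int:
    """Map a string to a position on the ring using its MD5 digest."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")


class Ring:
    """Consistent-hashing ring; vnodes_per_node is a hypothetical parameter."""

    def __init__(self, nodes, vnodes_per_node=64):
        # Place every physical node at several virtual positions on the ring.
        self._points = sorted(
            (md5_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._positions = [pos for pos, _ in self._points]

    def replica_nodes(self, key, n_replicas=3):
        """Walk clockwise from md5(key): the first distinct node is the primary,
        the following distinct nodes are the secondaries."""
        start = bisect.bisect_right(self._positions, md5_position(key))
        chosen = []
        for step in range(len(self._points)):
            _, node = self._points[(start + step) % len(self._points)]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == n_replicas:
                    break
        return chosen


ring = Ring(["storage_0", "storage_1", "storage_2", "storage_3", "storage_4"])
primary, secondary_1, secondary_2 = ring.replica_nodes("bucket/leofs.key")
print(primary, secondary_1, secondary_2)
```

Because only the key's hash decides placement, any gateway can compute the same primary and secondaries without asking a master, which is consistent with the "No Master, No SPOF" design.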
Storage
LeoFS Overview - Storage
- Uses "Consistent Hashing" for replication in the storage cluster ("P2P")
- Automatically replicates an object and its metadata to remote node(s)
- Choosing replica target node(s): RING 2^128 (MD5), # of replicas = 3 (Primary Node, Secondary-1, Secondary-2), KEY = "bucket/leofs.key", Hash = md5(Filename)
Figure labels: Gateway, Storage (Storage Cluster).
LeoFS Overview - Storage
Requests from the Gateway are handled by Storage Engine Workers (Storage Engine: Metadata + Object Container):
- Metadata: keeps an in-memory index of all data
- Object Container: manages a "Log Structured File"
- Replicator and Repairer/Recoverer, each with a queue, for eventual consistency
The storage engine consists of "Object Storage" and "Metadata Storage".
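The "Replicator" and "Recoverer" with a queue can be illustrated with a small Python sketch of eventual consistency: a write goes to every replica, a failed replica write is queued, and a later recovery pass retries it. The `Node` and `Replicator` classes, the `online` flag and the retry-once-per-pass policy are all hypothetical; the slides do not describe LeoFS's actual queue or retry behaviour.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory storage node that can be taken offline."""
    name: str
    online: bool = True
    objects: dict = field(default_factory=dict)

    def put(self, key, value):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.objects[key] = value


class Replicator:
    """Writes to every replica; failed writes are queued for later repair."""

    def __init__(self):
        self.repair_queue = deque()

    def replicate(self, key, value, replicas):
        acks = 0
        for node in replicas:
            try:
                node.put(key, value)
                acks += 1
            except ConnectionError:
                # Remember the missed write so the replicas converge later.
                self.repair_queue.append((node, key, value))
        return acks

    def recover(self):
        """Retry each queued write once; writes that still fail are re-queued."""
        for _ in range(len(self.repair_queue)):
            node, key, value = self.repair_queue.popleft()
            try:
                node.put(key, value)
            except ConnectionError:
                self.repair_queue.append((node, key, value))
        return len(self.repair_queue)


nodes = [Node("primary"), Node("secondary_1"), Node("secondary_2", online=False)]
replicator = Replicator()
acks = replicator.replicate("bucket/leofs.key", b"data", nodes)  # 2 acks, 1 write queued
nodes[2].online = True              # the unreachable node comes back
remaining = replicator.recover()    # the queued write is retried and succeeds
```

Between the failed write and the recovery pass, the three replicas disagree; after the pass they hold the same object, which is what "eventually consistent" means here.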
LeoFS Storage Engine - Retrieve an object from the storage
A Storage Engine Worker looks up the object's metadata in Metadata Storage, then reads the object from the Object Container:
- < META DATA >: Id, Filename, Offset, Size, Checksum (MD5), Version#
- Stored object: Header, File, Footer
LeoFS Storage Engine - Insert an object into the storage
The Storage Engine Worker:
- Inserts a metadata record (Id, Filename, Offset, Size, Checksum (MD5), Version#) into Metadata Storage
- Appends the object (Header, File, Footer) into the Object Container
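The insert and retrieve paths can be sketched as an append-only log with an in-memory metadata index. This is a simplified illustration under stated assumptions: an `io.BytesIO` buffer stands in for the on-disk "Log Structured File", objects are stored without the Header/Footer framing, and the `Metadata`/`ObjectContainer` names are hypothetical. The metadata fields follow the slide (key/filename, offset, size, MD5 checksum, version).

```python
import hashlib
import io
from dataclasses import dataclass


@dataclass
class Metadata:
    """In-memory metadata for one object: key, offset, size, checksum, version."""
    key: str
    offset: int
    size: int
    checksum: str
    version: int


class ObjectContainer:
    """Append-only log of object bodies plus an in-memory metadata index."""

    def __init__(self):
        self.log = io.BytesIO()  # stands in for the on-disk log-structured file
        self.index = {}          # key -> Metadata

    def put(self, key: str, body: bytes) -> Metadata:
        offset = self.log.seek(0, io.SEEK_END)  # always append at the end
        self.log.write(body)
        prev = self.index.get(key)
        version = prev.version + 1 if prev is not None else 1
        meta = Metadata(key, offset, len(body), hashlib.md5(body).hexdigest(), version)
        self.index[key] = meta                  # insert or overwrite the metadata
        return meta

    def get(self, key: str) -> bytes:
        meta = self.index[key]                  # 1) look up the metadata
        self.log.seek(meta.offset)              # 2) seek to the object's offset
        body = self.log.read(meta.size)
        if hashlib.md5(body).hexdigest() != meta.checksum:
            raise ValueError(f"checksum mismatch for {key}")
        return body


container = ObjectContainer()
container.put("bucket/leofs.key", b"v1")
meta = container.put("bucket/leofs.key", b"version-2")
print(meta.offset, meta.version)          # 2 2
print(container.get("bucket/leofs.key"))  # b'version-2'
```

An overwrite never modifies bytes in place: the old `b"v1"` stays in the log and only the index points past it, which is why a separate compaction step is needed.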
LeoFS Storage Engine - Remove unnecessary objects from the storage
Compaction: the Storage Engine Worker rewrites the old Object Container/Metadata into a new Object Container/Metadata, leaving the unnecessary objects behind.
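Compaction of a log-structured container can be sketched as copying only the objects that the metadata index still references into a new log and recording their new offsets. The sketch assumes the metadata is a plain `key -> (offset, size)` dict and that "unnecessary" means overwritten or deleted data no longer referenced by the index; `compact` is a hypothetical helper, and how LeoFS schedules compaction or serves requests during it is not shown.

```python
import io


def compact(old_log: bytes, index: dict):
    """Copy live objects from old_log into a new log.

    index maps key -> (offset, size) in old_log.
    Returns (new_log_bytes, new_index).
    """
    new_log = io.BytesIO()
    new_index = {}
    # Copy in offset order so the new log keeps the original write order.
    for key, (offset, size) in sorted(index.items(), key=lambda kv: kv[1][0]):
        body = old_log[offset:offset + size]
        new_index[key] = (new_log.tell(), size)
        new_log.write(body)
    return new_log.getvalue(), new_index


old_log = b"v1" + b"version-2" + b"other"
index = {"bucket/leofs.key": (2, 9), "bucket/other.key": (11, 5)}  # b"v1" is stale
new_log, new_index = compact(old_log, index)
print(new_log)    # b'version-2other'
print(new_index)  # {'bucket/leofs.key': (0, 9), 'bucket/other.key': (9, 5)}
```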
LeoFS Overview - Storage - Data Structure / Relationship of an Object
<Metadata>
- {VNodeId, Key}, Offset, Version, Time-stamp
- Checksum: for sync
- KeySize, CustomMeta Size, File Size: for retrieving a file (object)
<Needle>
- Header (metadata, fixed length): Checksum, KeySize, User-MetaSize, DataSize, Offset, Version, Time-stamp
- Body (variable length): {VNodeId, Key}, User-Meta, Actual File
- Footer (8B)
<Object Container>
- Super-block, Needle-1, Needle-2, Needle-3, Needle-4, Needle-5
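A needle with a fixed-length header, a variable-length body and an 8-byte footer can be serialized with Python's `struct` module. The slide names the fields and the 8-byte footer but not their widths or order, so the header layout `">16sHIQQIQ"` (50 bytes), the `b"\xff" * 8` footer marker, and the use of the plain key bytes in place of `{VNodeId, Key}` are hypothetical choices for illustration only.

```python
import hashlib
import struct

# Hypothetical big-endian header: checksum (16B MD5), key size, user-meta size,
# data size, offset, version, timestamp. Field widths are assumptions.
HEADER = struct.Struct(">16sHIQQIQ")
FOOTER = b"\xff" * 8  # hypothetical 8-byte end marker


def pack_needle(key: bytes, user_meta: bytes, data: bytes,
                offset: int, version: int, timestamp: int) -> bytes:
    checksum = hashlib.md5(data).digest()
    header = HEADER.pack(checksum, len(key), len(user_meta), len(data),
                         offset, version, timestamp)
    return header + key + user_meta + data + FOOTER


def unpack_needle(buf: bytes) -> dict:
    checksum, key_size, meta_size, data_size, offset, version, ts = HEADER.unpack_from(buf, 0)
    pos = HEADER.size
    key = buf[pos:pos + key_size]
    pos += key_size
    user_meta = buf[pos:pos + meta_size]
    pos += meta_size
    data = buf[pos:pos + data_size]
    pos += data_size
    if buf[pos:pos + len(FOOTER)] != FOOTER:
        raise ValueError("missing or corrupt footer")
    if hashlib.md5(data).digest() != checksum:
        raise ValueError("checksum mismatch")
    return {"key": key, "user_meta": user_meta, "data": data,
            "offset": offset, "version": version, "timestamp": ts}


needle = pack_needle(b"bucket/leofs.key", b"", b"hello",
                     offset=0, version=1, timestamp=1382745600)
decoded = unpack_needle(needle)
print(len(needle), decoded["data"])  # 79 b'hello'
```

Keeping the sizes in the fixed-length header is what lets a reader skip from one needle to the next in the Object Container without an external index.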
LeoFS Overview - Storage - Large Object Support
Purpose:
- To equalize disk usage across every storage node
- To realize high I/O efficiency and high availability
[ WRITE Operation ] Client(s) → Gateway → Storage Cluster: a large object is split into chunked objects (chunk-0, chunk-1, chunk-2, chunk-3). The original object's metadata records the Original Object Name, Original Object Size and # of Chunks. Every chunked object and its metadata are replicated in the cluster.
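Splitting a large object into chunks plus a parent metadata record can be sketched as follows. The parent record carries the three fields named on the slide (original name, original size, number of chunks). The `"<name>#<index>"` chunk-key scheme, the `chunk_size` parameter and the helper names are hypothetical; presumably each chunk is then placed on the ring under its own key, which is how chunking spreads a large object across nodes.

```python
import math
from dataclasses import dataclass


@dataclass
class ParentMetadata:
    """Metadata of the original object: name, size, number of chunks."""
    name: str
    size: int
    num_chunks: int


def split_object(name: str, body: bytes, chunk_size: int):
    """Split body into chunked objects keyed by a hypothetical "<name>#<i>" scheme."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    num_chunks = max(1, math.ceil(len(body) / chunk_size))
    chunks = {
        f"{name}#{i}": body[i * chunk_size:(i + 1) * chunk_size]
        for i in range(num_chunks)
    }
    return ParentMetadata(name, len(body), num_chunks), chunks


def join_object(meta: ParentMetadata, chunks: dict) -> bytes:
    """Reassemble the original object from its chunks, in index order."""
    body = b"".join(chunks[f"{meta.name}#{i}"] for i in range(meta.num_chunks))
    if len(body) != meta.size:
        raise ValueError("reassembled size does not match the original object size")
    return body


meta, chunks = split_object("bucket/movie.mp4", b"x" * 10, chunk_size=3)
print(meta.num_chunks, sorted(chunks))  # 4 chunks: #0, #1, #2, #3 (the last holds 1 byte)
assert join_object(meta, chunks) == b"x" * 10
```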
Manager
LeoFS Overview - Manager
Manager(s) operate LeoFS (Gateway(s) and Storage Cluster) as a "RING Monitor" and a "NodeState Monitor":
- Monitor: RING, node state
- Operate: status, suspend, resume, detach, whereis, ...
New Features
"Insight"
New Features - LeoInsight (v1.0)
Gives insight into the state of LeoFS, for keeping availability:
1. To control requests from clients to LeoFS
2. To check "Traffic info" and the "State of Every Node"
Figure: LeoInsight components, alongside the Manager, Gateway and Storage Cluster:
- Distributed Queue (ElkDB): receives traffic info from the Gateway and probes of a node from Gateway/Storage/Manager
- TimeSeriesDB (Savannah): consumes messages and persists calculated statistics data
- Notifier: sends notifications
- REST-API (JSON): retrieves data and operates LeoFS
More Scalability & More Availability
New Features - Multi Data Center Data Replication (v1.0)
Regions: Tokyo, Singapore, Europe, US
- HIGH Scalability, HIGH Availability
- Easy operation for admins
- No SPOF, no performance degradation
v1.0 - Multi Data Center Data Replication
Figure: "Leo Storage Platform" spanning DC-1 [replicas:3], DC-2 [replicas:1] and DC-3 [replicas:1]. Each DC has a storage cluster and a manager cluster; the manager clusters monitor and replicate each "RING" and "System Configuration". Application(s) and clients send requests to the target region.
[ 3 Regions & 5 Replicas ]
Methods of MDC replication:
- Async: bulked transfer
- Sync+Tran: consensus algorithm
DC-1 configuration:
- Method of replication:
  - Consistency level: local-quorum [N=3, W=2, R=1, D=2]
  - # of target DC(s): 2
  - # of replicas per DC: 1
>> Total of replicas: 5
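The DC-1 configuration numbers can be checked with a short Python sketch. It assumes the usual quorum meaning of the parameters: N replicas in the local DC, W acknowledgements needed for a write, R replicas consulted for a read, D acknowledgements needed for a delete, with "local-quorum" meaning only local-DC acknowledgements count. The `MdcConfig` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MdcConfig:
    n: int                # replicas in the local DC
    w: int                # local acks required for a write
    r: int                # replicas consulted for a read
    d: int                # local acks required for a delete
    target_dcs: int       # number of remote DCs
    replicas_per_dc: int  # replicas kept in each remote DC

    def total_replicas(self) -> int:
        return self.n + self.target_dcs * self.replicas_per_dc

    def write_ok(self, local_acks: int) -> bool:
        # local-quorum: only acknowledgements from the local DC count toward W
        return local_acks >= self.w

    def delete_ok(self, local_acks: int) -> bool:
        return local_acks >= self.d

    def read_sees_latest_write(self) -> bool:
        # Standard quorum-overlap rule: every read set intersects every write set.
        return self.r + self.w > self.n


dc1 = MdcConfig(n=3, w=2, r=1, d=2, target_dcs=2, replicas_per_dc=1)
print(dc1.total_replicas())          # 5 = 3 local + 2 DCs x 1 replica
print(dc1.read_sees_latest_write())  # False: R + W = 3 is not greater than N = 3
```

With R=1 the read and write sets are not guaranteed to overlap, so this setting trades strict read consistency for lower read latency, which fits the eventually consistent design.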
1) 3 replicas are written in the "Local Region"
2) Sync (or async) replication to other region(s)
Leader and followers:
- Leader: DC1.node_0 (Primary)
- Local-followers: DC1.node_1, DC1.node_2
- Remote-followers: DC2.node_3, DC3.node_4 ([replicas:1] each)
3) Replication for Geographical Optimization
Figure: DC-1 (Tokyo) [replicas:3], DC-2 (Singapore) [replicas:1], DC-3 (US) [replicas:1], DC-4 (Europe); [ 3 Regions & 5 Replicas ]; requests from application(s) go to the target region.
Replica targets by local region:
| Local Region | Remote-1  | Remote-2  |
| Tokyo        | Singapore | US        |
| Singapore    | Tokyo     | Europe    |
| Europe       | US        | Singapore |
| US           | Europe    | Tokyo     |
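The Local Region / Remote-1 / Remote-2 table, combined with the leader/follower roles and the "3 local + 1 per remote region" split, can be expressed as a lookup that returns the replica roles for an object written in a given region. The `REMOTE_TARGETS` mapping copies the table; the `replica_roles` helper and its default counts are hypothetical illustrations of the 5-replica layout, not a LeoFS API.

```python
# Remote regions per local region, copied from the Local Region / Remote-1 / Remote-2 table.
REMOTE_TARGETS = {
    "Tokyo": ("Singapore", "US"),
    "Singapore": ("Tokyo", "Europe"),
    "Europe": ("US", "Singapore"),
    "US": ("Europe", "Tokyo"),
}


def replica_roles(local_region, local_replicas=3, replicas_per_remote=1):
    """Return (role, region) pairs: one leader and local followers in the
    local region, then remote followers in each target region."""
    roles = [("leader", local_region)]
    roles += [("local-follower", local_region)] * (local_replicas - 1)
    for remote in REMOTE_TARGETS[local_region]:
        roles += [("remote-follower", remote)] * replicas_per_remote
    return roles


for role, region in replica_roles("Tokyo"):
    print(role, region)
# leader Tokyo, local-follower Tokyo (x2), remote-follower Singapore, remote-follower US
```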
"Center"
New Features - LeoCenter
Web-based administrative console for inspecting and manipulating LeoFS Storage Clusters and LeoFS Gateway:
- Operate LeoFS
- Admin tools
- Access log analysis
Access Log Analysis (β)
Set Sail for "Cloud Storage"
Website: www.leofs.org
Twitter: @LeoFastStorage
Facebook: www.facebook.com/org.leofs