Post on 07-Jul-2015
transcript
Process Design and Polymorphism: Lessons Learnt from Development of Kai
takemaru
08.11.20 1
Outline Reviewing Dynamo and Kai
Kai: an Open Source Implementation of Amazon’s Dynamo Features and Mechanism
Process Design for Better Performance Based on Calling Sequence and Process State
Polymorphism in Actor Model Two approaches to implement polymorphism
08.11.20 2
Dynamo: Features
3
Key, value store Distributed hash table
High scalability No master, peer-to-peer Large scale cluster, maybe O(1K)
Fault tolerant Even if an entire data center fails Meets latency requirements in the case
08.11.20
get(“cart/joe”)
joe
joe
joe
Dynamo: Partitioning
08.11.20 4
Consistent Hashing Nodes and keys are
positioned at their hash values MD5 (128bits)
Keys are stored in the following N nodes
joe
Hash ring (N=3)
2^128 0 1 2
hash(node3)
hash(node1)
hash(node2)
hash(node5)
hash(node4)
hash(cart/joe)
hash(node6)
Dynamo: Membership
08.11.20 5
Gossip-based protocol Spreads membership like a
rumor Membership contains node list
and change history Exchanges membership with a
node at random every second Updates membership if more
recent one received
Advantages Robust; no one can prevent a
rumor from spreading Exponentially rapid spread
Membership change is spread by the form of gossip
Detecting node failure
hash(node2)
node2 is down
node2 is down
node2 is down
node2 is down
Dynamo: get/put Operations
08.11.20 6
Client 1. Sends a request any of
Dynamo node The request is forwarded to
coordinator Coordinator: one of nodes
associated with the key Coordinator
1. Chooses N nodes by using consistent hashing
2. Forwards a request to N nodes
3. Waits responses from R or W nodes, or timeouts
4. Checks replica versions if get 5. Sends a response to client
get/put operations for N,R,W = 3,2,2
Request Response
Client
Coordinator
joe
joe
joe
Kai: Overview
08.11.20 7
Kai Open source implementation of Amazon’s Dynamo
Named after my origin OpenDynamo had been taken by a project not related to Amazon’s
Dynamo
Written in Erlang memcache API Found at http://sourceforge.net/projects/kai/
Process Design for Better Performance
08.11.20 8
Two approaches to improve software performance
08.11.20 9
To process it with less CPU resource Solved by introducing better algorithms e.g. Binary search is used instead of linear search
To use all CPU resources Issues identified in multi-core environment Solved by rearranging process-core mapping e.g. Heavy process and light process run on multi-core
1st core
2nd core boring…
heavy process light process
Two approaches to improve software performance
08.11.20 10
To process it with less CPU resource Solved by introducing better algorithms e.g. Binary search is used instead of linear search
To use all CPU resources Issues identified in multi-core environment Solved by rearranging process-core mapping e.g. Heavy process and light process run on multi-core
1st core
2nd core work! work!
heavy process light process
Latter case will be discussed
Diagram Convention
08.11.20 11
Procedural programming Procedures are called by a process
foo bar
process foo
-module(foo). -behaviour(gen_server).
ok(State) -> {reply, bar:ok(), State}.
-module(bar).
ok() -> ok.
process ellipse
rectangle not process
Diagram Convention, cont’d
08.11.20 12
Actor model 2 processes interact with each other
foo bar
-module(bar). -behaviour(gen_server).
ok(State) -> {reply, ok, State}. ok() -> gen_server:call(?MODULE, ok).
process bar
process foo
process ellipse
rectangle not process
-module(foo). -behaviour(gen_server).
ok(State) -> {reply, bar:ok(), State}.
Design Rules in Erlang
08.11.20 13
Some of rules on process design: “Assign exactly one parallel process to each true concurrent activity
in the system” “Each process should only have one role”
from Program Development Using Erlang - Programming Rules and Conventions
Processes in getting/putting data
08.11.20 14
Node 1
coordinator
Node 2
Client memcache rpc rpc
For clarity, details of architecture are omitted.
store
data
coordinator
rpc
Choosing a node having responsibility for requested key
Node 2 Node 1
Sequence in getting/putting data
08.11.20 15
memcache coordinator rpc coordinator store Client
Blocked…
Node 2 Node 1
Sequence in getting/putting data, cont’d
08.11.20 16
memcache coordinator rpc coordinator store Client
Another client
Blocked…
Node 2 Node 1
Sequence in getting/putting data, cont’d
08.11.20 17
memcache coordinator rpc coordinator store Client
Another client Accepted and…
Blocked!
Design Rules in Erlang, again
08.11.20 18
Another rule on process design: “Use many processes”
Many processes are almost uniformly assigned to each processor by statistical effect
from Chap.20 Programming Erlang
Processes in getting/putting data, again
08.11.20 19
Node 1
coordinator
Node 2
Client rpc
store
data
coordinator
rpc memcache memcache rpc rpc
coordinator coordinator
For clarity, details of architecture are omitted.
Spawning multiple coordinator processes
Node 2 Node 1
Sequence in getting/putting data, again
08.11.20 20
memcache coordinator rpc coordinator store Client
Another client Accepted and…
Node 2 Node 1
Sequence in getting/putting data, again
08.11.20 21
memcache coordinator rpc coordinator store Client
Another client Accepted and…
Spawned
It’s OK, but looks like overkill.
Concurrency is produced by memcache module. Why it is reproduced by coordinator?
Design Rules in Erlang, in final
08.11.20 22
Another rule on process design: “Don’t spawn stateless processes”
Called as procedures from concurrent processes Introduced by me
Processes in getting/putting data, in final
08.11.20 23
Node 1 Node 2
Client rpc
store
data
rpc memcache memcache rpc rpc
For clarity, details of architecture are omitted.
Not spawned
coordinator coordinator
Node 2 Node 1
Sequence in getting/putting data, in final
08.11.20 24
memcache/coordinator rpc coordinator store Client
Another client Accepted and processed
Lessons Learnt
08.11.20 25
Process design based on calling sequence and process state Externally called module
Must be spawned to produce concurrency Runs as multiple processes if needed e.g. TCP listening process, timer process
Stateless module No need to be spawned e.g. coordinator of Kai
Stateful module Should be spawned for state consistency Runs as multiple processes if possible
If a single process, it must be a terminal one (never call other processes synchrnously) to avoid blocking
e.g. database process, socket pool
Process Relationship in Kai
08.11.20 26
memcache
store
data
memcache rpc rpc
hash
members, buckets
connection version
coordinator
sockets
process ellipse
rectangle not process
state
current version
vclock config
configs
Relationships between stateful processes and other modules are omitted.
Process Relationship in Kai, cont’d
08.11.20 27
store
data
rpc rpc
hash
members, buckets
connection version
sockets
process ellipse
rectangle not process
membership
sync
with timer
vclock
current version
config configs
Relationships between stateful processes and other modules are omitted.
Process Relationship in Kai, cont’d
08.11.20 28
“Lessons learnt” are almost satisfied Externally called modules are spawned
As multiple processes if needed e.g. kai_rpc, kai_memcache, kai_sync, kai_membership
Stateless modules are not spawned e.g. kai_coordinator
Stateful modules are spawned e.g. kai_config, kai_version, kai_store, kai_hash, kai_connection However, some of them are NOT terminal ones
e.g. connection module calling config process, is potential bottle neck
“Lessons learnt” can point out potential bottle necks Yes, connection module is just a thing!
Advanced Issues
08.11.20 29
More concurrency may be introduced if needed Design rules from “Lessons Learnt” can be applied locally e.g. coordinator produces N concurrency for asynchronous
calls
Referred to Web application servers in MVC model Web application servers Process Design from
“Lessons Learnt”
Concurrency is produced by
Web servers, e.g. Apache Externally called modules
Application is controlled by
Controller of MVC Stateless modules
State is managed by Model of MVC Stateful modules
Advanced Issues, cont’d
08.11.20 30
Is blocking never occurred in pure functional programming? In Kai, a process receiving requests waits for data to be replied In pure functional programming, another process handles data
to be replied? Not straightforward for me…
Blocking
Req. Req. and socket
Res. Res. and socket
Kai Pure functional programming model?
Polymorphism in Actor Model
08.11.20 31
What’s Polymorphism?
08.11.20 32
“a programming language feature that allows values of different data types to be handled using a uniform interface” from Wikipedia
Polymorphism in Java
08.11.20 33
Interface
Implementation class
interface Animal { void bark(); }
class Dog implements Animal { public void bark() { System.out.println(“Bow wow”); } }
Including no implementation
Polymorphism in Java, cont’d
08.11.20 34
Abstract class
Concrete class
abstract class Animal { public void bark() { System.out.println(this.yap()); } abstract String yap(); }
class Dog extends Animal { String yap() { return “Bow wow”; } }
Including some implementation
Polymorphism Inspired by Interface
08.11.20 35
Interface module Initializes implementation module with a name Calls the process with the name
Implementation module Spawns a process and registers it as the given name Implements actual logics, which are called from interface
Polymorphism Inspired by Interface, cont’d
08.11.20 36
-module(animal).
start_link(Mod) -> Mod:start_link(?MODULE).
bark() -> gen_server:call(?MODULE, bark).
-module(dog). -behaviour(gen_server).
start_link(ServerName) -> gen_server:start_link({local, ServerName}, ?MODULE, [], []).
handle_call(bark, _From, State) -> bark(State).
bark(State) -> io:format(“Bow wow~n”), {reply, ok, State}.
Some required callbacks are omitted.
Polymorphism Inspired by Interface, cont’d
08.11.20 37
How to use
animal:start_link(dog), animal:bark(). % Bow wow
Polymorphism Inspired by Interface, cont’d
08.11.20 38
Example in Kai Provides two types of local storage with a single interface Interface module
kai_store
Implementation modules kai_store_ets, uses ets, memory storage kai_store_dets, uses dets, disk storage
See actual codes in detail
Polymorphism Inspired by Abstract Class
08.11.20 39
Abstract module Defines abstract functions by using behavior mechanism Spawns a process and stores a name of concrete module Implements base logics Calls the process
Concrete module Implements callbacks, which are called from the abstract
module
Polymorphism Inspired by Abstract Class, cont’d
08.11.20 40
-module(animal). -behaviour(gen_sever).
behaviour_info(callbacks) -> [{yap, 0}]; % abstract void yap(); behaviour_info(_Other) -> undefined.
start_link(Mod) -> gen_server:start_link({local, ?MODULE}, ?MODULE, [Mod], []).
init(_Args = [Mod]) -> {ok, _State = {Mod}}.
bark(_State = {Mod}) -> io:format("~s~n", [Mod:yap()]), {reply, ok, Mod}.
handle_call(bark, _From, State) -> bark(State).
bark() -> gen_server:call(?MODULE, bark).
Some required callbacks are omitted.
Polymorphism Inspired by Abstract Class, cont’d
08.11.20 41
How to use Same as an example of interface
-module(dog). -behaviour(animal).
yap() -> "Bow wow”.
animal:start_link(dog), animal:bark(). % Bow wow
Polymorphism Inspired by Abstract Class, cont’d
08.11.20 42
Example in Kai Provides two types of TCP listening processes with a single
interface Abstract module (behavior)
kai_tcp_server
Concrete modules kai_rpc, listens RPC calls from other Kai nodes kai_memcache, listens requests from memcache clients
See actual codes in detail
Lessons Learnt
08.11.20 43
Two approaches to implement polymorphism Inspired by interface
Simple Not efficient
Actual logics have to be implemented in each child
Inspired by abstract class Erlang-way Efficient
Abstract class can be shared by children
Summary
08.11.20 44
Outline Reviewing Dynamo and Kai
Kai: an Open Source Implementation of Amazon’s Dynamo Features and Mechanism
Process Design for Better Performance Process Design Based on Calling Sequence and Process State
Polymorphism in Actor Model Two approaches to implement polymorphism
08.11.20 45
Kai: Roadmap
08.11.20 46
1. Initial implementation (May, 2008) 1,000 L
2. Current status 2,200 L Following tickets done
Module Task
kai_coordinator Requests from clients will be routed to coordinators
kai_version Vector clocks
kai_store Persistent storage
kai_rpc, kai_memcache Process pool
kai_connection Connection pool