Distributed Systems (part 1)
Chris Gill ([email protected])
Department of Computer Science and Engineering
Washington University, St. Louis, MO, USA
CSE 591 Area 5 Talk
Monday, November 10, 2008
2 - Gill: Distributed Systems – 04/20/23
What is a Distributed System?
“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.”
- Leslie Lamport
(BTW, this is entirely “ha, ha, only serious” ;-)
Key Characteristics of a Distributed System
Programs on different computers must interact
» A distributed system spans multiple computers
» Programs must send information to each other
» Programs must receive information from each other
» Programs also need to do some work ;-)
Programs play different roles in those interactions
» Send a request (client), process the request (server), send a reply (server), receive and process reply (client)
» Remember where to find things (directory, etc. services)
» Mediate interactions among distributed programs (coordination, orchestration, etc. services)
Programs can interact in many other ways as well
» Coordination “tuple spaces” (JavaSpaces, Linda, LIME)
» Publish-subscribe and message passing middleware
» Externally driven (e.g., a workflow management system)
Distribution Semantics Matters a Lot
How are the different computers inter-connected?
» Does all traffic move on a common data bus?
» Or, does traffic move across (hierarchical) networks?
» Or, does traffic move point-to-point between hosts?
Are there spatial and/or temporal factors?
» Does a host’s physical location/movement matter?
» Is delay noticeable, are bandwidth limits relevant?
» Are connections “always on” or can they be intermittent?
» Does the inter-connection topology change?
» Is the inter-connection topology entirely dynamic?
Distribution Semantics Examples (1/3)
Wired (hierarchical) internet
» Can reach any host from any other host
» Hosts are “always” on and available (% failure, downtime)
» Much of the WWW depends on this notion (example?)
[Figure: wired hierarchical network spanning hosts A–J]
Distribution Semantics Examples (2/3)
Nomadic (hierarchical) internet
» Some hosts are mobile, connect to nearest access point
» Hosts may be unavailable, but reconnect eventually
» Host-to-host path topology may change due to this
» Cell phones, wireless laptops exhibit this behavior
[Figure: hierarchical network of hosts A–J, with a mobile host reconnecting at a different access point]
Distribution Semantics Examples (3/3)
Mobile ad hoc networks (MANETs)
» Mobile hosts connect to each other (w/out access point)
» Hosts may detect dynamic connection, disconnection
» Hosts must exploit communication windows of opportunity
» Enables ad-hoc routing, message “mule” behaviors
[Figure: ad hoc topology among hosts A–J, changing as hosts move]
Distributed System Example (Wired)
Real-time avionics middleware
» Layer(s) between the application and the operating system
» Ensures non-critical activities don’t interfere with timing of critical ones
» Based on other open-source middleware projects
» ACE C++ library and TAO object request broker
» Standards-based (CORBA), written in C++/Ada
Flight demonstrations: BBN, WUSTL, Boeing, Honeywell
[Figure: dispatching infrastructure with laxity, static, and timer queues]
Distributed System Example (Nomadic/MANET)
Sliver
» A compact (small footprint) workflow engine for personal computing devices (e.g., cell phones, PDAs)
» Allows mobile collaboration to assemble and complete automated work-flows (task graphs)
» Standards-based (BPEL, SOAP), written in Java
Developed by Greg Hackmann at WUSTL
How do Distributed Systems Interact?
Remote method invocations are one popular style
» Allows method calls to be made between programs
» Middleware uses threads, sockets, etc. to make it so
» CORBA, Java RMI, SOAP, etc. standardize the details
Other styles (better for nomadic/mobile settings)
» Coordination “tuple spaces” (JavaSpaces, Linda, LIME)
» Publish-subscribe and message passing middleware
» Externally driven (e.g., a workflow management system)
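The remote-invocation style can be sketched with Python's stdlib XML-RPC standing in for CORBA/Java RMI/SOAP-style middleware (Python is used here for brevity; the `add` function and port handling are illustrative, not taken from any system named above):

```python
# A minimal remote-method-invocation sketch: the middleware hides threads
# and sockets behind an ordinary call syntax.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def add(a, b):                      # the "servant" method the server exposes
    return a + b

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
port = server.server_address[1]     # OS-assigned port

# The server processes requests on its own thread.
threading.Thread(target=server.serve_forever, daemon=True).start()

client = ServerProxy(f"http://127.0.0.1:{port}")
result = client.add(2, 3)           # looks local; actually a request/reply
server.shutdown()
```

The proxy call reads like a local method call; underneath, the middleware marshals the arguments, sends them over a socket, and unmarshals the reply.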
Challenges for (Wired) Distributed Systems
Distributed systems are inherently complex
» Remote concurrent programs must inter-operate
» Interactions must be assured of liveness and safety
Also must avoid accidental complexity
» Design for ease of configuration, avoidance of mistakes
» System architectures and design patterns can help map low level abstractions into appropriate higher level ones
[Figure: nORB layering — clients and servants, stubs, skeletons, dispatchers]
How to Abstract Concurrent Event Handling?
Goal: process multiple service requests concurrently using OS level threads
[Figure: a server listens on port 30000 and accepts concurrent connections from Client1 and Client2 on separate ports]
Basis: Synchronous vs. Reactive Read
[Figure: synchronous read — the server blocks in read() per client until data arrives; reactive read — the server calls select() over a handle set, then read()s only the handles that have data]
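The contrast can be sketched in Python (illustrative, not code from any system above): a reactive server asks select() which handles are readable, instead of issuing one blocking read() per client.

```python
# One thread multiplexes reads over several connections with select().
import select
import socket

# Socketpairs stand in for accepted client connections.
pairs = [socket.socketpair() for _ in range(3)]
server_ends = [s for s, _ in pairs]
client_ends = [c for _, c in pairs]

client_ends[1].send(b"hello")       # only client 1 has data ready

# select() returns just the readable handles; a synchronous read() on
# any of the others would have blocked the whole thread.
readable, _, _ = select.select(server_ends, [], [], 1.0)
messages = {server_ends.index(s): s.recv(1024) for s in readable}

for s, c in pairs:
    s.close()
    c.close()
```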
Approach: Reactive Serial Event Dispatching
[Figure: clients’ requests arrive at select() over a handle set; the Reactor serially dispatches handle_*() upcalls to application event handlers, which read() their own data]
Interactions among Participants
[Sequence: the main program calls register_handler(handler, event_types) on the Reactor, which obtains the handler’s handle via get_handle(); the main program then calls handle_events(); the Reactor waits in the synchronous event demultiplexer’s select(), and for each event dispatches the handler’s handle_event()]
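The participants can be sketched as a minimal Reactor in Python; the method names mirror the sequence above, but the implementation is an illustrative sketch, not ACE's.

```python
# Minimal Reactor: register_handler() stores handlers by handle, and
# handle_events() uses select() as the synchronous event demultiplexer,
# then dispatches handle_event() upcalls serially.
import select
import socket

class EchoHandler:
    def __init__(self, sock):
        self.sock = sock
    def get_handle(self):
        return self.sock
    def handle_event(self):
        data = self.sock.recv(1024)
        self.sock.send(data.upper())          # the application "work"

class Reactor:
    def __init__(self):
        self.handlers = {}                    # handle -> event handler
    def register_handler(self, handler):
        self.handlers[handler.get_handle()] = handler
    def handle_events(self, timeout=1.0):
        ready, _, _ = select.select(list(self.handlers), [], [], timeout)
        for handle in ready:
            self.handlers[handle].handle_event()   # upcall

server_end, client_end = socket.socketpair()
reactor = Reactor()
reactor.register_handler(EchoHandler(server_end))

client_end.send(b"ping")
reactor.handle_events()
reply = client_end.recv(1024)
server_end.close()
client_end.close()
```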
Distributed Interactions with Reactive Hosts
Application components implemented as handlers
» Use reactor threads to run input and output methods
» Send requests to other handlers via sockets, upcalls
Example of a multi-host request/result chain
» h1 to h2, h2 to h3, h3 to h4
[Figure: handlers h1–h4 spread across reactors r1–r3, connected by sockets]
WaitOnConnection Strategy
[Figure: the client-side handler blocks awaiting its reply (steps 1–5); nested upcalls can deadlock here]
• Handler waits on socket connection for the reply
– Makes a blocking call to socket’s recv() method
• Benefits
– No interference from other requests that arrive while the reply is pending
• Drawbacks
– One less thread in the Reactor for new requests
– Could allow deadlocks when upcalls are nested
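A toy back-of-envelope model (purely illustrative, not from the slides) of why nested upcalls deadlock under WaitOnConnection: each nesting level holds one reactor thread blocked in recv(), yet dispatching the innermost reply still needs one free thread.

```python
def can_complete(nested_depth, reactor_threads):
    """Toy model: can a chain of nested two-way calls finish?

    nested_depth upcalls each hold one reactor thread blocked on its
    socket connection; the innermost reply still needs one free thread
    to be dispatched by the reactor.
    """
    free = reactor_threads - nested_depth
    return free >= 1

single = can_complete(nested_depth=1, reactor_threads=1)  # deadlocks
pooled = can_complete(nested_depth=1, reactor_threads=2)  # survives
```

With one reactor thread, the very first nested two-way call consumes the only thread that could ever deliver its reply.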
WaitOnReactor Strategy
• Handler returns control to reactor until reply comes back
– Reactor can keep processing other requests while replies are pending
• Benefits
– Thread available, no deadlock
– Thread stays fully occupied
• Drawbacks
– Interleaving of request/reply processing
– Interference from other requests issued while reply is pending
[Figure: steps 1–6 of a call; deadlock avoided by waiting on the reactor rather than the connection]
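WaitOnReactor's interleaving can be sketched in Python with a queue standing in for the reactor's demultiplexer (all names illustrative): the waiting handler keeps dispatching other events until its own reply arrives.

```python
# Instead of blocking in recv(), the waiting handler re-enters the
# reactor's event loop, so the thread stays available -- at the cost
# of interleaving other upcalls before its reply is processed.
import collections

class Reactor:
    def __init__(self):
        self.queue = collections.deque()   # pending events (stand-in for select())
        self.log = []
    def post(self, name, is_reply=False):
        self.queue.append((name, is_reply))
    def handle_one_event(self):
        name, is_reply = self.queue.popleft()
        self.log.append(name)              # dispatch the upcall
        return is_reply
    def wait_on_reactor(self):
        # Process intervening events until the awaited reply comes back.
        while not self.handle_one_event():
            pass

r = Reactor()
r.post("other-request-1")                  # arrives while our reply is pending
r.post("other-request-2")
r.post("our-reply", is_reply=True)
r.wait_on_reactor()
```

The log shows the drawback directly: both intervening requests run to completion before the awaited reply is processed.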
Blocking with WaitOnReactor
• Wait-on-Reactor strategy could cause interleaved request/reply processing
• Blocking factor could be large or even unbounded
– Based on the upcall duration
– And sequence of other intervening upcalls
• Blocking factors may affect real-time properties of other end-systems
– Call-chains can have a cascading blocking effect
[Timeline: f5’s reply is queued behind f3; only after f3 completes is f5’s reply processed and f2 completes — the gap is f2’s blocking factor]
Why not a “Stackless” WaitOnReactor Variant?
• What if we didn’t “stack” processing of results?
– But instead allowed them to be handled asynchronously as they are ready
– “Stackless Python” takes this approach
– Thanks to Caleb Hines who pointed this out in CSE 532
• Benefits
– No interference from other requests that arrive when reply is pending
– No risk of deadlock as thread still returns to reactor
• Drawbacks
– Significant increase in implementation complexity
– Time and space overhead to match requests to results (other patterns we cover in CSE 532 could help, though)
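The request-to-result matching that the last drawback mentions can be sketched in Python (illustrative names; Stackless Python itself works differently, via microthreads): continuations are stored under correlation ids, so replies can be handled in any order without any thread waiting.

```python
# "Stackless" dispatch sketch: record a continuation keyed by a request
# id and return immediately; match each reply to its request on arrival.
import itertools

class StacklessDispatcher:
    def __init__(self):
        self.pending = {}                 # request id -> continuation
        self.ids = itertools.count(1)
        self.results = []
    def send_request(self, payload, continuation):
        req_id = next(self.ids)
        self.pending[req_id] = continuation
        return req_id                     # would accompany the wire request
    def on_reply(self, req_id, value):
        continuation = self.pending.pop(req_id)   # the matching overhead
        continuation(value)

d = StacklessDispatcher()
a = d.send_request("f(2)", lambda v: d.results.append(("a", v)))
b = d.send_request("g(3)", lambda v: d.results.append(("b", v)))
d.on_reply(b, 9)                          # replies arrive out of order
d.on_reply(a, 4)
```

The pending table is exactly the time/space overhead the slide notes: every in-flight request costs an entry until its reply arrives.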
Could WaitOnConnection Be Used?
Its main limitation is the potential for deadlock
» But it offers low overhead and ease of implementation/use
Could we make a system deadlock-free …
» if we knew its call-graph … and were careful about how threads were allowed to proceed?
Notice that a lot of distributed systems research has this kind of flavor…
» Given one approach (of probably several alternatives)
» Can we solve problem X that limits its applicability and/or utility?
» Can we apply that solution efficiently in practice?
» Does the solution raise other problems that need to be solved?
Deadlock Problem in Terms of a Call Graph
Call graph often can be obtained
Each reactor is assigned a color
Deadlock can exist
» If there exist more than Kc segments of color C
» Where Kc is the number of threads in the node with color C
» E.g., f3-f2-f4-f5-f2 needs at least 2 & 1
[Figure: call graph over f1–f5, segments colored by reactor]
From V. Subramonian and C. Gill, “A Generative Programming Framework for Adaptive Middleware”, 2004
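The slide's condition can be written as a small check (a sketch in Python; the path/color encoding here is illustrative, not the paper's formulation): deadlock is possible if some call chain occupies more segments of a color C than that reactor's Kc threads.

```python
# Deadlock check from the call-graph coloring: count how many segments
# of each color one call chain needs, and compare against Kc.
from collections import Counter

def deadlock_possible(path_colors, threads_per_color):
    """path_colors: reactor color of each nested segment along one chain.
    threads_per_color: Kc, the thread count of the reactor colored C."""
    need = Counter(path_colors)
    return any(need[c] > threads_per_color.get(c, 0) for c in need)

# A chain needing 2 segments of color "A" can deadlock a 1-thread reactor:
risky = deadlock_possible(["A", "B", "A"], {"A": 1, "B": 1})
safe = deadlock_possible(["A", "B", "A"], {"A": 2, "B": 1})
```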
Simulation Showing Thread Exhaustion
Formally, increasing the number of reactor threads may not prevent deadlock
[Figure: Clients 1–3 drive Flow1–Flow3 through event handlers EH11–EH33, spanning Reactor1 (Server1) and Reactor2 (Server2)]
Clients send requests
 3: Client3 : TRACE_SAP_Buffer_Write(13,10)
 4: Unidir_IPC_13_14 : TRACE_SAP_Buffer_Transfer(13,14,10)
 5: Client2 : TRACE_SAP_Buffer_Write(7,10)
 6: Unidir_IPC_7_8 : TRACE_SAP_Buffer_Transfer(7,8,10)
 7: Client1 : TRACE_SAP_Buffer_Write(1,10)
 8: Unidir_IPC_1_2 : TRACE_SAP_Buffer_Transfer(1,2,10)
Reactor1 makes upcalls to event handlers
 10: Reactor1_TPRHE1 ---handle_input(2,1)---> Flow1_EH1
 12: Reactor1_TPRHE2 ---handle_input(8,2)---> Flow2_EH1
 14: Reactor1_TPRHE3 ---handle_input(14,3)---> Flow3_EH1
Flow1 proceeds
 15: Time advanced by 25 units. Global time is 28
 16: Flow1_EH1 : TRACE_SAP_Buffer_Write(3,10)
 17: Unidir_IPC_3_4 : TRACE_SAP_Buffer_Transfer(3,4,10)
 19: Reactor2_TPRHE4 ---handle_input(4,4)---> Flow1_EH2
 20: Time advanced by 25 units. Global time is 53
 21: Flow1_EH2 : TRACE_SAP_Buffer_Write(5,10)
 22: Unidir_IPC_5_6 : TRACE_SAP_Buffer_Transfer(5,6,10)
Flow2 proceeds
 23: Time advanced by 25 units. Global time is 78
 24: Flow2_EH1 : TRACE_SAP_Buffer_Write(9,10)
 25: Unidir_IPC_9_10 : TRACE_SAP_Buffer_Transfer(9,10,10)
 27: Reactor2_TPRHE5 ---handle_input(10,5)---> Flow2_EH2
 28: Time advanced by 25 units. Global time is 103
 29: Flow2_EH2 : TRACE_SAP_Buffer_Write(11,10)
 30: Unidir_IPC_11_12 : TRACE_SAP_Buffer_Transfer(11,12,10)
Flow3 proceeds
 31: Time advanced by 25 units. Global time is 128
 32: Flow3_EH1 : TRACE_SAP_Buffer_Write(15,10)
 33: Unidir_IPC_15_16 : TRACE_SAP_Buffer_Transfer(15,16,10)
 35: Reactor2_TPRHE6 ---handle_input(16,6)---> Flow3_EH2
 36: Time advanced by 25 units. Global time is 153
 37: Flow3_EH2 : TRACE_SAP_Buffer_Write(17,10)
 38: Unidir_IPC_17_18 : TRACE_SAP_Buffer_Transfer(17,18,10)
 39: Time advanced by 851 units. Global time is 1004
Solution: New Deadlock Avoidance Protocols
Papers at FORTE 2005 through EMSOFT 2006
» http://www.cse.wustl.edu/~cdgill/PDF/forte05.pdf
» http://www.cse.wustl.edu/~cdgill/PDF/emsoft06_liveness.pdf
César Sánchez PhD dissertation at Stanford
» Collaboration with Henny Sipma and Zohar Manna
Paul Oberlin: MS project here at WUSTL
Avoid interactions leading to deadlock
» a liveness property
Like synchronization, achieved via scheduling
» Upcalls are delayed until enough threads are ready
But, introduces small blocking delays
» a timing property
» In real-time systems, also a safety property
Deadlock Avoidance Protocol Overview
• Regulates upcalls based on # of available reactor threads and call graph’s “thread height”
– Does not allow exhaustion
• BASIC-P protocol implemented in the ACE Thread Pool Reactor
– Using handle suspension and resumption
– Backward compatible, minimal overhead
[Figure: Clients 1–3, Flow1–Flow3 through event handlers EH11–EH33 on Reactor1 (Server1) and Reactor2 (Server2)]
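The admission idea can be sketched in Python (a sketch of the published idea, not the actual ACE code; class and parameter names are illustrative): an upcall is admitted only when the free threads exceed its remaining nesting depth, so the last thread is never consumed by a call that may nest further.

```python
# Deadlock-avoidance admission sketch: gate upcalls on the call graph's
# "thread height" (how many further nested upcalls this call may need).
class DAReactor:
    def __init__(self, threads):
        self.free = threads
    def try_start_upcall(self, height):
        # Admit only if enough threads remain for the calls this one may nest.
        if self.free > height:
            self.free -= 1
            return True
        return False          # caller is delayed: the small blocking cost
    def finish_upcall(self):
        self.free += 1

r = DAReactor(threads=2)
leaf_ok = r.try_start_upcall(height=0)    # leaf call: admitted (2 > 0)
nested_ok = r.try_start_upcall(height=1)  # may nest: delayed (1 > 1 fails)
```

Denying the second upcall is exactly the protocol's trade: a small, bounded blocking delay in exchange for never exhausting the thread pool.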
Timing Traces: DA Protocol at Work
Timing traces from model/execution show the DA protocol regulating the flows to use available resources without deadlock
[Figure: timing traces for Flow1–Flow3 through their event handlers (EH11–EH33) across reactors R1 and R2]
DA Blocking Delay (Simulated vs. Actual)
[Figure: model execution vs. actual execution traces, showing the blocking delay for Client2 and for Client3]
Overhead of ACE TP reactor with DA
Negligible overhead with no DA protocol
Overhead increases with number of event handlers because of their suspension and resumption on protocol entry and exit
Where Can We Go From Here?
Distributed computing is ubiquitous
» …in planes, trains, and automobiles…
» …in medical devices and equipment…
» …in more and more places each day
Distributed systems offer many research opportunities
» Discover them from specific problems
» May allow advances even in well worked areas (e.g., deadlock avoidance)
What new systems can we build by spanning different platforms?
» I’ll leave that as an open question for you to consider (and ultimately, to answer)
A fire extinguisher that runs UNIX?