Page 1: Distributed Systems (part 1)

Chris Gill
cdgill@cse.wustl.edu

Department of Computer Science and Engineering
Washington University, St. Louis, MO, USA

CSE 591 Area 5 Talk
Monday, November 10, 2008

Page 2: What is a Distributed System?

"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.”

- Leslie Lamport

(BTW, this is entirely “ha, ha, only serious” ;-)

Page 3: Key Characteristics of a Distributed System

Programs on different computers must interact
» A distributed system spans multiple computers
» Programs must send information to each other
» Programs must receive information from each other
» Programs also need to do some work ;-)

Programs play different roles in those interactions
» Send a request (client), process the request (server), send a reply (server), receive and process reply (client) (a minimal sketch of this exchange follows below)
» Remember where to find things (directory, etc. services)
» Mediate interactions among distributed programs (coordination, orchestration, etc. services)

Programs can interact in many other ways as well
» Coordination “tuple spaces” (JavaSpaces, Linda, LIME)
» Publish-subscribe and message passing middleware
» Externally driven (e.g., a workflow management system)
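To make the client/server roles concrete, here is a minimal sketch (my illustration, not code from the talk): one process plays the server role (accept, process the request, send a reply) and another plays the client role (send a request, receive and process the reply). The port number and message format are arbitrary choices, and error handling is elided.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main(int argc, char* argv[]) {
  bool is_server = (argc > 1 && std::strcmp(argv[1], "server") == 0);
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(30000);                    // arbitrary port choice
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

  if (is_server) {                                 // server role
    int acceptor = socket(AF_INET, SOCK_STREAM, 0);
    bind(acceptor, reinterpret_cast<sockaddr*>(&addr), sizeof addr);
    listen(acceptor, 5);
    int conn = accept(acceptor, nullptr, nullptr); // wait for a client
    char request[64] = {};
    recv(conn, request, sizeof request - 1, 0);    // receive the request
    char reply[96];
    std::snprintf(reply, sizeof reply, "processed: %s", request);
    send(conn, reply, std::strlen(reply), 0);      // send the reply
    close(conn);
    close(acceptor);
  } else {                                         // client role
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    connect(sock, reinterpret_cast<sockaddr*>(&addr), sizeof addr);
    const char* request = "do some work";
    send(sock, request, std::strlen(request), 0);  // send a request
    char reply[96] = {};
    recv(sock, reply, sizeof reply - 1, 0);        // receive and process reply
    std::printf("client got: %s\n", reply);
    close(sock);
  }
  return 0;
}

Run one instance with the argument "server", then a second instance with no argument as the client.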

Page 4: Distribution Semantics Matters a Lot

How are the different computers inter-connected?
» Does all traffic move on a common data bus?
» Or, does traffic move across (hierarchical) networks?
» Or, does traffic move point-to-point between hosts?

Are there spatial and/or temporal factors?
» Does a host’s physical location/movement matter?
» Is delay noticeable? Are bandwidth limits relevant?
» Are connections “always on” or can they be intermittent?
» Does the inter-connection topology change?
» Is the inter-connection topology entirely dynamic?

Page 5: Distribution Semantics Examples (1/3)

Wired (hierarchical) internet
» Can reach any host from any other host
» Hosts are “always” on and available (% failure, downtime)
» Much of the WWW depends on this notion (example?)

[Figure: wired hierarchical network inter-connecting hosts A through J]

Page 6: Distribution Semantics Examples (2/3)

Nomadic (hierarchical) internet
» Some hosts are mobile, connect to nearest access point
» Hosts may be unavailable, but reconnect eventually
» Host-to-host path topology may change due to this
» Cell phones, wireless laptops exhibit this behavior

[Figure: hierarchical network of hosts A through J, with a mobile host re-attaching at a different access point]

Page 7: Distribution Semantics Examples (3/3)

Mobile ad hoc networks (MANETs)
» Mobile hosts connect to each other (w/out access point)
» Hosts may detect dynamic connection, disconnection
» Hosts must exploit communication windows of opportunity
» Enables ad-hoc routing, message “mule” behaviors

[Figure: hosts A through J forming ad hoc clusters as they move]

Page 8: Distributed System Example (Wired)

Real-time avionics middleware
» Layer(s) between the application and the operating system
» Ensures non-critical activities don’t interfere with timing of critical ones
» Based on other open-source middleware projects
» ACE C++ library and TAO object request broker
» Standards-based (CORBA), written in C++/Ada

Flight demonstrations: BBN, WUSTL, Boeing, Honeywell

[Figure: dispatching queues labeled “static” and “laxity”, with timers, shown for each of two end-systems]

Page 9: Distributed System Example (Nomadic/MANET)

Sliver
» A compact (small footprint) workflow engine for personal computing devices (e.g., cell phones, PDAs)
» Allows mobile collaboration to assemble and complete automated work-flows (task graphs)
» Standards-based (BPEL, SOAP), written in Java

Developed by Greg Hackmann at WUSTL

Page 10: How do Distributed Systems Interact?

Remote method invocations are one popular style
» Allows method calls to be made between programs
» Middleware uses threads, sockets, etc. to make it so
» CORBA, Java RMI, SOAP, etc. standardize the details

Other styles (better for nomadic/mobile settings)
» Coordination “tuple spaces” (JavaSpaces, Linda, LIME)
» Publish-subscribe and message passing middleware
» Externally driven (e.g., a workflow management system)

Page 11: Challenges for (Wired) Distributed Systems

Distributed systems are inherently complex

» Remote concurrent programs must inter-operate
» Interactions must be assured of liveness and safety

Also must avoid accidental complexity
» Design for ease of configuration, avoidance of mistakes
» System architectures and design patterns can help map low-level abstractions into appropriate higher-level ones

[Figure: nORB architecture linking clients and servants via stubs, dispatchers, SOAs, and skeletons]

Page 12: How to Abstract Concurrent Event Handling?

Goal: process multiple service requests concurrently using OS-level threads (a thread-per-connection sketch follows below).

[Figure: the server listens on port 30000; Client1 (port 27098) and Client2 (port 26545) CONNECT, and the server accepts each connection on its own port (24467 and 25667)]
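A minimal sketch of that goal (my illustration, not code from the slides): the server listens on port 30000 as in the figure, and each accepted connection is handed to its own OS-level thread, so requests proceed concurrently. The echo-style serve() body is a stand-in for real request processing.

#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <thread>

void serve(int conn) {                 // runs once per client, in its own thread
  char buf[64];
  ssize_t n;
  while ((n = recv(conn, buf, sizeof buf, 0)) > 0)
    send(conn, buf, n, 0);             // stand-in for real request processing
  close(conn);
}

int main() {
  int acceptor = socket(AF_INET, SOCK_STREAM, 0);
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(30000);        // the listen port from the figure
  addr.sin_addr.s_addr = htonl(INADDR_ANY);
  bind(acceptor, reinterpret_cast<sockaddr*>(&addr), sizeof addr);
  listen(acceptor, 5);
  for (;;) {
    int conn = accept(acceptor, nullptr, nullptr);
    std::thread(serve, conn).detach(); // one OS-level thread per connection
  }
}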

Page 13: Basis: Synchronous vs. Reactive Read

[Figure: two panels. Synchronous: clients send data and the server blocks in read(). Reactive: the server blocks in select() on a handle set, then calls read() only where data is ready]
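A sketch of the reactive half of the figure (assumed shape, not ACE code): select() blocks on the whole handle set, and read() is called only on handles that actually have data, so one thread can serve many clients without blocking on any single one.

#include <sys/select.h>
#include <unistd.h>
#include <vector>

void reactive_read_loop(const std::vector<int>& handles) {
  for (;;) {
    fd_set handle_set;
    FD_ZERO(&handle_set);
    int max_fd = -1;
    for (int h : handles) {            // (re)build the handle set
      FD_SET(h, &handle_set);
      if (h > max_fd) max_fd = h;
    }
    // Block until at least one handle has data; the synchronous variant
    // would instead block inside read() on a single client.
    select(max_fd + 1, &handle_set, nullptr, nullptr, nullptr);
    char buf[64];
    for (int h : handles)
      if (FD_ISSET(h, &handle_set))    // read only the ready handles
        read(h, buf, sizeof buf);
  }
}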

Page 14: Approach: Reactive Serial Event Dispatching

[Figure: clients’ requests arrive at a Reactor, which uses select() over a handle set to demultiplex events and dispatches handle_*() upcalls to application event handlers, which then call read()]

Page 15: Interactions among Participants

[Sequence diagram among the main program, a concrete event handler, the Reactor, and the synchronous event demultiplexer: the main program calls register_handler(handler, event_types); the Reactor calls get_handle() on the handler; the main program calls handle_events(); the Reactor calls select() on the demultiplexer; when an event arrives, the Reactor dispatches handle_event() to the handler]
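The participants above can be sketched as interfaces (my simplification of the Reactor pattern the talk names, not ACE's actual API; for instance, event_types is omitted here):

#include <sys/select.h>
#include <map>

class EventHandler {
public:
  virtual ~EventHandler() = default;
  virtual int  get_handle() const = 0;  // the Reactor asks for the handle
  virtual void handle_event() = 0;      // upcall dispatched when it is ready
};

class Reactor {
  std::map<int, EventHandler*> handlers_;   // handle -> registered handler
public:
  void register_handler(EventHandler* h) {  // called by the main program
    handlers_[h->get_handle()] = h;
  }
  void handle_events() {                    // one iteration of the event loop
    fd_set ready;
    FD_ZERO(&ready);
    int max_fd = -1;
    for (const auto& [fd, h] : handlers_) {
      FD_SET(fd, &ready);
      if (fd > max_fd) max_fd = fd;
    }
    select(max_fd + 1, &ready, nullptr, nullptr, nullptr);  // demultiplex
    for (const auto& [fd, h] : handlers_)
      if (FD_ISSET(fd, &ready))
        h->handle_event();                  // serial dispatch to handlers
  }
};

The main program would loop calling handle_events(), matching the sequence in the diagram.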

Page 16: Distributed Interactions with Reactive Hosts

Application components implemented as handlers
» Use reactor threads to run input and output methods
» Send requests to other handlers via sockets, upcalls

Example of a multi-host request/result chain
» h1 to h2, h2 to h3, h3 to h4

[Figure: handlers h1 through h4 hosted on reactors r1, r2, and r3, connected by sockets]

Page 17: WaitOnConnection Strategy

[Figure: request/reply between client and server through their reactors, servant, and a callback (steps 1 through 5); the client-side reactor thread waits (step 3) on the connection, and deadlock can occur here]

• Handler waits on socket connection for the reply
– Makes a blocking call to socket’s recv() method

• Benefits
– No interference from other requests that arrive while the reply is pending

• Drawbacks
– One less thread in the Reactor for new requests
– Could allow deadlocks when upcalls are nested
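In code, WaitOnConnection amounts to something like the following (assumed shape, not ACE's implementation): after sending a request from inside an upcall, the handler blocks in recv() on the reply socket, so its reactor thread is tied up until the reply arrives, which is exactly the condition that permits deadlock when upcalls nest.

#include <sys/socket.h>
#include <sys/types.h>

// Called from inside a reactor upcall, after the request has been sent on
// `sock`. The blocking recv() holds this reactor thread until the peer
// replies; a nested upcall chain that needs this thread can deadlock.
ssize_t wait_on_connection(int sock, char* reply, size_t reply_len) {
  return recv(sock, reply, reply_len, 0);  // blocking wait for the reply
}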

Page 18: WaitOnReactor Strategy

• Handler returns control to reactor until reply comes back
– Reactor can keep processing other requests while replies are pending

• Benefits
– Thread available, no deadlock
– Thread stays fully occupied

• Drawbacks
– Interleaving of request/reply processing
– Interference from other requests issued while reply is pending

[Figure: request/reply between client and server through their reactors, servant, and a callback (steps 1 through 6); deadlock is avoided by waiting on the reactor instead of the connection]
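WaitOnReactor, by contrast, can be sketched like this (assumed shape, invented names): instead of blocking in recv(), the handler keeps running the reactor's event loop until its own reply handler flips a flag. The thread stays busy dispatching intervening events, which is both the benefit (no deadlock) and the drawback (interleaving) listed above.

#include <functional>

// dispatch_one: one iteration of the reactor's event loop (e.g., the
// handle_events() sketched for page 15). reply_received: set to true by the
// event handler that processes our reply when it finally arrives.
void wait_on_reactor(const std::function<void()>& dispatch_one,
                     const bool& reply_received) {
  while (!reply_received)  // our reply is still pending...
    dispatch_one();        // ...so keep processing other events meanwhile
}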

Page 19: Blocking with WaitOnReactor

• Wait-on-Reactor strategy could cause interleaved request/reply processing

• Blocking factor could be large or even unbounded
– Based on the upcall duration
– And sequence of other intervening upcalls

• Blocking factors may affect real-time properties of other end-systems
– Call-chains can have a cascading blocking effect

[Figure: timeline in which the reply to f2’s call on f5 is queued behind an intervening f3 upcall; the f5 reply is processed and f2 completes only after f3 completes, which defines the blocking factor for f2]

Page 20: Why not a “Stackless” WaitOnReactor Variant?

• What if we didn’t “stack” processing of results?
– But instead allowed them to be handled asynchronously as they are ready
– “Stackless Python” takes this approach
– Thanks to Caleb Hines, who pointed this out in CSE 532

• Benefits
– No interference from other requests that arrive when reply is pending
– No risk of deadlock, as thread still returns to reactor

• Drawbacks
– Significant increase in implementation complexity
– Time and space overhead to match requests to results (other patterns we cover in CSE 532 could help, though)

Page 21: Could WaitOnConnection Be Used?

Main limitation is its potential for deadlock
» And, it offers low overhead, ease of implementation/use

Could we make a system deadlock-free…
» if we knew its call-graph… and were careful about how threads were allowed to proceed?

Notice that a lot of distributed systems research has this kind of flavor…
» Given one approach (of probably several alternatives)
» Can we solve problem X that limits its applicability and/or utility? Can we apply that solution efficiently in practice? Does the solution raise other problems that need to be solved?

Page 22: Deadlock Problem in Terms of a Call Graph

Call graph often can be obtained

Each reactor is assigned a color

Deadlock can exist
» If there exist more than Kc segments of color C
» Where Kc is the number of threads in the node with color C
» E.g., f3-f2-f4-f5-f2 needs at least 2 & 1 (a small sketch of this test follows below)

[Figure: call graph over f1 through f5, with nodes colored by reactor]

From V. Subramonian and C. Gill, “A Generative Programming Framework for Adaptive Middleware”, 2004
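A small sketch of that test, with hypothetical names but the condition as stated on the slide: deadlock is possible when some reactor color C can have more simultaneously active call-graph segments than its thread count Kc.

#include <map>

// segment_counts: color -> max number of simultaneously active call-graph
// segments of that color; threads: color -> Kc, that reactor's thread count.
bool deadlock_possible(const std::map<char, int>& segment_counts,
                       const std::map<char, int>& threads) {
  for (const auto& [color, count] : segment_counts) {
    auto it = threads.find(color);
    if (it == threads.end() || count > it->second)
      return true;  // more segments of this color than threads can exist
  }
  return false;
}

For the f3-f2-f4-f5-f2 chain above, the two colors would need counts of at least 2 and 1 respectively.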

Page 23: Simulation Showing Thread Exhaustion

Formally, increasing the number of reactor threads may not prevent deadlock.

[Figure: Client1 through Client3 drive Flow1 through Flow3 via event handlers EH11 through EH33 hosted on Reactor1 and Reactor2, spanning Server1 and Server2]

Simulation trace:

Clients send requests
3: Client3 : TRACE_SAP_Buffer_Write(13,10)
4: Unidir_IPC_13_14 : TRACE_SAP_Buffer_Transfer(13,14,10)
5: Client2 : TRACE_SAP_Buffer_Write(7,10)
6: Unidir_IPC_7_8 : TRACE_SAP_Buffer_Transfer(7,8,10)
7: Client1 : TRACE_SAP_Buffer_Write(1,10)
8: Unidir_IPC_1_2 : TRACE_SAP_Buffer_Transfer(1,2,10)
Reactor1 makes upcalls to event handlers
10: Reactor1_TPRHE1 ---handle_input(2,1)---> Flow1_EH1
12: Reactor1_TPRHE2 ---handle_input(8,2)---> Flow2_EH1
14: Reactor1_TPRHE3 ---handle_input(14,3)---> Flow3_EH1
Flow1 proceeds
15: Time advanced by 25 units. Global time is 28
16: Flow1_EH1 : TRACE_SAP_Buffer_Write(3,10)
17: Unidir_IPC_3_4 : TRACE_SAP_Buffer_Transfer(3,4,10)
19: Reactor2_TPRHE4 ---handle_input(4,4)---> Flow1_EH2
20: Time advanced by 25 units. Global time is 53
21: Flow1_EH2 : TRACE_SAP_Buffer_Write(5,10)
22: Unidir_IPC_5_6 : TRACE_SAP_Buffer_Transfer(5,6,10)
Flow2 proceeds
23: Time advanced by 25 units. Global time is 78
24: Flow2_EH1 : TRACE_SAP_Buffer_Write(9,10)
25: Unidir_IPC_9_10 : TRACE_SAP_Buffer_Transfer(9,10,10)
27: Reactor2_TPRHE5 ---handle_input(10,5)---> Flow2_EH2
28: Time advanced by 25 units. Global time is 103
29: Flow2_EH2 : TRACE_SAP_Buffer_Write(11,10)
30: Unidir_IPC_11_12 : TRACE_SAP_Buffer_Transfer(11,12,10)
Flow3 proceeds
31: Time advanced by 25 units. Global time is 128
32: Flow3_EH1 : TRACE_SAP_Buffer_Write(15,10)
33: Unidir_IPC_15_16 : TRACE_SAP_Buffer_Transfer(15,16,10)
35: Reactor2_TPRHE6 ---handle_input(16,6)---> Flow3_EH2
36: Time advanced by 25 units. Global time is 153
37: Flow3_EH2 : TRACE_SAP_Buffer_Write(17,10)
38: Unidir_IPC_17_18 : TRACE_SAP_Buffer_Transfer(17,18,10)
39: Time advanced by 851 units. Global time is 1004

Page 24: Solution: New Deadlock Avoidance Protocols

Papers at FORTE 2005 through EMSOFT 2006
» http://www.cse.wustl.edu/~cdgill/PDF/forte05.pdf
» http://www.cse.wustl.edu/~cdgill/PDF/emsoft06_liveness.pdf

César Sánchez PhD dissertation at Stanford
» Collaboration with Henny Sipma and Zohar Manna

Paul Oberlin: MS project here at WUSTL

Avoid interactions leading to deadlock
» a liveness property

Like synchronization, achieved via scheduling
» Upcalls are delayed until enough threads are ready

But, introduces small blocking delays
» a timing property
» In real-time systems, also a safety property

Page 25: Deadlock Avoidance Protocol Overview

• Regulates upcalls based on # of available reactor threads and the call graph’s “thread height”
– Does not allow exhaustion (a sketch of this admission test follows below)

• BASIC-P protocol implemented in the ACE Thread Pool Reactor
– Using handle suspension and resumption
– Backward compatible, minimal overhead

[Figure: the same topology as the thread-exhaustion example: Client1 through Client3, Flow1 through Flow3, and event handlers EH11 through EH33 on Reactor1 and Reactor2, spanning Server1 and Server2]
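A heavily hedged sketch of that admission idea (my reading of the bullet above, not the published BASIC-P code, and the names are invented): before dispatching an upcall, check that enough reactor threads remain free to cover the call chain's remaining thread height; otherwise delay the upcall (e.g., by suspending the handle) until threads free up.

// free_threads: reactor threads currently available; remaining_height: the
// "thread height" this call chain may still require in this reactor.
bool may_dispatch_upcall(int free_threads, int remaining_height) {
  // Dispatch only if the chain's worst-case remaining demand is covered;
  // otherwise the reactor delays the upcall, avoiding thread exhaustion.
  return free_threads >= remaining_height;
}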

Page 26: Timing Traces: DA Protocol at Work

Timing traces from model/execution show the DA protocol regulating the flows to use available resources without deadlock.

[Figure: per-flow timing traces for Flow1, Flow2, and Flow3 across reactors R1 and R2, over event handlers EH11 through EH33]

Page 27: DA Blocking Delay (Simulated vs. Actual)

[Figure: two panels, Model Execution and Actual Execution, each showing the blocking delay for Client2 and the blocking delay for Client3]

Page 28: Overhead of ACE TP Reactor with DA

Negligible overhead with no DA protocol.

Overhead increases with the number of event handlers, because of their suspension and resumption on protocol entry and exit.

Page 29: Where Can We Go From Here?

Distributed computing is ubiquitous
» …in planes, trains, and automobiles…
» …in medical devices and equipment…
» …in more and more places each day

Distributed systems offer many research opportunities
» Discover them from specific problems
» May allow advances even in well-worked areas (e.g., deadlock avoidance)

What new systems can we build by spanning different platforms?
» I’ll leave that as an open question for you to consider (and ultimately, to answer)

[Photo: a fire extinguisher that runs UNIX?]

