Models of Distributed Computing
Noah Mendelsohn Tufts University Email: [email protected] Web: http://www.cs.tufts.edu/~noah
COMP 117: Internet Scale Distributed Systems (Spring 2019)
© 2010 Noah Mendelsohn
Architecting a universal Web
Identification: URIs
Interaction: HTTP
Data formats: HTML, JPEG, GIF, etc.
Goals
Introduce basics of distributed system design
Explore some traditional models of distributed computing
Prepare for discussion of REST: the Web’s model
Communicating systems
[Figure: two machines, each with CPU, memory, and storage]
We have multiple programs, running asynchronously, sending messages
Reference: http://www.usingcsp.com/cspbook.pdf (very theoretical)
Communicating Sequential Processes
We’ve got pretty clean higher-level abstractions for use on a single machine
How can we get a clean model of two communicating machines?
Large scale systems
Internet
What are the clean abstractions on this scale?
How can we get a clean model of a worldwide network of communicating machines?
WARNING!!
This is a very big topic…
…many important approaches have been studied and used…
…there is lots of operational experience, and also formalisms…
This presentation does not attempt to be either comprehensive or balanced…the goal is to introduce some key concepts
Traditional Models of Distributed Computing
- Message Passing
Message passing
[Figure: two machines, each with CPU, memory, and storage]
Programs send messages to and from each other’s memories
Half duplex: one way at a time
[Figure: two machines, each with CPU, memory, and storage; messages travel in one direction at a time]
Full duplex: both ways at the same time
[Figure: two machines, each with CPU, memory, and storage; messages travel in both directions at the same time]
Message passing
Data abstraction:
– Low level: bytes (octets)
– Sometimes: agreed metaformat (JSON, XML, C struct, etc.)
Synchronization:
– Wait for message
– Timeout
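The points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`send_msg`, `recv_exact`, `recv_msg`), the 4-byte length prefix, and the choice of JSON as the agreed metaformat are all hypothetical, and a local `socket.socketpair()` stands in for two machines.

```python
import json
import socket
import struct

def send_msg(sock, obj):
    """Send one message: a 4-byte big-endian length, then JSON-encoded bytes."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf

def recv_msg(sock, timeout):
    """Wait up to `timeout` seconds per read; return None if nothing arrives."""
    sock.settimeout(timeout)
    try:
        (length,) = struct.unpack("!I", recv_exact(sock, 4))
        return json.loads(recv_exact(sock, length).decode("utf-8"))
    except socket.timeout:
        return None

# Two connected endpoints in one process stand in for two machines.
a, b = socket.socketpair()
send_msg(a, {"op": "hello", "n": 1})
first = recv_msg(b, timeout=1.0)   # the message that was sent
second = recv_msg(b, timeout=0.1)  # nothing more was sent, so this times out
```

On the wire there are only bytes; the length prefix and JSON are conventions both sides agreed on. A timeout that fires in the middle of a message would discard the partial read in this sketch, which a real system would have to handle.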
Interaction Patterns
Between pairs of machines
Message passing: no constraints
Common pattern: request/response
[Figure: two machines, each with CPU, memory, and storage; one sends a request and the other returns a response]
Traditional Models of Distributed Computing
- Client Server
Client / server
Request / response is a traffic pattern
Client / server describes the roles of the nodes
Server provides service for client
[Figure: two machines, each with CPU, memory, and storage; the client requests a service and the server returns a response]
Client / server
Probably the most common distributed system architecture
Simple – well understood
Doesn’t explain:
– How to exploit more than 2 machines
– How to make programming easier
– How to prove correctness: though the simple model helps
Most client/server systems are request/response
Traditional Models of Distributed Computing
- N-Tier
N-tier – also called Multilevel Client/Server
Layered
Each tier provides services for next higher level
Reasons:
– Information hiding
– Management
– Scalability
[Figure: three machines in a chain, each with CPU, memory, and storage; each adjacent pair exchanges requests and responses]
Typical N-tier system: airline reservation
– Browser or phone app: iPhone or Android reservation application
– Application logic: flight reservation logic
– Database: reservation records
Many commercial applications work this way
The Web itself is a 2 or 3 Tier system
– Browser (e.g. Firefox)
– Proxy cache (optional!) (e.g. Squid)
– Web server (e.g. Apache)
Many commercial applications work this way
Web Reservation System
– Browser or phone app: Web-based reservation application
   (HTTP)
– Proxy cache (optional!) (e.g. Squid)
   (HTTP)
– Application logic: flight reservation logic
   (RPC? ODBC? Proprietary?)
– Reservation records
Many commercial applications work this way
Web Publishing System
– Browser or phone app: Web-based reservation application
– Content distribution network (e.g. Akamai)
– Content Web site (e.g. cnn.com)
– Database or CMS: content management system
Many commercial applications work this way
Advantages of n-tier system
Separation of concerns – each layer has own role
Parallelism and performance?
– If done right: multiple mid-tier servers work in parallel
– Back end systems centralize mainly data requiring sharing & synchronization
– Mid tier can provide shared, scalable caching
Information hiding
– Mid-tier apps shielded from data layout
Security
– Credit card numbers etc. not stored at mid-tier
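A toy three-tier arrangement may make these advantages concrete. Everything here is an assumption for illustration: the class names (`Database`, `ReservationLogic`, `PhoneApp`), the flight code, and the use of in-process method calls where a real system would use the network.

```python
class Database:
    """Back-end tier: the single shared source of truth for seat counts."""
    def __init__(self):
        self._seats = {"TU117": 3}  # hypothetical flight and seat count
        self.reads = 0              # counts how often the back end is consulted

    def seats_left(self, flight):
        self.reads += 1
        return self._seats[flight]

    def reserve(self, flight):
        if self._seats[flight] == 0:
            return False
        self._seats[flight] -= 1
        return True

class ReservationLogic:
    """Mid tier: business rules plus a cache; hides the data layout."""
    def __init__(self, db):
        self._db = db
        self._cache = {}

    def availability(self, flight):
        if flight not in self._cache:
            self._cache[flight] = self._db.seats_left(flight)
        return self._cache[flight]

    def book(self, flight):
        ok = self._db.reserve(flight)
        self._cache.pop(flight, None)  # drop the now-stale cached count
        return ok

class PhoneApp:
    """Front tier: talks only to the mid tier, never to the database."""
    def __init__(self, logic):
        self._logic = logic

    def show(self, flight):
        return f"{flight}: {self._logic.availability(flight)} seats left"

db = Database()
logic = ReservationLogic(db)
app = PhoneApp(logic)
```

Calling `app.show("TU117")` twice reaches the database only once, because the mid tier answers the second request from its cache. The front tier never sees how seat counts are stored.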
Other communication and design patterns
Spanning tree
Broadcast (send to many nodes at once)
Flood
Various P2P
Distributed consensus (e.g. Paxos) – distributed state machines
Etc.
Traditional Models of Distributed Computing
- Remote Procedure Call
Remote Procedure Call
The term RPC was coined by the late Bruce Nelson in his 1981 CMU PhD thesis
Key idea: an ordinary function call executes remotely
The trick: the language runtime or helper code must automatically generate code to send parameters and results
For languages like C: proxies and stubs are generated
– Not needed in dynamic languages like Ruby, JavaScript, etc.
RPC is often (erroneously IMO) used to describe any request / response system
RPC: Call remote functions automatically
Interface definition: float sqrt(float n);
Proxies and stubs generated automatically
RPC provides transparent remote invocation

[Figure: two machines, each with CPU, memory, and storage]

Client code:
    x = sqrt(4)
Proxy (on the client):
    float sqrt(float n) { send n; read s; return s; }
Stub (on the server):
    void doMsg(Msg m) { s = sqrt(m.s); send s; }
Server implementation:
    float sqrt(float n) { …compute sqrt… return result; }

Request: invoke sqrt(4)
Response: result=2 (no exception thrown)
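The proxy and stub pattern above can be sketched in runnable form. This is a hand-written illustration, not generated code: the names `server_stub`, `transport`, and `FUNCTIONS` and the JSON message layout are assumptions, and `transport` simply calls the stub in-process where a real system would send bytes over the network.

```python
import json
import math

def sqrt_impl(n):
    """The real function, which lives on the server."""
    return math.sqrt(n)

# Hypothetical dispatch table: function name in the message -> implementation.
FUNCTIONS = {"sqrt": sqrt_impl}

def server_stub(request_bytes):
    """Server side: unmarshal the request, call the function, marshal the reply."""
    request = json.loads(request_bytes)
    try:
        result = FUNCTIONS[request["fn"]](*request["args"])
        reply = {"ok": True, "result": result}
    except Exception as exc:  # report the failure instead of crashing the server
        reply = {"ok": False, "error": str(exc)}
    return json.dumps(reply).encode("utf-8")

def transport(request_bytes):
    """Stand-in for the network: hands the bytes straight to the stub."""
    return server_stub(request_bytes)

def sqrt(n):
    """Client-side proxy: same signature as the real function."""
    request = json.dumps({"fn": "sqrt", "args": [n]}).encode("utf-8")
    reply = json.loads(transport(request))
    if not reply["ok"]:
        raise RuntimeError(reply["error"])  # remote failure surfaces locally
    return reply["result"]

x = sqrt(4)  # looks like a local call; the work happens behind transport()
```

The caller cannot tell from `x = sqrt(4)` that the computation was remote. Note that the server's `ValueError` for a negative argument arrives at the client as a generic `RuntimeError` in this sketch, a small instance of how hard it is to carry language details such as exceptions across the wire.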
RPC: Pros and Cons
Pros:
– Transparency is very appealing
– Simple programming model
– Useful as organizing principle even when not fully automated
Cons:
– Getting language details right is tricky (e.g. exceptions)
– No client/server overlap: doesn’t work well for long-running operations
– May not optimize large transfers well
– Not all APIs make sense to remote: e.g. answer = search(tree)
– Versioning can be a problem: client and server need to agree exactly on interface (or have rules for dealing with differences)
Traditional Models of Distributed Computing
- Distributed Object Systems
How do you build an RPC for this?

class Point {
    int x, y;
    int getx() { return x; }
    int gety() { return y; }
}

class Rectangle {
    …members and constructors not shown…
    Point getUpperLeft() {…}
    Point getLowerRight() {…}
}

myRect = new Rectangle;
…assume position set here…
int a = area(myRect);   // REMOTE THIS CALL!

int area(Rectangle r) {
    // r lives on the caller's machine; each accessor below is a call back to it
    int width  = r.getLowerRight().getx() - r.getUpperLeft().getx();
    int height = r.getLowerRight().gety() - r.getUpperLeft().gety();
    return width * height;
}
Pass object to remote method
Call method on remoted object
Distributed Object systems make this work!
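One way to picture what such a system has to do is the sketch below. It is an in-process toy under stated assumptions, not how CORBA or DCOM work internally: `ObjectTable` and `RemoteRef` are hypothetical names, and a direct call to `invoke` stands in for a network round trip.

```python
class ObjectTable:
    """Owner's side: maps ids to live objects so messages can name them."""
    def __init__(self):
        self._objects = {}
        self.calls = 0  # number of simulated round trips

    def export(self, obj):
        oid = len(self._objects) + 1
        self._objects[oid] = obj
        return oid

    def invoke(self, oid, method):
        """Handle one incoming call message: (object id, method name)."""
        self.calls += 1
        result = getattr(self._objects[oid], method)()
        if isinstance(result, int):
            return ("value", result)           # plain data is copied
        return ("ref", self.export(result))    # objects travel as references

class RemoteRef:
    """Proxy held by the remote side; every method call is a round trip."""
    def __init__(self, table, oid):
        self._table, self._oid = table, oid

    def __getattr__(self, method):
        def call():
            kind, payload = self._table.invoke(self._oid, method)
            return payload if kind == "value" else RemoteRef(self._table, payload)
        return call

class Point:
    def __init__(self, x, y): self.x, self.y = x, y
    def getx(self): return self.x
    def gety(self): return self.y

class Rectangle:
    def __init__(self, ul, lr): self.ul, self.lr = ul, lr
    def getUpperLeft(self): return self.ul
    def getLowerRight(self): return self.lr

def area(r):
    """Runs on the 'remote' machine; r is a RemoteRef, not a Rectangle."""
    width = r.getLowerRight().getx() - r.getUpperLeft().getx()
    height = r.getLowerRight().gety() - r.getUpperLeft().gety()
    return width * height

table = ObjectTable()
my_rect = Rectangle(Point(0, 0), Point(4, 3))
a = area(RemoteRef(table, table.export(my_rect)))
```

Computing one area costs eight simulated round trips here, and every exported `Point` stays in the table forever. Both effects hint at why marshalling references and managing distributed object lifetimes get complicated.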
Distributed object systems
In the 1990s, seemed like a great idea
Advantages of OO encapsulation & inheritance + RPC
Examples:
– CORBA (Industry standard)
– DCOM (Microsoft)
Still quite widely used within enterprises
Complicated:
– Marshalling object references
– Distributed object lifetime management
– Brokering: which object provides the service today
– Remote “new”: creating objects on remote systems
– All the pros & cons of RPC, plus the above
Generally not appropriate at Internet scale
Traditional Models of Distributed Computing
- Some Other Options
Special Purpose Models
Remote File System
– Network provides transparent access to remote files
– Examples: NFS, CIFS
Remote Database
– Examples: ODBC, JDBC
Remote Device
– Remote printing, disk drive etc.
Virtual terminal
– One computer simulates an interactive terminal to another
Some other interesting models
Broadcast / multicast
– Send messages to everyone (broadcast) / named group (multicast)
Publish / subscribe (pub/sub)
– Subscribe to named events or based on query filter
– Call me whenever Pepsi’s stock price changes
– Implements a distributed associative memory
Reliable queuing
– Examples: IBM MQSeries, Java Message Service (JMS)
– Model: queued messages, preserved across hardware crashes
– Widely used for bank machine transactions; long-running (multi-day) eCommerce transactions
– Depends on disk-based transaction systems at each node to keep queues
Paxos – fault-tolerant distributed consensus
– Families of protocols allow the entire system to achieve consensus on the values of data
– Formal proofs exist of consistency, liveness, and related properties
– Used to replicate “commands” that drive state machines to drive replicated processing at multiple nodes
Tuple spaces
– Pioneered by Gelernter at Yale (Linda kernel), picked up by Jini (Sun), and TSpaces (IBM)
– Network-scale shared variable space, with synchronization
– Good for queues of work to do: some cloud architectures use a related model to distribute work to servers
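Of the models above, publish / subscribe is easy to sketch in a few lines. The `Broker` class, the topic strings, and the event shape below are hypothetical, and everything runs in one process; a real broker would sit on the network between publishers and subscribers.

```python
from collections import defaultdict

class Broker:
    """Routes published events to subscribers by topic name and optional filter."""
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, callback, predicate=None):
        # With no predicate, the subscriber receives every event on the topic.
        self._subs[topic].append((callback, predicate or (lambda event: True)))

    def publish(self, topic, event):
        """Deliver the event to matching subscribers; return how many got it."""
        delivered = 0
        for callback, predicate in self._subs[topic]:
            if predicate(event):
                callback(event)
                delivered += 1
        return delivered

broker = Broker()
seen = []

# "Call me whenever Pepsi's stock price changes" (topic name is made up).
broker.subscribe("stock/PEP", seen.append)
# Subscription based on a query filter: only prices above 100.
broker.subscribe("stock/PEP", seen.append, predicate=lambda e: e["price"] > 100)

n1 = broker.publish("stock/PEP", {"price": 99.5})   # matches the first subscriber only
n2 = broker.publish("stock/PEP", {"price": 101.0})  # matches both
n3 = broker.publish("stock/KO", {"price": 60.0})    # nobody subscribed to this topic
```

Publishers and subscribers never learn about each other; they share only the topic name, which is what distinguishes this model from request / response.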
Stateful and Stateless Protocols
Stateful: server knows which step (state) has been reached
Stateless:
– Client remembers the state, sends to server each time
– Server processes each request independently
Can vary with level
– Many systems like Web run stateless protocols (e.g. HTTP) over streams…at the packet level, TCP streams are stateful
– HTTP itself is mostly stateless, but many HTTP requests (typically POSTs) update persistent state at the server
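The difference can be seen in a small paging example. This is a sketch with made-up names (`StatefulServer`, `stateless_next_page`, `ITEMS`), using function calls in place of network requests.

```python
ITEMS = ["a", "b", "c", "d", "e"]

class StatefulServer:
    """Remembers each client's position between requests."""
    def __init__(self):
        self._position = {}

    def next_page(self, client_id, size=2):
        start = self._position.get(client_id, 0)
        self._position[client_id] = start + size
        return ITEMS[start:start + size]

def stateless_next_page(offset, size=2):
    """Everything needed is in the request; returns the page and the next offset."""
    return ITEMS[offset:offset + size], offset + size

# Stateful: the request is tiny, but the answer depends on server memory.
server = StatefulServer()
p1 = server.next_page("alice")
p2 = server.next_page("alice")

# A restarted server (or another replica) has lost alice's position.
replica = StatefulServer()
p3 = replica.next_page("alice")  # starts over from the first page

# Stateless: the client carries the offset, so any server can answer.
q1, offset = stateless_next_page(0)
q2, offset = stateless_next_page(offset)
```

The stateless version sends slightly more data per request, but any replica can serve it and a restart loses nothing.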
Advantages of stateless protocols
Protocol usually simpler
Server processes each request independently
Load balancing and restart easier
Typically easier to scale and make fault-tolerant
Visibility: individual requests more self-describing
Advantages of stateful protocols
Individual messages carry less data
Server does not have to re-establish context each time
There’s usually some changing state at the server at some level, except for completely static publishing systems
Text vs. Binary Protocols
Protocols can be text or binary on the wire
Text: messages are encoded characters
Binary: any bit patterns
Pros and cons quite similar to those for text vs. binary file formats
When sending between compatible machines, binary can be much faster because no conversion needed
Most Internet-scale application protocols (HTTP, SMTP) use text for protocol elements and for all content except photo/audio/video
HTTP/2 moved to a binary wire format (for message size and parsing speed)
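To make the contrast concrete, here is the same record encoded both ways. The record layout (a sensor id and a temperature) and both wire formats are invented for illustration; `struct` is Python's standard module for fixed binary layouts.

```python
import struct

reading = (1234567, 98.6)  # hypothetical record: (sensor id, temperature)

# Text: encoded characters, readable in a terminal or a packet trace.
text = f"{reading[0]},{reading[1]}\n".encode("ascii")

# Binary: a 4-byte unsigned int and an 8-byte double, network byte order.
binary = struct.pack("!Id", *reading)

# Decoding text means parsing and converting each field from characters.
sid_s, temp_s = text.decode("ascii").strip().split(",")
from_text = (int(sid_s), float(temp_s))

# Decoding binary is a fixed-layout unpack with no character parsing.
from_binary = struct.unpack("!Id", binary)
```

The binary form is 12 bytes whatever the values are, while the text form grows with the number of digits. The text form needs no shared knowledge of byte order or field widths, which is part of why it is easier to debug and to evolve.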
Summary
The machine-level model is complex: multiple CPUs, memories
A number of abstractions are widely used for limited-scale distribution
RPC is among the most interesting and successful
Statefulness / statelessness is a key design tradeoff
We’ll see next time why a new model was needed for the Web