Fast Communication
Firefly RPC
Lightweight RPC
CS 614, Tuesday, March 13, 2001
Jeff Hoy
Why Remote Procedure Call?
Simplifies building distributed systems and applications
Looks like a local procedure call, transparent to the user
Balances semantics against efficiency
A universal programming tool
Secure inter-process communication
RPC Model
Client Application
Client Stub
Client Runtime
Server Application
Server Stub
Server Runtime
Network
Call
Return
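To make the model concrete, here is a minimal C sketch of a client stub (the real Firefly stubs were generated Modula2+ code; rt_send_and_wait is an invented stand-in for the client runtime):

```c
/* Minimal C sketch of a client stub; the real Firefly stubs were
 * generated Modula2+ code, and rt_send_and_wait is an invented
 * stand-in for the client runtime. */
#include <string.h>

typedef struct { unsigned char data[1024]; size_t len; } packet_t;

/* Provided by the client runtime: transmit the request and block
 * until the matching reply arrives over the network. */
extern void rt_send_and_wait(packet_t *req, packet_t *reply);

/* Stub for a remote "int add(int a, int b)"; to the caller it looks
 * exactly like a local procedure call. */
int add(int a, int b)
{
    packet_t req, reply;
    int result;

    memcpy(req.data, &a, sizeof a);             /* marshal arguments */
    memcpy(req.data + sizeof a, &b, sizeof b);
    req.len = 2 * sizeof(int);

    rt_send_and_wait(&req, &reply);             /* runtime + network */

    memcpy(&result, reply.data, sizeof result); /* unmarshal result  */
    return result;
}
```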
RPC In Modern Computing
CORBA and Internet Inter-ORB Protocol (IIOP): each CORBA server object exposes a set of methods
DCOM and Object RPC: built on top of RPC
Java and Java Remote Method Protocol (JRMP): an interface exposes a set of methods
XML-RPC and SOAP: RPC over HTTP and XML
Goals
Firefly RPC: fast inter-machine communication while maintaining security and functionality
Lightweight RPC: fast intra-machine communication while maintaining security and functionality
Firefly RPC
Hardware: the DEC Firefly multiprocessor
1 to 5 MicroVAX CPUs per node (raises concurrency considerations)
10 megabit Ethernet
The implementation takes advantage of all 5 CPUs
Fast Path in an RPC
Transport mechanisms: UDP/IP, DECNet byte stream, or shared memory (intra-machine only)
The transport is determined at bind time
The fast path runs inside the transport procedures: “Starter”, “Transporter”, and “Ender” for the caller, and “Receiver” for the server
Caller Stub
Gets control from the calling program
Calls “Starter” to obtain a packet buffer
Copies the arguments into the buffer
Calls “Transporter” and waits for the reply
Copies result data into the caller’s result variables
Calls “Ender” to free the result packet
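Roughly the same sequence in a hedged C sketch; Starter, Transporter, and Ender mirror the transport procedures named above, but the signatures and marshalling helpers are invented:

```c
/* Illustrative sketch of the caller stub's fast path; the real stubs
 * were machine-generated Modula2+, and these signatures are invented. */
typedef struct packet packet_t;

extern packet_t *Starter(void);               /* get a call packet buffer */
extern packet_t *Transporter(packet_t *call); /* send, wait for the reply */
extern void      Ender(packet_t *reply);      /* free the result packet   */
extern void      marshal_long(packet_t *p, long v);
extern long      unmarshal_long(const packet_t *p);

long remote_square(long x)
{
    packet_t *call = Starter();          /* 1. acquire a packet buffer      */
    marshal_long(call, x);               /* 2. copy the argument in         */
    packet_t *reply = Transporter(call); /* 3. transmit and await the reply */
    long r = unmarshal_long(reply);      /* 4. copy the result out          */
    Ender(reply);                        /* 5. release the result packet    */
    return r;
}
```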
Server Stub
Receives the incoming packet
Copies the data onto the stack, into a new data block, or leaves it in the packet
Calls the server procedure
Copies the result into the call packet and transmits it
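The server side can be sketched in the same invented style; note how the call packet is reused to carry the result back:

```c
/* Invented server-stub sketch mirroring the caller side. */
typedef struct packet packet_t;

extern long unmarshal_long(const packet_t *p);
extern void marshal_long(packet_t *p, long v);
extern void transmit(packet_t *p);
extern long server_square(long x);   /* the actual server procedure */

void server_stub(packet_t *call)
{
    long x = unmarshal_long(call);   /* copy the argument out of the packet */
    long r = server_square(x);       /* invoke the server procedure         */
    marshal_long(call, r);           /* reuse the call packet for results   */
    transmit(call);                  /* send the reply                      */
}
```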
Transport Mechanism
“Transporter” procedure:
Completes the RPC header
Calls “Sender” to complete the UDP, IP, and Ethernet headers (Ethernet is the chosen means of communication)
Invokes the Ethernet driver via a kernel trap and queues the packet
Transport Mechanism
“Receiver” procedure:
A server thread awakens in “Receiver”
“Receiver” calls the interface stub identified in the received packet, and the interface stub calls the procedure stub
The reply follows a similar path
Threading
The client application creates the RPC thread
The server application creates the call thread
Threads operate in the server application’s address space, so there is no need to spawn an entire process
Threads must consider locking of shared resources
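A loose POSIX-threads analogy of the call-thread model (the Firefly used Modula2+ threads, so this only illustrates the idea of serving calls on threads inside the server's own address space):

```c
/* Loose POSIX-threads analogy (not Firefly's Modula2+ threads): each
 * incoming call runs on a thread in the server's own address space
 * instead of a newly spawned process. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int call_count = 0;   /* shared state; hence the locking concern */

static void *handle_call(void *arg)
{
    /* The call thread shares the server's globals, so it must lock. */
    pthread_mutex_lock(&table_lock);
    int n = ++call_count;
    pthread_mutex_unlock(&table_lock);
    printf("served call %d\n", n);
    return NULL;
}

int main(void)
{
    pthread_t t;
    /* One thread per incoming call: far cheaper than a whole process. */
    pthread_create(&t, NULL, handle_call, NULL);
    pthread_join(t, NULL);
    return 0;
}
```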
Performance Enhancements
Over traditional RPC:
Stubs marshal arguments themselves rather than handing them to library functions
RPC procedures are called through procedure variables rather than through a lookup table
The server retains the call packet for the results
Buffers reside in shared memory
Sacrifices abstract structure
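The “procedure variables” point amounts to calling through a function pointer bound once, instead of searching a table on every call; a hypothetical C sketch:

```c
/* Hypothetical sketch: dispatch through a procedure variable (function
 * pointer) bound once, versus a table lookup on every call. */
#include <string.h>

typedef void (*proc_t)(void *args, void *result);

struct table_entry { const char *name; proc_t proc; };
extern struct table_entry dispatch_table[];
extern int table_size;

/* Per-call table search: the cost the procedure variable avoids. */
static proc_t lookup(const char *name)
{
    for (int i = 0; i < table_size; i++)
        if (strcmp(dispatch_table[i].name, name) == 0)
            return dispatch_table[i].proc;
    return NULL;
}

/* Bind once: store the pointer in a procedure variable... */
static proc_t bound_proc;
void bind_proc(const char *name) { bound_proc = lookup(name); }

/* ...then every call jumps straight through it, with no search. */
void call_bound(void *args, void *result) { bound_proc(args, result); }
```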
Performance Analysis
The Null() procedure takes no arguments and returns no value; it measures the base latency of the RPC mechanism
Multi-threaded caller and server
Timed over 10,000 RPCs
Base latency: 2.66 ms
MaxResult latency (1500 bytes): 6.35 ms
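The measurement method reduces to timing a tight loop of calls and dividing; a minimal sketch, with null_rpc standing in for the generated Null() stub:

```c
/* Minimal latency-measurement sketch; null_rpc() stands in for the
 * generated Null() stub, which takes no arguments and returns nothing. */
#include <stdio.h>
#include <time.h>

extern void null_rpc(void);

int main(void)
{
    enum { N = 10000 };
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++)
        null_rpc();                      /* one round trip per iteration */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double total_ms = (t1.tv_sec - t0.tv_sec) * 1e3
                    + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    printf("mean latency: %.3f ms per call\n", total_ms / N);
    return 0;
}
```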
Send and Receive Latency
With larger packets, transmission time dominates and overhead becomes less of an issue
This is good for Firefly RPC, assuming large transfers over the network
Is the overhead acceptable for intra-machine communication?
Stub Latency
Significant overhead for small packets
Fewer Processors
Seconds for 1,000 Null() calls
Why the slowdown with one processor?
The fast path can be followed only in a multiprocessor environment; lock conflicts and scheduling problems arise
Why is there little speedup past two processors?
Future Improvements
Hardware:
A faster network will help larger packets
Tripling CPU speed would reduce Null() time by 52% and MaxResult time by 36%
Software:
Omit the IP and UDP headers for Ethernet datagrams: 2-4% gain
Redesign the RPC protocol: ~5% gain
Busy thread wait: 10-15% gain
Write more in assembler: 5-10% gain
Other Improvements
Firefly RPC handles intra-machine communication through the same mechanisms as inter-machine communication
Firefly RPC also has very high overhead for small packets
Does this matter?
RPC Size Distribution
Majority of RPC transfers under 200 bytes
Frequency of Remote Activity
Most calls are to the same machine
Traditional RPC
Most calls are small messages that take place between domains on the same machine
Traditional RPC contains unnecessary overhead:
Scheduling
Copying
Access validation
Lightweight RPC (LRPC)
Also written for the DEC Firefly system
Mechanism for communication between different protection domains on the same system
Significant performance improvements over traditional RPC
Overhead Analysis
Theoretical minimum to invoke Null() across domains: a kernel trap plus a context change for the call, and a trap plus a context change for the return
Theoretical minimum on the Firefly: 109 us
Actual cost: 464 us
Sources of Overhead
The 355 us of added overhead (464 us actual minus the 109 us minimum) comes from:
Stub overhead
Message buffer overhead (less of a factor in Firefly RPC)
Message transfer and flow control
Scheduling and abstract threads
Context switch
Implementation of LRPC
Similar to RPC
A call to the server is made through a kernel trap
The kernel validates the caller
Servers export interfaces
Clients bind to a server interface before making a call
Binding
Servers export interfaces through a clerk
The clerk registers the interface
Clients bind to the interface through a call to the kernel
The server replies with an entry address and the size of its A-stack
The client receives a Binding Object from the kernel
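The binding handshake could be pictured with the following invented declarations; none of these names are the real LRPC kernel interface:

```c
/* Invented illustration of LRPC binding; these structures and calls
 * are placeholders, not the real kernel interface. */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    void   *entry_addr;   /* server procedure entry address              */
    size_t  astack_size;  /* size of the shared argument stack (A-stack) */
} interface_desc_t;

typedef struct {
    uint32_t id;          /* kernel-issued token the client must present */
} binding_object_t;

/* Server side: the clerk registers the exported interface. */
extern void clerk_register(const char *name, const interface_desc_t *desc);

/* Client side: a kernel call validates the client, returns the entry
 * address and A-stack size, and hands back a Binding Object. */
extern binding_object_t kernel_bind(const char *name,
                                    interface_desc_t *out_desc);
```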
Calling
Each procedure is represented by a stub
The client makes the call through the stub, which manages the A-stacks and traps to the kernel
The kernel switches context to the server
The server returns through its own stub
No verification is needed
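In invented C pseudocode, the LRPC call path looks roughly like this (all helpers and the trap number are placeholders):

```c
/* Invented sketch of the LRPC call path; the trap mechanism and
 * A-stack helpers are placeholders, not a real system interface. */
typedef struct astack astack_t;
typedef struct { unsigned id; } binding_object_t;

enum { LRPC_CALL = 1 };   /* hypothetical trap number */

extern astack_t *astack_alloc(binding_object_t b);
extern void      astack_push_long(astack_t *as, long v);
extern long      astack_pop_long(astack_t *as);
extern void      astack_free(astack_t *as);
extern void      kernel_trap(int trapno, binding_object_t b, astack_t *as);

long lrpc_call_stub(binding_object_t b, long arg)
{
    astack_t *as = astack_alloc(b);    /* A-stack shared by both domains */
    astack_push_long(as, arg);         /* argument copied exactly once   */

    /* The kernel validates the Binding Object, then switches this
     * thread's context into the server domain at the bound entry. */
    kernel_trap(LRPC_CALL, b, as);

    long result = astack_pop_long(as); /* result left in the same A-stack */
    astack_free(as);
    return result;
}
```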
Stub Generation
Each procedure is represented by a call stub for the client and an entry stub for the server
LRPC merges the protocol layers
The stub generator creates run-time stubs in assembly language
Portability is sacrificed for performance
Falls back on Modula2+ for complex calls
Multiple Processors
LRPC caches domains on idle processors
The kernel checks for an idling processor in the server domain
If one is found, the caller thread can execute on the idle processor without switching context
Argument Copying
Traditional RPC copies arguments four times for intra-machine calls: client stub to RPC message, to the kernel’s message, to the server’s message, to the server’s stack
In many cases, LRPC needs to copy the arguments only once: from the client stub to the A-stack
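The difference can be made concrete by counting memcpy calls for one argument buffer (buffer names invented):

```c
/* Invented illustration of copy counts for one argument buffer. */
#include <string.h>

void traditional_rpc_copies(char *arg, size_t n,
                            char *stub_msg, char *kernel_msg,
                            char *server_msg, char *server_stack)
{
    memcpy(stub_msg,     arg,        n); /* 1: client stub -> RPC message  */
    memcpy(kernel_msg,   stub_msg,   n); /* 2: message -> kernel's message */
    memcpy(server_msg,   kernel_msg, n); /* 3: kernel -> server's message  */
    memcpy(server_stack, server_msg, n); /* 4: message -> server's stack   */
}

void lrpc_copy(char *arg, size_t n, char *astack)
{
    /* The A-stack is mapped into both domains, so one copy suffices;
     * the server reads the argument in place. */
    memcpy(astack, arg, n);
}
```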
Performance Analysis
LRPC is roughly three times faster than traditional RPC
Null() LRPC cost: 157 us, close to the 109 us theoretical minimum
Additional overhead comes from stub generation and kernel execution
Single-Processor Null() LRPC
Performance Comparison
LRPC versus traditional RPC (in us)
Multiprocessor Speedup
Inter-machine Communication
LRPC is best for messages between domains on the same machine
The first instruction of the LRPC stub checks whether the call is cross-machine; if so, the stub branches to conventional RPC
Larger messages are handled well: LRPC cost scales linearly with packet size, like traditional RPC
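In spirit, that first-instruction test looks like the following, though the real stubs performed it in assembly and these names are invented:

```c
/* Hypothetical rendering of the stub's first-instruction test; the
 * real LRPC stubs performed this check in assembly. */
typedef struct { int remote; } binding_t;   /* invented binding record */

extern long conventional_rpc(binding_t *b, long arg);
extern long lrpc_fast_path(binding_t *b, long arg);

long stub_entry(binding_t *b, long arg)
{
    if (b->remote)                       /* cross-machine call?   */
        return conventional_rpc(b, arg); /* fall back to full RPC */
    return lrpc_fast_path(b, arg);       /* same-machine LRPC     */
}
```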
Cost
LRPC avoids needless scheduling, copying, and locking by integrating the client, kernel, server, and message protocols
Abstraction is sacrificed for performance
RPC is built into modern operating systems (Linux DCE RPC, MS RPC)
Conclusion
Firefly RPC is fast compared to most RPC implementations. LRPC is even faster. Are they fast enough?
“The performance of Firefly RPC is now good enough that programmers accept it as the standard way to communicate” (1990) Is speed still an issue?