Fast Communication
Firefly RPC
Lightweight RPC
CS 614, Tuesday, March 13, 2001
Jeff Hoy
Why Remote Procedure Call?
Simplifies building distributed systems and applications
Looks like a local procedure call, transparent to the user
Balances semantics against efficiency
A universal programming tool
Secure inter-process communication
RPC Model
Client Application
Client Stub
Client Runtime
Server Application
Server Stub
Server Runtime
Network
Call
Return
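To make the model concrete, here is a minimal C sketch of a client stub (the real Firefly stubs were generated Modula2+ code; rt_send_and_wait is an invented stand-in for the client runtime):

```c
/* Minimal C sketch of a client stub; the real Firefly stubs were
 * generated Modula2+ code, and rt_send_and_wait is an invented
 * stand-in for the client runtime. */
#include <string.h>

typedef struct { unsigned char data[1024]; size_t len; } packet_t;

/* Provided by the client runtime: transmit the request and block
 * until the matching reply arrives over the network. */
extern void rt_send_and_wait(packet_t *req, packet_t *reply);

/* Stub for a remote "int add(int a, int b)"; to the caller it looks
 * exactly like a local procedure call. */
int add(int a, int b)
{
    packet_t req, reply;
    int result;

    memcpy(req.data, &a, sizeof a);             /* marshal arguments */
    memcpy(req.data + sizeof a, &b, sizeof b);
    req.len = 2 * sizeof(int);

    rt_send_and_wait(&req, &reply);             /* runtime + network */

    memcpy(&result, reply.data, sizeof result); /* unmarshal result  */
    return result;
}
```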
RPC In Modern Computing
CORBA and Internet Inter-ORB Protocol (IIOP): each CORBA server object exposes a set of methods
DCOM and Object RPC: built on top of RPC
Java and Java Remote Method Protocol (JRMP): an interface exposes a set of methods
XML-RPC and SOAP: RPC over HTTP and XML
Goals
Firefly RPC: fast inter-machine communication while maintaining security and functionality
Lightweight RPC: fast intra-machine communication while maintaining security and functionality
Firefly RPC
Hardware: the DEC Firefly multiprocessor
1 to 5 MicroVAX CPUs per node (raises concurrency considerations)
10 megabit Ethernet
The implementation takes advantage of all 5 CPUs
Fast Path in an RPC
Transport mechanisms: UDP/IP, DECNet byte stream, or shared memory (intra-machine only)
The transport is determined at bind time
The fast path runs inside the transport procedures: “Starter”, “Transporter”, and “Ender” for the caller, and “Receiver” for the server
Caller Stub
Gets control from the calling program
Calls “Starter” to obtain a packet buffer
Copies the arguments into the buffer
Calls “Transporter” and waits for the reply
Copies result data into the caller’s result variables
Calls “Ender” to free the result packet
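Roughly the same sequence in a hedged C sketch; Starter, Transporter, and Ender mirror the transport procedures named above, but the signatures and marshalling helpers are invented:

```c
/* Illustrative sketch of the caller stub's fast path; the real stubs
 * were machine-generated Modula2+, and these signatures are invented. */
typedef struct packet packet_t;

extern packet_t *Starter(void);               /* get a call packet buffer */
extern packet_t *Transporter(packet_t *call); /* send, wait for the reply */
extern void      Ender(packet_t *reply);      /* free the result packet   */
extern void      marshal_long(packet_t *p, long v);
extern long      unmarshal_long(const packet_t *p);

long remote_square(long x)
{
    packet_t *call = Starter();          /* 1. acquire a packet buffer      */
    marshal_long(call, x);               /* 2. copy the argument in         */
    packet_t *reply = Transporter(call); /* 3. transmit and await the reply */
    long r = unmarshal_long(reply);      /* 4. copy the result out          */
    Ender(reply);                        /* 5. release the result packet    */
    return r;
}
```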
Server Stub
Receives the incoming packet
Copies the data onto the stack, into a new data block, or leaves it in the packet
Calls the server procedure
Copies the result into the call packet and transmits it
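The server side can be sketched in the same invented style; note how the call packet is reused to carry the result back:

```c
/* Invented server-stub sketch mirroring the caller side. */
typedef struct packet packet_t;

extern long unmarshal_long(const packet_t *p);
extern void marshal_long(packet_t *p, long v);
extern void transmit(packet_t *p);
extern long server_square(long x);   /* the actual server procedure */

void server_stub(packet_t *call)
{
    long x = unmarshal_long(call);   /* copy the argument out of the packet */
    long r = server_square(x);       /* invoke the server procedure         */
    marshal_long(call, r);           /* reuse the call packet for results   */
    transmit(call);                  /* send the reply                      */
}
```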
Transport Mechanism
“Transporter” procedure:
Completes the RPC header
Calls “Sender” to complete the UDP, IP, and Ethernet headers (Ethernet is the chosen means of communication)
Invokes the Ethernet driver via a kernel trap and queues the packet
Transport Mechanism
“Receiver” procedure:
A server thread awakens in “Receiver”
“Receiver” calls the interface stub identified in the received packet, and the interface stub calls the procedure stub
The reply follows a similar path
Threading
The client application creates the RPC thread
The server application creates the call thread
Threads operate in the server application’s address space, so there is no need to spawn an entire process
Threads must consider locking of shared resources
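A loose POSIX-threads analogy of the call-thread model (the Firefly used Modula2+ threads, so this only illustrates the idea of serving calls on threads inside the server's own address space):

```c
/* Loose POSIX-threads analogy (not Firefly's Modula2+ threads): each
 * incoming call runs on a thread in the server's own address space
 * instead of a newly spawned process. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int call_count = 0;   /* shared state; hence the locking concern */

static void *handle_call(void *arg)
{
    /* The call thread shares the server's globals, so it must lock. */
    pthread_mutex_lock(&table_lock);
    int n = ++call_count;
    pthread_mutex_unlock(&table_lock);
    printf("served call %d\n", n);
    return NULL;
}

int main(void)
{
    pthread_t t;
    /* One thread per incoming call: far cheaper than a whole process. */
    pthread_create(&t, NULL, handle_call, NULL);
    pthread_join(t, NULL);
    return 0;
}
```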
Performance Enhancements
Over traditional RPC:
Stubs marshal arguments themselves rather than handing them to library functions
RPC procedures are called through procedure variables rather than through a lookup table
The server retains the call packet for the results
Buffers reside in shared memory
Sacrifices abstract structure
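The “procedure variables” point amounts to calling through a function pointer bound once, instead of searching a table on every call; a hypothetical C sketch:

```c
/* Hypothetical sketch: dispatch through a procedure variable (function
 * pointer) bound once, versus a table lookup on every call. */
#include <string.h>

typedef void (*proc_t)(void *args, void *result);

struct table_entry { const char *name; proc_t proc; };
extern struct table_entry dispatch_table[];
extern int table_size;

/* Per-call table search: the cost the procedure variable avoids. */
static proc_t lookup(const char *name)
{
    for (int i = 0; i < table_size; i++)
        if (strcmp(dispatch_table[i].name, name) == 0)
            return dispatch_table[i].proc;
    return NULL;
}

/* Bind once: store the pointer in a procedure variable... */
static proc_t bound_proc;
void bind_proc(const char *name) { bound_proc = lookup(name); }

/* ...then every call jumps straight through it, with no search. */
void call_bound(void *args, void *result) { bound_proc(args, result); }
```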
Performance Analysis
The Null() procedure takes no arguments and returns no value; it measures the base latency of the RPC mechanism
Multi-threaded caller and server
Timed over 10,000 RPCs
Base latency: 2.66 ms
MaxResult latency (1500 bytes): 6.35 ms
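The measurement method reduces to timing a tight loop of calls and dividing; a minimal sketch, with null_rpc standing in for the generated Null() stub:

```c
/* Minimal latency-measurement sketch; null_rpc() stands in for the
 * generated Null() stub, which takes no arguments and returns nothing. */
#include <stdio.h>
#include <time.h>

extern void null_rpc(void);

int main(void)
{
    enum { N = 10000 };
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++)
        null_rpc();                      /* one round trip per iteration */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double total_ms = (t1.tv_sec - t0.tv_sec) * 1e3
                    + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    printf("mean latency: %.3f ms per call\n", total_ms / N);
    return 0;
}
```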
Send and Receive Latency
With larger packets, transmission time dominates and overhead becomes less of an issue
This is good for Firefly RPC, assuming large transfers over the network
Is the overhead acceptable for intra-machine communication?
Stub Latency
Significant overhead for small packets
Fewer Processors
Seconds for 1,000 Null() calls
Why the slowdown with one processor?
The fast path can be followed only in a multiprocessor environment; lock conflicts and scheduling problems arise
Why is there little speedup past two processors?
Future Improvements
Hardware:
A faster network will help larger packets
Tripling CPU speed would reduce Null() time by 52% and MaxResult time by 36%
Software:
Omit the IP and UDP headers for Ethernet datagrams: 2-4% gain
Redesign the RPC protocol: ~5% gain
Busy thread wait: 10-15% gain
Write more in assembler: 5-10% gain
Other Improvements
Firefly RPC handles intra-machine communication through the same mechanisms as inter-machine communication
Firefly RPC also has very high overhead for small packets
Does this matter?
RPC Size Distribution
Majority of RPC transfers under 200 bytes
Frequency of Remote Activity
Most calls are to the same machine
Traditional RPC
Most calls are small messages that take place between domains on the same machine
Traditional RPC contains unnecessary overhead:
Scheduling
Copying
Access validation
Lightweight RPC (LRPC)
Also written for the DEC Firefly system
Mechanism for communication between different protection domains on the same system
Significant performance improvements over traditional RPC
Overhead Analysis
Theoretical minimum to invoke Null() across domains: a kernel trap plus a context change for the call, and a trap plus a context change for the return
Theoretical minimum on the Firefly: 109 us
Actual cost: 464 us
Sources of Overhead
The 355 us of added overhead (464 us actual minus the 109 us minimum) comes from:
Stub overhead
Message buffer overhead (less of a factor in Firefly RPC)
Message transfer and flow control
Scheduling and abstract threads
Context switch
Implementation of LRPC
Similar to RPC
A call to the server is made through a kernel trap
The kernel validates the caller
Servers export interfaces
Clients bind to a server interface before making a call
Binding
Servers export interfaces through a clerk
The clerk registers the interface
Clients bind to the interface through a call to the kernel
The server replies with an entry address and the size of its A-stack
The client receives a Binding Object from the kernel
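The binding handshake could be pictured with the following invented declarations; none of these names are the real LRPC kernel interface:

```c
/* Invented illustration of LRPC binding; these structures and calls
 * are placeholders, not the real kernel interface. */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    void   *entry_addr;   /* server procedure entry address              */
    size_t  astack_size;  /* size of the shared argument stack (A-stack) */
} interface_desc_t;

typedef struct {
    uint32_t id;          /* kernel-issued token the client must present */
} binding_object_t;

/* Server side: the clerk registers the exported interface. */
extern void clerk_register(const char *name, const interface_desc_t *desc);

/* Client side: a kernel call validates the client, returns the entry
 * address and A-stack size, and hands back a Binding Object. */
extern binding_object_t kernel_bind(const char *name,
                                    interface_desc_t *out_desc);
```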
Calling
Each procedure is represented by a stub
The client makes the call through the stub, which manages the A-stacks and traps to the kernel
The kernel switches context to the server
The server returns through its own stub
No verification is needed
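In invented C pseudocode, the LRPC call path looks roughly like this (all helpers and the trap number are placeholders):

```c
/* Invented sketch of the LRPC call path; the trap mechanism and
 * A-stack helpers are placeholders, not a real system interface. */
typedef struct astack astack_t;
typedef struct { unsigned id; } binding_object_t;

enum { LRPC_CALL = 1 };   /* hypothetical trap number */

extern astack_t *astack_alloc(binding_object_t b);
extern void      astack_push_long(astack_t *as, long v);
extern long      astack_pop_long(astack_t *as);
extern void      astack_free(astack_t *as);
extern void      kernel_trap(int trapno, binding_object_t b, astack_t *as);

long lrpc_call_stub(binding_object_t b, long arg)
{
    astack_t *as = astack_alloc(b);    /* A-stack shared by both domains */
    astack_push_long(as, arg);         /* argument copied exactly once   */

    /* The kernel validates the Binding Object, then switches this
     * thread's context into the server domain at the bound entry. */
    kernel_trap(LRPC_CALL, b, as);

    long result = astack_pop_long(as); /* result left in the same A-stack */
    astack_free(as);
    return result;
}
```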
Stub Generation
Each procedure is represented by a call stub for the client and an entry stub for the server
LRPC merges the protocol layers
The stub generator creates run-time stubs in assembly language
Portability is sacrificed for performance
Falls back on Modula2+ for complex calls
Multiple Processors
LRPC caches domains on idle processors
The kernel checks for an idling processor in the server domain
If one is found, the caller thread can execute on the idle processor without switching context
Argument Copying
Traditional RPC copies arguments four times for intra-machine calls: client stub to RPC message, to the kernel’s message, to the server’s message, to the server’s stack
In many cases, LRPC needs to copy the arguments only once: from the client stub to the A-stack
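The difference can be made concrete by counting memcpy calls for one argument buffer (buffer names invented):

```c
/* Invented illustration of copy counts for one argument buffer. */
#include <string.h>

void traditional_rpc_copies(char *arg, size_t n,
                            char *stub_msg, char *kernel_msg,
                            char *server_msg, char *server_stack)
{
    memcpy(stub_msg,     arg,        n); /* 1: client stub -> RPC message  */
    memcpy(kernel_msg,   stub_msg,   n); /* 2: message -> kernel's message */
    memcpy(server_msg,   kernel_msg, n); /* 3: kernel -> server's message  */
    memcpy(server_stack, server_msg, n); /* 4: message -> server's stack   */
}

void lrpc_copy(char *arg, size_t n, char *astack)
{
    /* The A-stack is mapped into both domains, so one copy suffices;
     * the server reads the argument in place. */
    memcpy(astack, arg, n);
}
```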
Performance Analysis
LRPC is roughly three times faster than traditional RPC
Null() LRPC cost: 157 us, close to the 109 us theoretical minimum
Additional overhead comes from stub generation and kernel execution
Single-Processor Null() LRPC
Performance Comparison
LRPC versus traditional RPC (in us)
Multiprocessor Speedup
Inter-machine Communication
LRPC is best for messages between domains on the same machine
The first instruction of the LRPC stub checks whether the call is cross-machine; if so, the stub branches to conventional RPC
Larger messages are handled well: LRPC cost scales linearly with packet size, like traditional RPC
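In spirit, that first-instruction test looks like the following, though the real stubs performed it in assembly and these names are invented:

```c
/* Hypothetical rendering of the stub's first-instruction test; the
 * real LRPC stubs performed this check in assembly. */
typedef struct { int remote; } binding_t;   /* invented binding record */

extern long conventional_rpc(binding_t *b, long arg);
extern long lrpc_fast_path(binding_t *b, long arg);

long stub_entry(binding_t *b, long arg)
{
    if (b->remote)                       /* cross-machine call?   */
        return conventional_rpc(b, arg); /* fall back to full RPC */
    return lrpc_fast_path(b, arg);       /* same-machine LRPC     */
}
```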
Cost
LRPC avoids needless scheduling, copying, and locking by integrating the client, kernel, server, and message protocols
Abstraction is sacrificed for performance
RPC is built into modern operating systems (Linux DCE RPC, MS RPC)
Conclusion
Firefly RPC is fast compared to most RPC implementations. LRPC is even faster. Are they fast enough?
“The performance of Firefly RPC is now good enough that programmers accept it as the standard way to communicate” (1990) Is speed still an issue?