
Copyright © 2002 IEEE. Reprinted from Proceedings of the 2002 International Conference on Dependable Systems & Networks, Washington, D.C., June 23-26, 2002.

This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of Network Associates Laboratories's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by sending a blank email message to [email protected].

By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

Developing a Heterogeneous Intrusion Tolerant CORBA System

David Sames, Brian Matt, Brian Niebuhr, Gregg Tally, Brent Whitmore
Distributed Systems Security Department
NAI Labs - Network Associates, Inc.
3060 Washington Road
Glenwood, MD 21738 USA
{dsames, brian_matt, bniebuhr, gtally, bwhitmore}@nai.com

David Bakken
School of Electrical Engineering and Computer Science
Washington State University
PO Box 642752
Pullman, WA 99164-2752 USA

[email protected]

Abstract

Intrusion Tolerant systems provide high-integrity and high-availability services to their clients in the face of successful attacks from an adversary. The Intrusion Tolerant Distributed Object Systems (ITDOS) research project¹ is developing an architecture for a heterogeneous intrusion tolerant distributed object system. ITDOS integrates a Byzantine Fault Tolerant multicast protocol into an open-source CORBA ORB to provide Intrusion Tolerant middleware. This foundation allows up to f simultaneous Byzantine failures of replicated servers in a system of at least 3f + 1 replicas. Voting on unmarshalled CORBA messages allows heterogeneous application implementations for a given service, allowing for greater diversity in implementation and greater survivability. Symmetric encryption session keys generated by distributed pseudo-random function techniques provide confidential client-server communications. This paper overviews the ITDOS architecture, discusses some of the challenging technical issues related to intrusion tolerance in heterogeneous middleware systems, and offers views on future areas of work.

1. Introduction

Intrusion prevention mechanisms and technologies cannot always prevent a well-funded and persistent adversary from penetrating information systems. Mission-critical systems require intrusion tolerance in order to provide correct system operation even after an attacker has successfully breached the prevention mechanisms. Middleware is one area where a system can provide intrusion tolerance. Middleware is a very useful category of software that removes much of the tedium of distributed systems programming and shields programmers from having to deal with numerous kinds of heterogeneity inherent in a distributed system [1]. Distributed object middleware is considered the most general kind of middleware, and the Common Object Request Broker Architecture (CORBA) [29] is a widely adopted standard for distributed object middleware. Middleware provides an ideal platform for intrusion tolerance extensions because it allows for a variety of applications to be built that can transparently take advantage of the intrusion tolerance properties of the middleware, eliminating the need for custom solutions for each application.

¹ This research was performed under DARPA contract F30602-00-C-0183.

The goal of our framework, Intrusion Tolerant Distributed Object Systems (ITDOS) [14], is to create an architecture for CORBA-based distributed object systems that can provide high reliability for mission-critical information systems by tolerating Byzantine [17] (arbitrary) faults in object servers. From a system-level point of view, this architecture provides additional security in the form of a firewall proxy that can monitor Byzantine fault tolerant multicast (BFTM) messages at the enclave boundary. This paper does not discuss the firewall proxy for reasons of brevity. In ITDOS, symmetric session keys provide confidential communications for each association of client and server.

One prior distributed object middleware research project that has focused on tolerating Byzantine failures is Immune [25]; however, its voting mechanism does not support heterogeneous environments. Presently, the ITUA project [41] focuses on tolerating Byzantine failures through unpredictability and adaptation, but uses a local proxy to integrate with the application as opposed to replacing the transport. Most other BFTM systems to date have not supported standard distributed object programming models [36, 6].

The paper makes the following contributions by describing:

• An approach for allowing heterogeneous Intrusion Tolerant CORBA architectures using active replication

• Techniques for providing intrusion tolerant symmetric key generation

• An approach for virtual connection semantics over multicast

• An approach for state synchronization that is scalable to large object servers.

The remainder of this paper is organized as follows. Section 2 provides a description of the system model, including an overview of the architecture, assumptions, and threats addressed. Section 3 addresses the technical challenges and solutions to significant issues in the architecture and reviews related work. Section 4 provides a look at future directions for our work, and Section 5 presents our conclusions.

2. System Model

The concept of operations for an ITDOS system is fairly simple. An ITDOS CORBA client invokes an operation on an ITDOS CORBA server. The server carries out that operation, either independently or by making invocations on other CORBA servers, and returns a result to that client. Figure 1 illustrates a nominal configuration.

However, ITDOS modifies the traditional notion of a CORBA server, in that a "server" is an asynchronous system of deterministic communicating state machines [37]. That system contains not more than f simultaneously faulty processes and at least 3f + 1 processes in all [4]. ITDOS requires a minimum of 3f + 1 replicated state machines to tolerate arbitrary traitorous behavior by f state machines. Each state machine in the system is implemented as a CORBA server; the server hosts objects for access by CORBA clients. Furthermore, each state machine for a given system hosts the same CORBA objects as the others in that system. Since ITDOS performs voting in middleware to support heterogeneous implementations, all invocations on objects must pass through the middleware layer equally; that is, if one state machine invokes operations on an object remotely (so that the invocation passes through middleware), then all replicated state machines in that group must invoke operations on that object remotely. We term an individual process in the system a replication domain element. The collection of replication domain elements is a replication domain. ITDOS uses active replication to maintain the same state of each replication domain element [10]; a client request is delivered to each replication domain element in a replication domain by a totally ordered, BFT multicast protocol. Each replication domain element executes the invocation and returns its result to the client in the same fashion.
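As a concrete reading of these bounds (an illustrative sketch only; ITDOS itself is C++ middleware built on TAO, and none of the names below come from the paper), the minimum domain size and the voting thresholds used later in Section 3.6 follow directly from f:

```python
def replication_bounds(f: int) -> dict:
    """Sizes implied by the 3f + 1 bound for tolerating f Byzantine faults.
    Illustrative only; not part of the ITDOS implementation."""
    return {
        "max_simultaneous_faults": f,
        "min_replicas": 3 * f + 1,     # smallest replication domain
        "vote_quorum": 2 * f + 1,      # replies collected before a value vote (Section 3.6)
        "matching_needed": f + 1,      # identical values required to accept a result
    }

# Tolerating a single traitorous element already requires a domain of four:
print(replication_bounds(1))
# {'max_simultaneous_faults': 1, 'min_replicas': 4, 'vote_quorum': 3, 'matching_needed': 2}
```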

Like many CORBA systems where servers can, in turn, be clients, ITDOS provides the ability for one replication domain to be a client to another replication domain.

Figure 1. Singleton Client and Replicated Server (figure not reproduced: a singleton client stack of client application code, IT ORB (TAO), voter, marshalling, and secure reliable multicast over IP multicast, connected through client-side and server-side firewalls with IT-CORBA proxies to a replication domain whose replication domain elements each contain server application code and an IT ORB, alongside Group Manager processes).

ITDOS supports replicated clients and servers, as well as singleton clients. Our architecture currently does not support replicated clients invoking operations on singleton servers; however, extending ITDOS to include that capability would not be too difficult, since the voting mechanism required is already used by the replication domain elements.

The objects for each server perform only deterministic operations. Furthermore, the servers must execute deterministically. Without determinism, it is impossible to differentiate between arbitrary faults in the system and non-deterministic behavior. To reduce complexity and help ensure deterministic behavior, each replication domain element employs a single-threaded execution model.

In this system, faulty processes in a replication domain are detected primarily by processes external to it: either by clients receiving a faulty result, or by other servers receiving a faulty request. "Faulty" in this case means a value that does not match the majority of other values coming from a replication domain. Once a replication domain element is determined to be faulty, it must be removed from its replication domain to preserve confidential communications. Currently, ITDOS does not replace replication domain elements that it determines to be faulty; it merely removes them from the system - replacement remains to be implemented.

The Group Manager handles replication domain membership and virtual connection management in ITDOS. The Group Manager consists of a replication domain of Group Manager processes. Each Group Manager replication domain element is not a CORBA server, since connection management functions as part of the middleware transport rather than at the application level. These processes work together to regulate replication domain formation, replication domain membership, and connection establishment between clients and servers. The Group Manager also provides the symmetric session keys (called communication keys) used to protect communications.

The ITDOS prototype is built on the ACE ORB (TAO) [38], which is an open-source CORBA ORB. The target platforms include Solaris and Linux.

2.1 Threats

The ITDOS system protects against any threats that would cause an observable deviation in expected server behavior. ITDOS relies upon the underlying BFT multicast protocol to tolerate f simultaneous protocol failures and upon the voting mechanism to detect and mask faulty values.

Provided that no more than f simultaneous failures occur, ITDOS guarantees service availability, integrity, and communications confidentiality. However, there is a caveat to the confidentiality guarantee. Since symmetric keys protecting the traffic provide confidentiality, a compromise of a replicated server provides keys to all the traffic within groups of which that server is a member, until the keys can be reissued without the participation of the faulty server. Furthermore, a malicious server that remains undetected can leak server state to unauthorized recipients.

While the underlying BFT protocol provides some defense against denial-of-service (DoS) attacks on individual replication domain elements [7], ITDOS is not resilient against unrestricted DoS attacks.

2.2 Assumptions

The ITDOS system is constrained by several assumptions that provide a basis for our claims. These assumptions cover the operating environment, the technologies used, and other security measures that may be required for a fully operational system. The following assumptions apply:

• The network does not partition such that more than f of the replicated servers become unreachable [6].

• The cryptographic algorithms used for authentication [33, 34] and confidentiality [12] remain unbroken, so that integrity, authentication, confidentiality, and non-repudiation of ITDOS messages are preserved.

• There will not be more than f simultaneous faults in a replication domain, where the number of servers in the replicated group is at least 3f + 1 [4].

• The deployment environment is not susceptible to common-mode failures, since ITDOS supports implementation diversity in both language and platform.

• If one correct process delivers a message, all correct processes will eventually deliver that message.

• Correct servers exhibit deterministic behavior.

• The authentication tokens for each process are adequately protected and only available to authorized users.

• Additional assumptions imposed by the underlying Byzantine fault tolerant protocol hold [7].

3. Architectural Features

This section discusses in more detail some of the technical issues encountered in designing and implementing ITDOS. Figure 2 shows an expansion of the Secure Multicast Inter-ORB Protocol (SMIOP) stack, providing an exploded view of integrating a Byzantine Fault Tolerant multicast into TAO. The following sections discuss many of the elements shown in Figure 2. ITDOS provides connection semantics (ITDOS Sockets) on top of a Byzantine fault tolerant multicast protocol (Secure Reliable Multicast), which, in turn, uses IP Multicast to send messages to a group of processes. The SMIOP pluggable protocol allows us to integrate the transport with TAO.
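Purely as a structural sketch of that layering (the class and method names below are hypothetical, not the actual TAO pluggable-protocol or ITDOS C++ interfaces), the stack can be read as three delegating layers:

```python
class IPMulticastEndpoint:
    """Bottom layer: best-effort datagrams to an IP multicast group."""
    def __init__(self, group_addr: str, port: int):
        self.group_addr, self.port = group_addr, port

    def send(self, datagram: bytes) -> None:
        ...  # hand the datagram to the network


class SecureReliableMulticast:
    """Middle layer: Castro-Liskov style BFT multicast; every correct member
    delivers the same messages in the same total order."""
    def __init__(self, endpoint: IPMulticastEndpoint):
        self.endpoint = endpoint

    def ordered_send(self, payload: bytes) -> None:
        ...  # run the agreement protocol, then use endpoint.send(...)

    def deliver(self) -> bytes:
        ...  # next totally-ordered message


class ITDOSSocket:
    """Top layer: the virtual-connection (SMIOP) abstraction used by the ORB's
    pluggable protocol; it encrypts with the communication key and hands voted,
    unmarshalled results up to the ORB."""
    def __init__(self, srm: SecureReliableMulticast, connection_id: int, comm_key: bytes):
        self.srm, self.connection_id, self.comm_key = srm, connection_id, comm_key

    def send_giop(self, giop_message: bytes) -> None:
        ...  # encrypt under comm_key, then srm.ordered_send(...)

    def recv_giop(self) -> bytes:
        ...  # srm.deliver(), decrypt, vote (Section 3.6), return to the ORB
```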

Figure 2. ITDOS Protocol Stack (figure not reproduced: within a replication domain element, the application and its hosted objects sit on the IT ORB; the SMIOP transport comprises the ITDOS pluggable protocol, marshalling, voter, connection manager, queue management, ITDOS sockets, and an SRM socket layered over Secure Reliable Multicast and IP Multicast, with the Group Manager alongside).

3.1 Secure Reliable Multicast

Our project plan was to adapt and integrate an existing BFT multicast protocol rather than create our own. In the literature of BFT multicast protocols, we encountered two basic models of client-server communication: message passing and request/response protocols. The message passing protocols essentially provide a transport that delivers messages to a group in total order; every member of the group receives all messages sent to the group in the same sequence. These protocols depend upon virtual synchrony to accomplish this task, expelling members of the group that are not participating according to the protocol specification to make progress. The request/response protocol we encountered is modeled on a state-transfer paradigm. The protocol attempts to deliver all messages to the replicas in the group; however, replicas are allowed to become unsynchronized. In this case, a "faulty" replica is proactively recovered: the replica's state is synchronized with the rest of the replica group by transferring state data from correct replicas.

We selected the BFT mechanism developed by Miguel Castro and Barbara Liskov [6]. Hereafter, we refer to the protocol as "Castro-Liskov". The Castro-Liskov protocol uses a state-transfer approach coupled with a request/response mechanism as the interface to its client-server protocol. In a simplified explanation, a singleton client sends an invocation message to a replica group. The replicas decide on the total order of the message (among other client requests) and deliver it to the application to be executed. Each replica computes the response and delivers it to the client directly. The client waits for f + 1 replies with the same result; this is the result of the operation [6]. Each server is a state machine, keeping its state in a contiguous block of memory.

These two characteristics, state transfer and request/response, do not lend themselves easily to a CORBA system. (While a message passing protocol may have been more appropriate to the ITDOS implementation, we could not obtain a prototype.) Object state synchronization could create performance and scalability problems. Another difficulty is efficiently obtaining state from an object implementation. Additionally, a request/response protocol must support concurrency to enable nested invocations. For example, if a replicated state machine processing a request needs to also send a request (as part of the servant implementation), it must be able to receive the intermediate reply over the same reliable and totally ordered multicast channel on which it received the original request, before returning from that original request. ITDOS solves both the state synchronization and concurrency problems with minimal modifications to the Castro-Liskov library.

An ITDOS server implements a message queue that is the state machine. Whenever Castro-Liskov synchronizes the replica state, the message queue is synchronized. Each replication domain element maintains equivalent object state, since each processes messages in the same order as delivered by the Castro-Liskov transport. Obviously, the size of this message queue is limited by the size of the contiguous block of memory; the message queue must be garbage-collected and more memory made available for incoming messages. We are still developing the algorithm to do so; however, this step essentially adds virtual synchrony [2] to the system - replicas that do not participate according to the queue management protocol must be expelled to make progress. It is important to note here that the request/response protocol of Castro-Liskov is effectively changed into a message passing protocol. The original intent of Castro-Liskov, which is to eliminate the need for virtual synchrony through proactive recovery [6], is somewhat undermined. However, this approach seems the best way to integrate Castro-Liskov into an ORB architecture, and it provides greater scalability for large object servers.
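The queue-as-state-machine idea can be sketched as follows (a simplified model with invented names; the real queue is the contiguous memory region checkpointed by the Castro-Liskov library, and the garbage-collection agreement step is, as noted above, still being designed):

```python
from collections import deque

class ReplicatedMessageQueue:
    """Each replication domain element runs one of these as its 'state machine'.
    Messages are appended in the total order chosen by the BFT transport and
    processed in that same order by the ORB, so correct elements converge on
    equivalent object state."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes   # bounded by the contiguous checkpoint region
        self.used = 0                    # bytes currently held in the region
        self.pending = deque()           # delivered but not yet processed
        self.reclaimable = 0             # bytes of already-processed messages

    def on_deliver(self, msg: bytes) -> None:
        """Called by the Castro-Liskov layer in the same order at every correct element."""
        if self.used + len(msg) > self.capacity:
            self.garbage_collect()
        self.pending.append(msg)
        self.used += len(msg)

    def next_for_orb(self) -> bytes:
        """Hand the next message to the ORB thread; its bytes become reclaimable."""
        msg = self.pending.popleft()
        self.reclaimable += len(msg)
        return msg

    def garbage_collect(self) -> None:
        """Reclaim space held by processed messages. In ITDOS this step must be
        agreed upon by the domain (virtual synchrony); elements that do not follow
        the queue-management protocol would be expelled."""
        self.used -= self.reclaimable
        self.reclaimable = 0
```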

To allow nested invocations, ITDOS executes the Castro-Liskov transport as a message delivery system running in one thread of the CORBA server process. All messages sent to the replication domain are delivered through this thread to the ORB thread. The reply expected at the Castro-Liskov layer is a static reply that acts as an acknowledgement message for the protocol. The CORBA reply is sent to the client as all messages are sent to that client, through the Castro-Liskov transport, making it essentially unidirectional. There are two threads for each replication domain element: one for ORB execution and one for Castro-Liskov message delivery. This technique allows ITDOS to support nested invocations, where a server needs to receive a reply on the Castro-Liskov thread from an invocation before completing its execution for a particular request.
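A minimal sketch of that threading arrangement (hypothetical hook names; the actual integration lives inside the TAO pluggable protocol and the modified Castro-Liskov library):

```python
import queue
import threading

to_orb: "queue.Queue[bytes]" = queue.Queue()  # hand-off from the BFT thread to the ORB thread
STATIC_ACK = b"ITDOS-ACK"                     # fixed reply satisfying Castro-Liskov's request/response interface

def bft_execute_upcall(request: bytes) -> bytes:
    """Runs on the Castro-Liskov delivery thread for every totally-ordered message.
    It only enqueues the message and returns a static acknowledgement, so the
    transport is effectively unidirectional; real CORBA replies travel as their
    own multicast messages."""
    to_orb.put(request)
    return STATIC_ACK

def orb_loop() -> None:
    """The single ORB thread. Requests, CORBA replies, and nested-invocation
    replies all arrive through the same queue, so a servant that issues a nested
    invocation can pull its intermediate reply from here before finishing the
    original request."""
    while True:
        handle_giop_message(to_orb.get())

def handle_giop_message(msg: bytes) -> None:
    pass  # placeholder: unmarshal, vote (Section 3.6), dispatch to the servant

threading.Thread(target=orb_loop, daemon=True).start()  # one thread per role
```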

3.2 Group Membership

Ordering group cardinality directly impacts performance and scalability. BFT total-ordering protocols are expensive; additionally, the number of messages exchanged is directly related to the number of members in the ordering group. Given the non-linear performance penalties in large ordering groups, the ordering groups should be as small as possible. For that reason, clients cannot be in the same ordering group as the servers: with the transitive nature of nested invocations, all the hosts of the entire system would otherwise tend to aggregate into some small number of very large groups. Of course, the more interconnected the architecture, the larger these groups would be required to be, and therefore the higher the performance penalty. In ITDOS, the replication domain is the ordering group; the clients do not participate in the message ordering with the servers. While this approach limits the size of the ordering group, it causes a problem where a replication domain needs to send messages to a singleton client (and it must, to send CORBA replies). Since each replication domain element must operate deterministically for all invocations and non-faulty clients require availability, an additional protocol is needed to ensure that replicated servers can proceed if a singleton client fails, and that the singleton client will get an answer if it is not faulty.
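As a rough, back-of-the-envelope illustration of why ordering groups must stay small (this assumes only that agreement traffic grows on the order of the square of the group size, which is typical of all-to-all BFT agreement phases; the exact message counts for Castro-Liskov are not given in this paper):

```python
# Approximate agreement traffic per ordered request, assuming ~n^2 messages
# for an ordering group of n = 3f + 1 members.
for f in (1, 2, 3, 5):
    n = 3 * f + 1
    print(f"f={f}: ordering group of {n}, roughly {n * n} agreement messages per request")
```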

This group membership choice also causes a problem in detecting a faulty server within the replica group itself. Since members of the replica group do not see the messages sent by the other replicas in the group, only processes that receive messages from the replica group can detect faults based on message values. Once a fault is detected, the offending replication domain element may be expelled from its replication domain. The ordering protocol itself can detect faulty servers if they do not participate in the specified protocol correctly. The ITDOS implementation does not modify Castro-Liskov to expel replicas based on faulty participation; in fact, one of the main features of Castro-Liskov is to keep faulty replicas in the system until they are proactively recovered [6].

3.3 Connection Management

CORBA's General Inter-ORB Protocol (GIOP) requires connection semantics, so to integrate the Castro-Liskov transport into an ORB architecture the ITDOS prototype creates virtual connections over the Castro-Liskov transport layer. Furthermore, state information, such as cryptographic keys, needs to be associated with a "connection" abstraction. This necessitates a mechanism by which all of the members of a replication domain can open a connection to another replication domain with multiple members. A centralized service, the Group Manager, manages connections, but it is implemented in an intrusion tolerant manner. The Group Manager is an ITDOS replication domain, providing high availability and integrity.

Figure 3. Connection Establishment

In CORBA, a client usually has an object reference to some service with which it wishes to communicate. In ITDOS, the object reference contains the address of the replication domain in which that service is located. Figure 3 illustrates connection establishment. Step (a) indicates the logical object invocation that the client wishes to make. Transparently to the application layer, the ITDOS transport layer sends (1) an open_request message to the Group Manager of the replication domain with the appropriate information from the object reference. The Group Manager, after determining that the client and target are both valid and available, generates symmetric communication keys to both establish and protect the connection for that client-server association. The communication keys are first sent (2) to the target replication domain (using the Castro-Liskov transport), then to the client (3). The communication key is used to encrypt traffic between the client and the server for that connection. Finally, the CORBA invocation is sent via Castro-Liskov to the server (4), and the reply is returned to the client (5).
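The numbered steps can be summarized in a procedural sketch (all names here are invented for illustration; in ITDOS these actions happen inside the transport, hidden from the application):

```python
def establish_connection(client, group_manager, server_domain):
    """Sketch of Figure 3's flow; step numbers match the text."""
    # (1) The client's transport sends an open_request built from the object reference.
    open_req = client.make_open_request(server_domain.address)

    # The Group Manager checks that client and target are valid and available, then
    # produces the communication key (in ITDOS, from threshold shares; see Section 3.5).
    comm_key = group_manager.handle_open_request(open_req)

    # (2) Key material goes to the target replication domain over the BFT transport,
    # (3) then to the client.
    server_domain.install_key(comm_key)
    client.install_key(comm_key)

    # (4) The CORBA invocation is sent, encrypted under comm_key, via Castro-Liskov,
    # (5) and the (voted) reply returns to the client the same way.
    reply = server_domain.invoke(client.pending_invocation)
    client.deliver_reply(reply)
```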

All of this interaction is accomplished transparently to the application developer, since it is integrated into the ORB through a TAO Pluggable Protocol. The TAO Pluggable Protocol [27] provides an interface to the ORB for ITDOS to layer traditional socket semantics on the Castro-Liskov BFT protocol.

3.4 Replication Granularity

Conceptually, the CORBA paradigm provides location transparency: its APIs presume that objects are location-independent and may exist in any server. However, in an actively replicated, intrusion tolerant system such as ITDOS, allowing unlimited object locations proves problematic. For instance, if objects a and b are co-located in Server A, but only a resides in Server B, then when a invokes on b, the invocation is local in Server A but remote in Server B. In the first case, since the invocation does not pass through the middleware, the voting mechanism does not come into play. This may lead to inconsistent results from the invocation and ordering problems for Castro-Liskov. Replicating a server, complete with all the objects it hosts, is conceptually and technically simpler. This approach provides some significant benefits as well. All client interactions with separate objects hosted by a particular server can use the same connection. Since connection establishment is a fairly heavyweight process, connection reuse enhances performance. The restriction lends itself to greater scalability, since the granularity is at the process level as opposed to possibly thousands of objects that may need to be managed individually. Since ITDOS manages connections on a process basis, we also conserve multicast address allocation, of which there is a finite (albeit very large) number.

3.5 Confidentiality

Symmetric key encryption using group communication keys provides client-server confidentiality. The Group Manager replication domain generates and provides these keys to both the client and the server to establish a security association between those parties. In this group keying model there are two primary risks to the confidentiality of the communications: compromise of a replication domain element, exposing all communication keys for connections established with that replication domain, and compromise of an improperly designed Group Manager replication domain element, exposing all communication keys in the system. In both of these cases, the compromise may be detected in a timely manner, or may remain undetected indefinitely. The ITDOS architecture must address both of these situations.

ITDOS minimizes the impact of the undetected compromise of a replication domain element by assigning a unique communication key for each pair of communicating client and server replication domains. In this case, confidentiality is only lost for the groups of which that element is a member. If the compromised element is detected, it is excluded from its replication domain (and from receiving new communication keys) by re-keying the communication group, excepting the compromised element. However, there will be some (short) period of time before the compromised communication keys have been replaced.

The compromise of a Group Manager replication domain element, while perhaps less likely than the compromise of a typical replica (due to increased security measures), could have a significant and unacceptable impact on client/server confidentiality. In a "traditional" approach to the design of a Group Manager, each of the Group Manager replication domain elements agrees on each communication key and distributes the entire key to the appropriate client and server replicas using secure channels. In such an approach, the compromise of a single Group Manager process would compromise all communication keys known to the Group Manager when the compromise occurred, and all subsequent communication keys generated until the compromise is detected. Once the compromise is detected, all the communication keys and all other sensitive keys known to the compromised Group Manager replication domain element must be replaced in order to restore confidentiality. The time necessary to restore confidentiality in this scenario is unacceptable. To address this problem, the ITDOS architecture uses threshold cryptography for communication key generation. Each Group Manager replication domain element generates distinct shares of a communication key, then distributes its key shares to the appropriate client/server replication domains using a secure channel². The fragmented keys minimize the amount of key information lost if a Group Manager element is compromised. An attacker must compromise multiple elements to generate a communication key.

During connection establishment (see Figure 3), or when an element is removed from a client or server replication domain, ITDOS generates a new communication key. Each Group Manager replication domain element uses a common non-repeating value as an input to a distributed (non-interactive) pseudo-random function [26] (which is essentially equivalent to a random access coin-tossing scheme [5]) to generate shares of the same key. Examples of this class of function include [26, 5, 39]. The ITDOS Group Manager uses a distributed random number generation process to initialize (and periodically re-initialize) the pseudo-random number generators (PRNGs) of each Group Manager replication domain element. The outputs of the pseudo-random number generators become the common inputs to the distributed function. ITDOS relies upon configuration inputs for its pseudo-random functions. The non-interactive distributed function generates the key shares and verification information for the secret key and each key share. These distributed functions prevent up to f corrupt Group Manager replication domain elements from tampering with or obtaining the communication key even when they combine their key shares and verification information.
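The flavor of this share-based key generation can be shown with a deliberately simplified n-out-of-n sketch (HMAC stands in for the per-element pseudo-random function and simple XOR for the combination step; the actual design is a threshold, verifiable, non-interactive distributed PRF as in [26, 5, 39], so that even f colluding elements learn nothing about the key):

```python
import hmac
import hashlib
from functools import reduce

def element_key_share(element_secret: bytes, common_input: bytes) -> bytes:
    """Each Group Manager element evaluates its own PRF share on the agreed,
    non-repeating common input drawn from its (re-initialized) PRNG."""
    return hmac.new(element_secret, common_input, hashlib.sha256).digest()

def combine_shares(shares: list) -> bytes:
    """Receiving client/server elements combine the shares into the communication key."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), shares)

# Example with four Group Manager elements (f = 1); secrets are placeholders.
secrets = [bytes([i + 1]) * 32 for i in range(4)]
common_input = b"connection-nonce-0001"        # the common non-repeating value
key = combine_shares([element_key_share(s, common_input) for s in secrets])
```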

Each Group Manager replication domain element distributes its particular key share and verification information to each of the replication domain elements that will share the communication key (e.g., during steps 2 and 3 of connection establishment). The client and server replication domain elements each decrypt the messages from the Group Manager replication domain, verify the correctness of the key shares they receive, and combine the shares to form the communication key. If f or fewer Group Manager replication domain elements are corrupted, the client and server replication domain elements will be able to generate the same key, and these replication domain elements can verify which Group Manager replication domain elements acted correctly.

² When a new replication domain is established, each process of the Group Manager establishes a unique pair-wise shared symmetric key with each replication domain element and a group key that the process shares with all of the elements of the replication domain.

3.6 Voting

The voting mechanism is the key to implementing intrusion tolerance in a heterogeneous CORBA system. Since the marshalled GIOP [28] format can differ depending on platform, ITDOS cannot simply perform byte-by-byte voting on the raw message data. Byte-by-byte voting does not work correctly in the presence of heterogeneity [3] or inexact values. Rather, voting must be accomplished in middleware, after the raw message stream has been unmarshalled. This process allows us to determine equivalency even when the underlying data representation is different. Furthermore, by providing access to the actual data in the reply or request, the voter can employ much more flexible voting algorithms than other implementations. In particular, ITDOS bases its voting mechanism on the Voting Virtual Machine [3].

The accuracy of floating point and other data types may vary from platform to platform. In this case, the voting algorithm must be able to determine equivalency despite values that may differ by some small amount. This is called inexact voting, which ITDOS supports [31]. In fact, equivalence need not be transitive with inexact values: if a = b and b = c, this does not imply that a = c, since b may be close to both a and c while a is not close to c.
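A two-line numeric example makes the non-transitivity concrete (the values and tolerance are invented purely for illustration):

```python
TOLERANCE = 0.06

def equivalent(x: float, y: float) -> bool:
    """Inexact equality: values within the tolerance count as the same vote."""
    return abs(x - y) <= TOLERANCE

a, b, c = 1.00, 1.05, 1.10
assert equivalent(a, b) and equivalent(b, c)   # a matches b, and b matches c ...
assert not equivalent(a, c)                    # ... yet a does not match c
```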

Voting allows ITDOS to provide an application with a single message representing the correct value based on multiple messages from a replication domain with multiple members. This eliminates duplicate requests and replies from replicas. There is a voter element for each connection in our protocol stack. The voter depends upon the total ordering property of the secure reliable multicast protocol to deliver messages in a deterministic fashion. In particular, the Castro-Liskov transport protocol delivers messages, identified by both source and request identifier, in the same order to all correct processes. Consequently, each deterministic voter reaches a decision threshold in the same order, thus preserving ordered delivery of messages to the ORB. Since the voter is deterministic, the individual replication domain elements need not synchronize with each other regarding the values determined by their voters, a step which would incur additional expense.

The voter requires a minimum of f + 1 identical messages or 2f + 1 total messages to perform a vote [6]. It does not wait for all 3f + 1 messages to arrive before performing a vote, since that would cause the system to be vulnerable to network delays and faulty processes that may be deliberately slow (or unresponsive). Since the voter must collate messages to enact a vote, it must maintain some state regarding those messages. However, in situations where the voter does not receive enough messages to perform a vote, the voter must perform garbage collection to continue making progress and limit the resources it uses.
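The decision rule itself is small; the sketch below uses exact byte comparison only to keep it short (a hypothetical helper, not the ITDOS Voting Virtual Machine, which compares unmarshalled and possibly inexact values):

```python
from collections import Counter
from typing import Optional

def try_vote(replies: list, f: int) -> Optional[bytes]:
    """Accept a value once f + 1 identical replies back it. With at most f faults,
    this is guaranteed to happen by the time 2f + 1 replies arrive, so the voter
    never has to wait for all 3f + 1."""
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= f + 1 else None

# f = 1: no decision after two disagreeing replies, decision after the third.
assert try_vote([b"42", b"bogus"], f=1) is None
assert try_vote([b"42", b"bogus", b"42"], f=1) == b"42"
```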

In the ITDOS protocol stack, each connection has a voter object that collates messages on a connection basis. Message originators embed request identifiers in all the requests, replies, and other control messages they send. The identifiers permit originators to collate multiple requests or replies, and to match requests with their corresponding replies. ITDOS guarantees these identifiers to be strictly increasing values. Since ITDOS uses a single-threaded concurrency model, a client replication domain can only send a new request after the client replication domain has received and voted upon the replies to the previous requests. Consequently, only one outstanding request can exist for a connection at a time.

Voters determine when to stop waiting for messages forthcoming from a replication domain. For example, a client replication domain element invokes on a server replication domain, then waits for a reply. The n replication domain elements in the server replication domain send replies. If everything works as expected, the client replication domain element should receive 2f + 1 messages, then perform a value vote, and finally receive the remaining n - (2f + 1) messages. Request identifiers in the reply identify the request that initiated that reply. Any just-received request identifier should match the identifier of the outstanding request that the client replication domain element last sent, and on which the replication domain element is currently voting. If the reply's identifier does not match the expected value, then the ITDOS receiver discards the message. It does this regardless of whether or not the receiver has accepted all n copies of a message with a given request identifier.

A discarded message could be from a Byzantine process, or it could be a late-coming reply from an earlier request. The receiver neither uses the message's value nor penalizes the sender. By doing this, receivers avoid retaining information without limit, avoiding a potential attack. However, the receiving process cannot distinguish between late and Byzantine processes, and so cannot generate proof to the Group Manager that the sending process is, in fact, Byzantine.

Voting is also used to detect faulty processes. However, this mechanism is not completely reliable, since the voter calculates a result after receiving 2f + 1 messages and it is possible that the faulty response is not among those received. In that case, its fault will not be detected with that particular vote. The receiver of the 2f + 1 messages is still guaranteed the correct value, provided there are not more than f faulty values.

If a faulty value is detected, the voter must initiate action to expel the Byzantine process. The Group Manager is responsible for expelling the faulty process(es) from the affected replication domain by keying them out of all communication groups of which they are part. Thus, a replication domain element notifies the Group Manager that action is required by sending a change_request. A faulty replication domain element may be expelled by request from a singleton replication domain element, or by request from the multiple members of a replication domain. In the first case, a single client process invokes on a server and detects a faulty reply from some (up to f) replication domain elements in that server replication domain. The single client process then sends a change_request, including proof, to the Group Manager to expel the faulty process(es). A potential vulnerability is that the client is malicious and is attempting to expel correct processes from the target replication domain. To protect against this sort of attack, ITDOS requires proof from the single client of the faulty value(s). The proof is the set of signed messages through which the faulty value was detected. Since each message contains a sequence number to protect against replay, and each message is signed, the Group Manager can determine the validity of the proof. The Group Manager must perform a vote on the values just as the client did, on unmarshalled data. However, the Group Manager does not run in an ORB. To solve this difficulty, ITDOS adds the full interface name to the GIOP message (which GIOP does not normally provide). Furthermore, ITDOS provides a marshalling engine that the Group Manager uses to perform its vote. Once the Group Manager determines that the request is valid, it generates new communication keys and distributes them to all the correct processes in the affected replication domain and associated clients and servers, effectively removing the faulty process.

In the case of a replication domain detecting a faulty value, each replication domain element in the replication domain will send an individual change_request message. Proof here is not necessary, since the request originated from a trustworthy source, a replication domain. However, the Group Manager must receive the necessary number of messages to perform a vote on the request message prior to expelling the faulty process(es).

3.7 Related Work

There has been a limited amount of prior work on distributed object middleware systems that tolerate Byzantine failures. One such system, Immune [25], does not support heterogeneous environments due to its use of byte-by-byte voting. This is also true of fault-tolerant CORBA projects that have tolerated crash failures, including Electra [16], AQuA [11], and Eternal [22]. Other efforts in BFTM systems, like Rampart [35, 36] and Castro-Liskov [6, 7, 8, 9], have limited the implementation model so that applications cannot be built using a standard distributed object paradigm. ITDOS allows a standard CORBA model of development in a heterogeneous environment. These BFTM systems also use value comparison mechanisms equivalent to byte-by-byte voting, so they do not work correctly in the face of heterogeneity or inexact values, like floating point numbers. The ITUA project is similar to the ITDOS effort; however, its integration point is a local proxy that uses a modified AQuA protocol to handle replication. They are extending the protocol to handle Byzantine failures [41].

4. Future Work

There are several areas that require further work to make this architecture viable for a wide variety of distributed systems. The single-threaded approach to handling determinism may be a limiting factor in the deployment and scalability of any one particular server. Castro describes a generalized technique for handling multi-threaded applications in BFT systems [40]. Our current implementation lacks the ability to create new replicas on the fly to replace faulty replicas.

Transferring large objects poses another obstacle to efficient performance. While signing and voting on individual messages when they are of "small" size can be a reasonable performance sacrifice for security, doing so on large, multi-gigabyte image objects (for instance) could pose a significant problem (most CORBA systems probably wouldn't transfer such large objects). However, to improve flexibility within the architecture, we must find an efficient way of moving larger messages through the system with confidentiality, authentication, and integrity.

Once we fully implement ITDOS, we will analyze the performance tradeoffs required for given levels of intrusion tolerance. Similarly, we are considering the possibility of adaptive voting such as that outlined in [32].

5. Conclusions

ITDOS provides an architecture for highly available, high-integrity distributed object systems. The system uniquely provides support for heterogeneous server implementations on different platforms to increase the survivability of the protected service. ITDOS improves scalability independent of the number of objects by using a message queue to synchronize replica state, as opposed to state transfer techniques. Communication confidentiality is preserved via symmetric encryption techniques between clients and servers, and key distribution is performed using threshold cryptography techniques. The fundamental building blocks of this implementation should be extensible to other distributed object middleware, like Java RMI, enabling greater flexibility in system development.

References

[1] D. Bakken, "Middleware," chapter in Encyclopedia of Distributed Computing, J. Urban and P. Dasgupta, eds., Kluwer Academic Publishers, 2001, to appear.
[2] K. Birman, "Virtual synchrony model," in Reliable Distributed Computing with the Isis Toolkit, IEEE CS Press, 1994.
[3] Bakken, Zhan, Jones, and Kann, "Middleware Support for Voting and Data Fusion," in Proceedings of the International Conference on Dependable Systems and Networks, IEEE/IFIP, Göteborg, Sweden, July 1-4, 2001, pp. 453-462.
[4] G. Bracha and S. Toueg, "Asynchronous Consensus and Broadcast Protocols," Journal of the ACM, 32(4), 1985.
[5] C. Cachin, K. Kursawe, and V. Shoup, "Random oracles in Constantinople: Practical asynchronous Byzantine agreement using cryptography," in Proceedings of the 19th ACM Symposium on Principles of Distributed Computing (PODC 2000), pp. 123-132, July 2000.
[6] M. Castro and B. Liskov, "Proactive Recovery in a Byzantine-Fault-Tolerant System," in Proceedings of the 4th USENIX Symposium on Operating Systems Design and Implementation, October 2000.
[7] M. Castro and B. Liskov, "Practical Byzantine fault tolerance," in Proceedings of the 3rd USENIX Symposium on Operating Systems Design and Implementation, February 1999.
[8] M. Castro and B. Liskov, "Authenticated Byzantine Fault Tolerance Without Public-Key Cryptography," Technical Memo MIT/LCS/TM-589, MIT Laboratory for Computer Science, 1999.
[9] M. Castro and B. Liskov, "A Correctness Proof for a Practical Byzantine-Fault-Tolerant Replication Algorithm," Technical Memo MIT/LCS/TM-590, MIT Laboratory for Computer Science, 1999.
[10] M. Chereque, D. Powell, P. Reynier, J. Richier, and J. Voiron, "Active Replication in Delta-4," in Proceedings of the Twenty-Second International Symposium on Fault-Tolerant Computing, IEEE, Boston, Mass., July 1992, pp. 28-37.
[11] M. Cukier, J. Ren, C. Sabnis, W. H. Sanders, D. E. Bakken, M. E. Berman, D. A. Karr, and R. E. Schantz, "AQuA: An adaptive architecture that provides dependable distributed objects," in Proceedings of the IEEE 17th Symposium on Reliable Distributed Systems, West Lafayette, IN, October 1998, pp. 245-253.
[12] "Data Encryption Standard," Federal Information Processing Standards Publication 46-3, U.S. Department of Commerce/National Bureau of Standards, National Technical Information Service, Springfield, Virginia, October 1999.
[13] W. Fenner, "Internet Group Management Protocol, Version 2," RFC 2236, IETF, November 1997.
[14] http://www.pgp.com/research/nailabs/distributed-systems/intrusion-tolerant.asp
[15] K. Kihlstrom, L. Moser, and P. Melliar-Smith, "The SecureRing Protocols for Securing Group Communication," in Proceedings of the IEEE 31st Hawaii International Conference on System Sciences, Kona, Hawaii, January 1998, vol. 3, pp. 317-326.
[16] S. Landis and S. Maffeis, "Building reliable distributed systems with CORBA," Theory and Practice of Object Systems, vol. 3, no. 1, April 1997, pp. 31-43.
[17] L. Lamport, R. Shostak, and M. Pease, "The Byzantine Generals Problem," ACM Transactions on Programming Languages and Systems, vol. 4, no. 3, July 1982, pp. 382-401.
[18] L. Moser, Y. Amir, P. Melliar-Smith, and D. Agarwal, "Extended virtual synchrony," in Proceedings of the 14th IEEE International Conference on Distributed Computing Systems, Poznan, Poland, June 1994, pp. 56-65.
[19] L. E. Moser, P. M. Melliar-Smith, R. R. Koch, and K. Berket, "A group communication protocol for CORBA," in Proceedings of the 1999 ICPP International Workshop on Group Communication, Aizu, Japan, September 1999, pp. 30-36.
[20] L. Moser, P. Melliar-Smith, P. Narasimhan, L. Tewksbury, and V. Kalogeraki, "Eternal: Fault Tolerance and Live Upgrades for Distributed Object Systems," DARPA Information Survivability Conference & Exposition, Volume II, January 25-27, 2000, Hilton Head, South Carolina.
[21] L. Moser, P. Melliar-Smith, and N. Narasimhan, "The SecureGroup Communication System," in Proceedings of the IEEE Information Survivability Conference, Hilton Head, SC, January 2000.
[22] L. E. Moser, P. M. Melliar-Smith, P. Narasimhan, V. Kalogeraki, and L. Tewksbury, "The Eternal System," Workshop on Compositional Software Architectures, Monterey, California, January 6-8, 1998.
[23] D. Malkhi, M. Merritt, and O. Rodeh, "Secure reliable multicast protocols in a WAN," in Proceedings of the IEEE 17th International Conference on Distributed Computing Systems, 1997, pp. 94-97.
[24] D. Malkhi and M. Reiter, "A high-throughput secure reliable multicast protocol," Journal of Computer Security, vol. 5, 1997, pp. 113-127. An earlier version appeared in Proceedings of the 9th Computer Security Foundations Workshop, Kenmare, Ireland, June 1996, pp. 9-17.
[25] P. Narasimhan, K. P. Kihlstrom, L. E. Moser, and P. M. Melliar-Smith, "Providing support for survivable CORBA applications with the Immune system," in Proceedings of the 19th IEEE International Conference on Distributed Computing Systems, Austin, TX, May 1999, pp. 507-516.
[26] M. Naor, B. Pinkas, and O. Reingold, "Distributed Pseudo-random Functions and KDCs," Advances in Cryptology - EUROCRYPT '99 (LNCS 1592), 1999, pp. 327-346.
[27] C. O'Ryan, F. Kuhns, D. Schmidt, O. Othman, and J. Parsons, "The Design and Performance of a Pluggable Protocols Framework for Real-time Distributed Object Computing Middleware," in Proceedings of the IFIP/ACM Middleware 2000 Conference, Palisades, New York, April 3-7, 2000.
[28] OMG, "General Inter-ORB Protocol," OMG specification CORBA V2.3, 15:1-62, June 1999.
[29] OMG, "The Common Object Request Broker: Architecture and Specification," Revision 2.5, 2001.
[30] OMG, "The CORBA Security Service," Draft Adopted Revision 1.8, 2001.
[31] B. Parhami, "Optimal Algorithms for Exact, Inexact, and Approval Voting," Digest of the 22nd International Symposium on Fault-Tolerant Computing, pp. 404-411.
[32] R. Parameswaran, D. Blough, and D. Bakken, "A Preliminary Investigation of Precision vs. Fault Tolerance Trade-offs in Voting Algorithms," in Digest of FastAbstracts presented at the International Conference on Dependable Systems and Networks (DSN-2001), Göteborg, Sweden, July 2001.
[33] R. Rivest, A. Shamir, and L. Adleman, "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems," Communications of the ACM, 21(2), 1978.
[34] R. Rivest, "The MD5 Message-Digest Algorithm," Internet RFC 1321, 1992.
[35] M. K. Reiter, "Secure agreement protocols: Reliable and atomic group multicast in Rampart," in Proceedings of the 2nd ACM Conference on Computer and Communications Security, November 1994, pp. 68-80.
[36] M. Reiter, "The Rampart toolkit for building high-integrity services," in Theory and Practice in Distributed Systems (Lecture Notes in Computer Science 938), Springer-Verlag, 1995, pp. 99-110.
[37] F. Schneider, "Implementing fault-tolerant services using the state machine approach: a tutorial," ACM Computing Surveys, 22(4), December 1990, pp. 299-319.
[38] D. Schmidt, A. Gokhale, T. Harrison, and G. Parulkar, "A High-performance Endsystem Architecture for Real-time CORBA," IEEE Communications Magazine, February 1997.
[39] V. Shoup, "Practical threshold signatures," Advances in Cryptology - EUROCRYPT 2000, 2000.
[40] R. Rodrigues, M. Castro, and B. Liskov, "BASE: Using Abstraction to Improve Fault Tolerance," in Proceedings of the 18th Symposium on Operating Systems Principles (SOSP '01), Banff, Canada, October 2001.
[41] P. Pal, F. Webber, R. E. Schantz, and J. P. Loyall, "Intrusion Tolerant Systems," in Proceedings of the IEEE Information Survivability Workshop (ISW-2000), October 24-26, 2000, Boston, Massachusetts.

