Team 6: Slackers

Date post: 28-Jan-2016
Transcript
Page 1: Team 6: Slackers

Team 6: Slackers
18749: Fault Tolerant Distributed Systems

Team Members
Puneet Aggarwal
Karim Jamal
Steven Lawrance
Hyunwoo Kim
Tanmay Sinha

Page 2: Team 6: Slackers

Team Slackers - Park 'n Park

Team Members

URL: http://www.ece.cmu.edu/~ece749/teams-06/team6/

Page 3: Team 6: Slackers

Overview

• Baseline Application
• Baseline Architecture
• FT-Baseline Goals
• FT-Baseline Architecture
• Fail-Over Mechanisms
• Fail-Over Measurements
• Fault Tolerance Experimentation
• Bounded “Real-Time” Fail-Over Measurements
• FT-RT-Performance Strategy
• Other Features
• Conclusions

Page 4: Team 6: Slackers

Baseline Application

Page 5: Team 6: Slackers

Baseline Application

What is Park 'n Park?

• A system that manages the information and status of multiple parking lots.
• Keeps track of how many spaces are available in the lot and at each level.
• Recommends other available lots that are nearby if the current lot is full.
• Allows drivers to enter/exit lots and move up/down levels once in a parking lot.
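The bookkeeping described above can be sketched in a few lines of Java. This is only an in-memory illustration under stated assumptions: the class name `ParkingLotSketch`, its methods, and the distance map are hypothetical, and the real system keeps this state in the database rather than in objects.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory model of one parking lot; all names are illustrative.
final class ParkingLotSketch {
    private final int id;
    private final int[] freePerLevel;                 // free spaces on each level
    private final Map<Integer, Double> distanceToLot; // other lot id -> distance

    ParkingLotSketch(int id, int[] freePerLevel, Map<Integer, Double> distanceToLot) {
        this.id = id;
        this.freePerLevel = freePerLevel.clone();
        this.distanceToLot = Map.copyOf(distanceToLot);
    }

    int id() { return id; }

    // Free spaces in the whole lot.
    int freeSpaces() {
        int total = 0;
        for (int n : freePerLevel) total += n;
        return total;
    }

    // Free spaces on a single level.
    int freeSpacesOnLevel(int level) { return freePerLevel[level]; }

    boolean isFull() { return freeSpaces() == 0; }

    // Takes one space on the given level; returns false if that level is full.
    boolean park(int level) {
        if (freePerLevel[level] == 0) return false;
        freePerLevel[level]--;
        return true;
    }

    // Nearest other lot with a known distance that still has free space, if any.
    Optional<ParkingLotSketch> recommend(List<ParkingLotSketch> others) {
        return others.stream()
                .filter(o -> o.id != id && !o.isFull() && distanceToLot.containsKey(o.id))
                .min(Comparator.comparingDouble((ParkingLotSketch o) -> distanceToLot.get(o.id)));
    }
}
```

A caller would check `isFull()` when a driver tries to enter and fall back to `recommend(...)` to suggest a nearby lot.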

Page 6: Team 6: Slackers

Baseline Application

Why is it interesting?

• Easy to implement
• Easy to distribute over multiple systems
• Potential of having multiple clients
• Middle tier can be made stateless
• Hasn’t been done before in this class
• And most of all…who wants this?

Page 7: Team 6: Slackers

Baseline Application

Development Tools

• Java
  – Familiarity with language
  – Platform independence
• CORBA
  – Long story (to be discussed later…)
• MySQL
  – Familiarity with the package
  – Free!!!
  – Available on ECE cluster
• Linux, Windows, and OS X
  – No one has the same system nowadays
• Eclipse, Matlab, CVS, and PowerPoint
  – Powerful tools in their target markets

Page 8: Team 6: Slackers

Baseline Application

High-Level Components

• Client
  – Provides an interface to interact with the user
  – Creates an instance of Client Manager
• Server
  – Manages Client Manager Factory
  – Handles CORBA functions
• Client Manager
  – Part of middle tier
  – Manages various client functions
  – Unique for each client
• Client Manager Factory
  – Part of middle tier
  – Factory for Client Manager instances
• Database
  – Stores the state for each client
  – Stores the state of the parking lots (i.e. occupancy of lots and levels, distances to other parking lots)
• Naming Service
  – Allows client to obtain reference to a server

Page 9: Team 6: Slackers

Baseline Architecture

Page 10: Team 6: Slackers

Baseline Architecture

High-Level Components

[Diagram. Components shown: Client, Server, Client Manager Factory, Client Manager, Naming Service, Database, Middleware. Legend: processes and threads; data flow.]

Numbered steps in the diagram:
1. Create instance
2. Register name
3. Contact naming service
4. Create client manager instance
5. Create instance
6. Invoke service method
7. Request data

Page 11: Team 6: Slackers

FT-Baseline Goals

Page 12: Team 6: Slackers

FT-Baseline Goals

Main Goals

• Replicate the entire middle tier in order to make the system fault-tolerant. The middle tier includes
  – Client Manager
  – Client Manager Factory
  – Server
• The naming service, replication manager, and database are not replicated because of the added complexity and limited development time
• Maintain the stateless nature of the middle tier by storing all state in the database
• For the fault-tolerant baseline application
  – 3 replicas of the servers on clue, chess, and go
• Naming service (boggle), Replication Manager (boggle), and Database (previously on mahjongg, now on girltalk) on the sacred servers
  – Have not been replicated and are single points of failure

Page 13: Team 6: Slackers

FT-Baseline Goals

FT Framework

• Replication Manager
  – Responsible for checking liveness of servers
  – Performs fault detection and recovery of servers
  – Can handle an arbitrary number of server replicas
  – Can be restarted
• Fault Injector
  – kill -9
  – Script to periodically kill primary server
    • Added in the RT-FT-Baseline implementation

Page 14: Team 6: Slackers

FT-Baseline Architecture

Page 15: Team 6: Slackers

FT-Baseline Architecture

High-Level Components

[Diagram. Components shown: Client, Server, Client Manager Factory, Client Manager, Naming Service, Replication Manager, Database, Middleware. The Replication Manager is drawn with poke() and bind() / unbind() arrows. Legend: processes and threads; data flow.]

Numbered steps in the diagram:
1. Create instance
2. Register name
3. Notify of existence
4. Contact naming service
5. Create client manager instance
6. Create instance
7. Invoke service method
8. Request data

Page 16: Team 6: Slackers

Fail-Over Mechanism

Page 17: Team 6: Slackers

Fail-Over Mechanism

Fault Tolerant Client Manager

• Resides on the client side
• Invokes service methods on the Client Manager on behalf of the client
• Responsible for fail-over
  – Detects faults by catching exceptions
  – If an exception is thrown during a service call/invocation, it gets the primary server reference from the naming service and retries the failed operation using the new server reference
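The catch, re-resolve, and retry behaviour described above might look roughly like the following. This is a minimal sketch rather than the team's code: `FailoverInvoker`, the `primaryLookup` supplier (standing in for a naming-service lookup), and the `maxAttempts` bound are all assumptions. It catches `RuntimeException` because CORBA system exceptions such as `COMM_FAILURE` are unchecked in the Java mapping.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical client-side wrapper; S stands for a server reference type.
final class FailoverInvoker<S> {
    private final Supplier<S> primaryLookup; // stand-in for a naming-service lookup
    private final int maxAttempts;
    private S server;

    FailoverInvoker(Supplier<S> primaryLookup, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.primaryLookup = primaryLookup;
        this.maxAttempts = maxAttempts;
        this.server = primaryLookup.get();   // initial reference at start-up
    }

    // Runs the call; on a runtime failure, re-resolves the primary and retries.
    <R> R invoke(Function<S, R> call) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.apply(server);
            } catch (RuntimeException e) {
                last = e;
                server = primaryLookup.get(); // fetch the (possibly new) primary
            }
        }
        throw last; // every attempt failed; surface the last error
    }
}
```

Bounding the number of attempts is a design choice made for this sketch so that a client cannot loop forever when no primary is available.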

Page 18: Team 6: Slackers

Fail-Over Mechanism

Replication Manager

• Detects faults using a method called “poke”
• Maintains a dynamic list of active servers
• Restarts failed/corrupted servers
• Performs naming service maintenance
  – Unbinds names of crashed servers
  – Rebinds name of primary server
• Uses the most-recently-active methodology to choose a new primary server in case the primary server experiences a fault
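One possible reading of the active-server list and the most-recently-active rule is sketched below. The slides do not define the rule precisely, so this assumes it means "promote the surviving server that was seen alive most recently"; the class `ActiveServerList` and its methods are hypothetical, and the naming-service bind/unbind calls are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical bookkeeping for a replication manager; names are illustrative.
final class ActiveServerList {
    private final Map<String, Long> lastSeen = new LinkedHashMap<>(); // server -> last successful poke
    private String primary;

    // Records a successful poke; the first server seen becomes the primary.
    void markAlive(String name, long timestamp) {
        lastSeen.put(name, timestamp);
        if (primary == null) primary = name;
    }

    // Drops a failed server; if it was the primary, promotes the survivor
    // with the most recent successful poke (assumed meaning of the rule).
    void markFailed(String name) {
        lastSeen.remove(name);
        if (name.equals(primary)) {
            primary = lastSeen.entrySet().stream()
                    .max(Map.Entry.comparingByValue())
                    .map(Map.Entry::getKey)
                    .orElse(null);
        }
    }

    Optional<String> primary() { return Optional.ofNullable(primary); }
}
```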

Page 19: Team 6: Slackers

Fail-Over Mechanism

The Poke Method

• “Pokes” the server periodically
• Checks not only whether the server is alive, but also whether the server’s database connectivity is intact or corrupted
• Throws exceptions in case of faults (e.g. can’t connect to database)
• The replication manager handles faults accordingly
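A server-side poke along these lines could be sketched as follows. The real method is a CORBA operation whose signature is not shown in the slides, so `PokeSketch`, `PokeFailedException`, and the `databaseReachable` check are assumptions made for illustration. In a JDBC-backed server the check might wrap something like `Connection.isValid(timeout)`.

```java
import java.util.function.BooleanSupplier;

// Hypothetical server-side health check; names are illustrative, not the team's API.
final class PokeSketch {
    // Unchecked stand-in for the exception a failed poke would raise.
    static final class PokeFailedException extends RuntimeException {
        PokeFailedException(String message, Throwable cause) { super(message, cause); }
    }

    private final BooleanSupplier databaseReachable; // e.g. wraps a trivial query

    PokeSketch(BooleanSupplier databaseReachable) {
        this.databaseReachable = databaseReachable;
    }

    // Returns normally if the database link looks healthy; throws otherwise.
    // A crashed server never answers at all, which the caller sees as a
    // communication failure instead.
    void poke() {
        boolean ok;
        try {
            ok = databaseReachable.getAsBoolean();
        } catch (RuntimeException e) {
            throw new PokeFailedException("database check failed", e);
        }
        if (!ok) throw new PokeFailedException("cannot connect to database", null);
    }
}
```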

Page 20: Team 6: Slackers

Fail-Over Mechanism

Exceptions Handled

• COMM_FAILURE: CORBA exception
• OBJECT_NOT_EXIST: CORBA exception
• SystemException: CORBA exception
• Exception: Java exception
• AlreadyInLotException: Client is already in a lot
• AtBottomLevelException: Car cannot move to a lower level because it's on the bottom floor
• AtTopLevelException: Car cannot move to a higher level because it's on the top floor
• InvalidClientException: ID provided by Client doesn’t match the ID stored in the system
• LotFullException: System throws exception when the lot is full
• LotNotFoundException: Lot number not found in the database
• NotInLotException: Client's car is not in the lot
• NotOnExitLevelException: Client is not on an exit level in the lot
• ServiceUnavailableException: Exception that gets thrown when an unrecoverable database exception or some other error prevents the server from successfully completing a client-requested operation

Page 21: Team 6: Slackers

Fail-Over Mechanism

Response to Exceptions

• Get new server reference and then retry the failed operation when any of the following exceptions occurs
  – COMM_FAILURE
  – OBJECT_NOT_EXIST
  – ServiceUnavailableException
• Report error to user and prompt for next command when any of the following exceptions occurs
  – AlreadyInLotException
  – AtBottomLevelException
  – AtTopLevelException
  – LotFullException
  – LotNotFoundException
  – NotInLotException
  – NotOnExitLevelException
• Client terminates when any of the following exceptions occurs
  – InvalidClientException
  – SystemException
  – Exception
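The three responses above amount to a small lookup table. The sketch below keys it by exception name purely for illustration; `ExceptionPolicy` and its `Action` enum are hypothetical, and a real client would more likely express the same policy with ordered `catch` blocks.

```java
import java.util.Set;

// Hypothetical policy table mirroring the three groups listed above.
final class ExceptionPolicy {
    enum Action { RETRY_ON_NEW_SERVER, REPORT_TO_USER, TERMINATE }

    private static final Set<String> RETRY = Set.of(
            "COMM_FAILURE", "OBJECT_NOT_EXIST", "ServiceUnavailableException");

    private static final Set<String> REPORT = Set.of(
            "AlreadyInLotException", "AtBottomLevelException", "AtTopLevelException",
            "LotFullException", "LotNotFoundException", "NotInLotException",
            "NotOnExitLevelException");

    private ExceptionPolicy() {}

    // Anything not listed (InvalidClientException, SystemException, or a
    // generic Exception) falls through to TERMINATE, matching the last group.
    static Action actionFor(String exceptionName) {
        if (RETRY.contains(exceptionName)) return Action.RETRY_ON_NEW_SERVER;
        if (REPORT.contains(exceptionName)) return Action.REPORT_TO_USER;
        return Action.TERMINATE;
    }
}
```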

Page 22: Team 6: Slackers

Fail-Over Mechanism

Server References

• The client obtains the reference to the primary server when
  – it is initially started
  – it notices that the server has crashed or been corrupted (e.g. COMM_FAILURE, ServiceUnavailableException)
• When the client notices that there is no primary server reference in the naming service, it displays an appropriate message and then terminates

Page 23: Team 6: Slackers

RT-FT-Baseline Architecture

Page 24: Team 6: Slackers

RT-FT-Baseline Architecture

High-Level Components

[Diagram. Components shown: Client, Server, Client Manager Factory, Client Manager, Naming Service, Replication Manager, Testing Manager, Database, Middleware. The Replication Manager is drawn with poke() and bind()/unbind() arrows. Legend: processes and threads; data flow; launches.]

Numbered steps in the diagram:
1. Create instance
2. Register name
3. Notify of existence
4. Contact naming service
5. Create client manager instance
6. Create instance
7. Invoke service method
8. Request data

Page 25: Team 6: Slackers

Fault Tolerance Experimentation

Page 26: Team 6: Slackers

Fault Tolerance Experimentation

The Fault Free Run - Graph 1

While the mean latency stayed almost constant, the maximum latency varied

Page 27: Team 6: Slackers

Fault Tolerance Experimentation

The Fault Free Run - Graph 2

This demonstrates the conformance with the magical 1% theory

Page 28: Team 6: Slackers

Fault Tolerance Experimentation

The Fault Free Run - Graph 3

Mean latency increases as the reply size increases

Page 29: Team 6: Slackers

Fault Tolerance Experimentation

The Fault Free Run - Conclusions

• Our data conforms to the magical 1% theory, indicating that outliers account for less than 1% of the data points
• We hope that this helps with Tudor’s research

Page 30: Team 6: Slackers

Bounded “Real-Time” Fail-Over Measurements

Page 31: Team 6: Slackers

Bounded “Real-Time” Fail-Over Measurements

The Fault Induced Run - Graph

High latency is observed during faults

Page 32: Team 6: Slackers

Bounded “Real-Time” Fail-Over Measurements

The Fault Induced Run - Pie Chart

Client’s fault recovery timeout causes most of the latency

Page 33: Team 6: Slackers

Bounded “Real-Time” Fail-Over Measurements

The Fault Induced Run - Conclusions

• We noticed that there is an observable latency when a fault occurs
• Most of the latency was caused by the client’s fault recovery timeout
• The second-highest contributor was the time that the client has to wait for the client manager to be restored on the new server

Page 34: Team 6: Slackers

FT-RT-Performance Strategy

Page 35: Team 6: Slackers

FT-RT-Performance Strategy

Reducing Fail-Over Time

• Implemented strategies
  – Adjust client fault recovery timeout
  – Use IOGRs and cloning-like strategies
  – Pre-create TCP/IP connections to all servers
• Other strategies that could potentially be implemented
  – Database connection pool
  – Load balancing
  – Remove client ID consistency check

Page 36: Team 6: Slackers

Measurements after Strategies

Adjusting Waiting Time

• The following graphs are for different values of the wait time at the client end
• This is the time that the client waits after detecting a fault in order to give the replication manager sufficient time to update the naming service with the new primary
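The wait described above can be pictured as a pause inserted before the naming-service lookup. The sketch below is an assumption about where that pause sits; `DelayedLookup`, `lookupAfterFault`, and the `namingLookup` supplier are hypothetical names, and the wait value is simply a constructor parameter so that different settings can be compared.

```java
import java.util.function.Supplier;

// Hypothetical helper: wait a fixed time after a fault, then ask for the primary.
final class DelayedLookup<S> {
    private final Supplier<S> namingLookup; // stand-in for a naming-service lookup
    private final long waitMillis;          // client-side wait time being tuned

    DelayedLookup(Supplier<S> namingLookup, long waitMillis) {
        if (waitMillis < 0) throw new IllegalArgumentException("waitMillis must be >= 0");
        this.namingLookup = namingLookup;
        this.waitMillis = waitMillis;
    }

    // Gives the replication manager time to rebind the primary, then looks it up.
    S lookupAfterFault() {
        try {
            Thread.sleep(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag and stop waiting
        }
        return namingLookup.get();
    }
}
```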

Page 37: Team 6: Slackers

Measurements after Strategies

Plot for 0 ms waiting time

Page 38: Team 6: Slackers

Measurements after Strategies

Plot for 500 ms waiting time

Page 39: Team 6: Slackers

Measurements after Strategies

Plot for 1000 ms waiting time

Page 40: Team 6: Slackers

Measurements after Strategies

Plot for 2000 ms waiting time

Page 41: Team 6: Slackers

Measurements after Strategies

Plot for 2500 ms waiting time

Page 42: Team 6: Slackers

Measurements after Strategies

Plot for 3000 ms waiting time

Page 43: Team 6: Slackers

Measurements after Strategies

Plot for 3500 ms waiting time

Page 44: Team 6: Slackers

Measurements after Strategies

Plot for 4000 ms waiting time

Page 45: Team 6: Slackers

Measurements after Strategies

Plot for 4500 ms waiting time

Page 46: Team 6: Slackers

Measurements after Strategies

Observations after Adjusting Wait Times

• The best results can be seen with a 4000 ms wait time.
• Even though lower values reduce the fail-over time considerably, they also show a significant amount of jitter.
• The reason for the jitter is that, with a shorter wait, the client may not yet get the updated primary from the naming service.
• Since our primary concern is bounded fail-over, we chose the strategy that has the least jitter, rather than the strategy that has the lowest latencies.
• The average recovery time is reduced modestly (from about 5-6 s to 4.5-5 s with a 4000 ms wait time).

Page 47: Team 6: Slackers

Measurements after Strategies

Implementing IOGR

• IOGR stands for Interoperable Object Group Reference
• With this strategy, the client gets the list of all active servers from the naming service
• The client refreshes this list if all the servers in the list have failed
• The following graphs were produced after this strategy was implemented
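The list-based behaviour above can be approximated as shown here. This is a sketch of the idea only, not a real IOGR and not the team's code: `ServerGroupClient` and the `listActiveServers` supplier (standing in for the naming-service query) are assumptions, and failures are modelled as unchecked exceptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical client that caches every active server instead of one primary.
final class ServerGroupClient<S> {
    private final Supplier<List<S>> listActiveServers; // stand-in for a naming-service query
    private final Deque<S> candidates = new ArrayDeque<>();

    ServerGroupClient(Supplier<List<S>> listActiveServers) {
        this.listActiveServers = listActiveServers;
        candidates.addAll(listActiveServers.get());
    }

    // Tries cached servers in order; goes back to the naming service only
    // when every cached server has failed, and gives up after one refresh.
    <R> R invoke(Function<S, R> call) {
        boolean refreshed = false;
        while (true) {
            if (candidates.isEmpty()) {
                if (refreshed) throw new IllegalStateException("no active servers");
                candidates.addAll(listActiveServers.get());
                refreshed = true;
                continue;
            }
            try {
                return call.apply(candidates.peekFirst());
            } catch (RuntimeException e) {
                candidates.pollFirst(); // drop the failed server, try the next cached one
            }
        }
    }
}
```

Because the next server is already known locally, a fail-over in the common case involves no naming-service round trip.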

Page 48: Team 6: Slackers

Measurements after Strategies

Plot after IOGR strategy (same axis)

Page 49: Team 6: Slackers

Measurements after Strategies

Plot after IOGR strategy (different axis)

Page 50: Team 6: Slackers

Measurements after Strategies

Pie Chart after IOGR strategy

Page 51: Team 6: Slackers

Measurements after Strategies

Observations after IOGR Strategy

• The recovery time is significantly reduced, from 5-6 seconds to less than half a second
• The time to get the new primary from the naming service is eliminated
• Most of the time is spent obtaining a Client Manager object
• The graph plotted on the different axis shows some jitter because, when all the servers in the client’s list are dead, the client has to go back to the naming service

Page 52: Team 6: Slackers

Measurements after Strategies

Implementing Open TCP/IP Connections

• This strategy was implemented because, after implementing the IOGR strategy, most of the time was spent establishing a connection with the next server and getting the client manager
• With this strategy, the client maintains open TCP/IP connections with all the servers
• So the time to create a connection is saved
• The following graphs were produced after the open TCP/IP connections strategy was implemented
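Keeping connections open ahead of time can be sketched as a small cache that is filled at start-up. How the team actually forced the ORB to open its connections is not stated in the slides, so `ConnectionCache` and the `connect` function are assumptions; `connect` stands for whatever step establishes the connection to one server.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical cache of connections opened before any fault occurs.
final class ConnectionCache<C> {
    private final Map<String, C> open = new LinkedHashMap<>();

    // Connects to every known server up front so fail-over can skip this step.
    ConnectionCache(List<String> servers, Function<String, C> connect) {
        for (String server : servers) {
            try {
                open.put(server, connect.apply(server));
            } catch (RuntimeException e) {
                // Server unreachable right now: skip it rather than fail start-up.
            }
        }
    }

    // Already-open connection for the server, if one was established.
    Optional<C> get(String server) {
        return Optional.ofNullable(open.get(server));
    }

    int size() { return open.size(); }
}
```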

Page 53: Team 6: Slackers

Measurements after Strategies

Plot after maintaining open connections (same axis, 1 client)

Page 54: Team 6: Slackers

Measurements after Strategies

Plot after maintaining open connections (different axis, 1 client)

Page 55: Team 6: Slackers

Measurements after Strategies

Pie Chart after maintaining open connections (1 client)

Page 56: Team 6: Slackers

Measurements after Strategies

Plot after maintaining open connections (same axis, 10 clients)

Page 57: Team 6: Slackers

Measurements after Strategies

Plot after maintaining open connections (different axis, 10 clients)

Page 58: Team 6: Slackers

Measurements after Strategies

Pie Chart after maintaining open connections (10 clients)

Page 59: Team 6: Slackers

Measurements after Strategies

Observations after implementation of open connections for 1 client

• The recovery time is reduced compared to the cloning strategy
• Most of the time is still spent obtaining a Client Manager object
• There is noticeable jitter when observed on the different axis

Page 60: Team 6: Slackers

Measurements after Strategies

Observations after implementation of open connections for 10 clients

• A significant reduction in fail-over time is observed
• Most of the time is still spent obtaining a Client Manager object
• It can also be observed that a significant amount of time is spent waiting to acquire a lock on the thread

Page 61: Team 6: Slackers

Other Features

Page 62: Team 6: Slackers

Other Features

The Long Story - EJB 3.0

• It’s actually not that long of a story…we tried to use EJB 3.0 and failed miserably. The End.
• The main issues with using EJB 3.0 are
  – It is a new technology, so documentation on it is very sparse
  – It is still evolving and changing, which can cause problems (e.g. JBoss 4.0.3 vs 4.0.4)
  – The development and deployment is significantly different from EJB 2.1, which introduces a new learning curve
  – It is not something that can be learned in one weekend…

Page 63: Team 6: Slackers

Other Features

Bells and Whistles

• Replication manager can be restarted
• Replication manager can handle an arbitrary number of servers
• Any server can be dynamically added and removed because nothing is hard-coded
• Cars can magically teleport in and out of parking lots (for testing robustness)
• Clients can manually corrupt the server’s database connection (for testing robustness)
• The client uses the Java Reflection API to consolidate fault detection and recovery code
• The client prevents Sun’s CORBA implementation from spewing exception stack traces to the user
• Highly-modularized dependency structure in the code (as proved by Lattix LDM)
• Other stuff that we can’t remember …
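The slides do not say which part of the Reflection API the client used. One way the "consolidate fault detection and recovery code" idea could look is a `java.lang.reflect.Proxy` that wraps every remote call in a single handler. The sketch below is only an illustration under assumptions: `ParkingService`, `FailoverHandler`, and `withFailover` are hypothetical names, plain Java objects stand in for CORBA stubs, and unchecked exceptions stand in for CORBA system exceptions.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.List;

final class FailoverProxyDemo {

    /** Hypothetical service interface; the project's real interfaces are not in the slides. */
    public interface ParkingService {
        String enterLot(int lotId);
    }

    /** Single place where faults are detected and the client moves to the next replica. */
    static final class FailoverHandler implements InvocationHandler {
        private final List<?> replicas;
        private int current = 0; // index of the replica in use (not thread-safe in this sketch)

        FailoverHandler(List<?> replicas) {
            this.replicas = replicas;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            RuntimeException last = new IllegalStateException("no replicas configured");
            for (int attempt = 0; attempt < replicas.size(); attempt++) {
                try {
                    return method.invoke(replicas.get(current), args);
                } catch (InvocationTargetException e) {
                    Throwable cause = e.getCause();
                    if (!(cause instanceof RuntimeException)) {
                        throw cause; // declared application exception: not a replica fault
                    }
                    // Assumption: unchecked exceptions stand in for CORBA system exceptions.
                    last = (RuntimeException) cause;
                    current = (current + 1) % replicas.size();
                }
            }
            throw last; // every replica failed
        }
    }

    /** Wraps the replicas in a dynamic proxy that implements the given interface. */
    static <T> T withFailover(Class<T> iface, List<? extends T> replicas) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, new FailoverHandler(replicas));
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        ParkingService dead = lotId -> { throw new IllegalStateException("replica down"); };
        ParkingService live = lotId -> "entered lot " + lotId;
        ParkingService service = withFailover(ParkingService.class, List.of(dead, live));
        System.out.println(service.enterLot(7)); // served by the second replica
    }
}
```

With this shape, application code calls `service.enterLot(...)` as if it were talking to one server, and the retry logic lives in one handler instead of being repeated around every call site.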

Page 64: Team 6: Slackers

Other Features

Lessons Learned

• It’s difficult to implement real-time, fault tolerance, and high performance, especially if they are not factored into the architecture from the start
• Choose an application that will permit you to easily apply the concepts learned in the class
• Don’t waste time with bells and whistles until you have time to do so
• Run your measurements before other teams hog and crash the server
• Set up your own database server
• Kill the server such that logs are flushed before the server dies
• Catch and handle as many exceptions as possible
• It’s a good thing that we did not use JBoss!
• Use the girltalk server because no one else is going to use that one …
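The "kill the server such that logs are flushed" lesson can be approached in Java with a JVM shutdown hook. The sketch below is a guess at the general technique, not the team's code: `MeasurementLog` and `flushOnShutdown` are hypothetical names. Shutdown hooks run on a normal exit or on SIGTERM/SIGINT, but not when the process is killed with SIGKILL (`kill -9`), so the fault injector would need to send a catchable signal.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class FlushOnKillDemo {

    /** Hypothetical buffered measurement log; the team's real logging code is not in the slides. */
    static final class MeasurementLog implements AutoCloseable {
        private final BufferedWriter out;

        MeasurementLog(Path file) throws IOException {
            this.out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
        }

        synchronized void record(String line) throws IOException {
            out.write(line);
            out.newLine(); // stays in the in-memory buffer until flushed or closed
        }

        @Override
        public synchronized void close() throws IOException {
            out.close(); // close() flushes the buffer before releasing the file
        }
    }

    /** Registers a shutdown hook that closes (and therefore flushes) the log. */
    static Thread flushOnShutdown(MeasurementLog log) {
        Thread hook = new Thread(() -> {
            try {
                log.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, "flush-measurement-log");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("failover-times", ".log");
        MeasurementLog log = new MeasurementLog(file);
        flushOnShutdown(log);
        log.record("probe sent");
        System.out.println("log will be flushed to " + file + " when the JVM exits");
    }
}
```

Registering the hook once at server start-up means every exit path that the JVM can observe writes out the buffered measurements, which avoids losing the last samples before a deliberate fault injection.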

Page 65: Team 6: Slackers

Other Features

… Painful Lessons Learned

• Most painful lessons learned:
  1. The EJB concept takes time to learn and use
  2. EJB 3.0 introduces another learning curve
  3. JBoss provides many, many configuration options, which makes deploying an application a challenging task
  4 – 10. Don’t try to learn the concepts of EJB…
    …and EJB 3.0…
    …and JBoss…
    …all at the same time…
    …in one weekend…
    …especially when the project is due the following Monday…
    …!!!!!!!!!!!!!!!!!!!!

Page 66: Team 6: Slackers

Conclusions

Page 67: Team 6: Slackers

Conclusions

If we had the Time Turner!!

• We would start right away with CORBA (100+ hours were a little too much)
• And, as we just found out… we would have counted the number of invocations in the experiments before submitting

Page 68: Team 6: Slackers

Conclusions

…the final word.

Yeayyyyyyyyyyyyyyyyyyyyyyyyyy!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Yeayyyyyyyyyyyyyyyyyyyyyyyyyy!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Page 69: Team 6: Slackers

Thank You.

