ISIS 2 RUNTIME PARAMETERS Ken Birman 1 Cornell University.

Date post: 01-Jan-2016
ISIS2 RUNTIME PARAMETERS

Ken Birman

Cornell University

Parameters

Many features of Isis2 depend on parameters you can modify to “shape” the behavior of the platform. They give you very fine control over the behavior of Isis2.

There are three main categories of parameters:

1. Those that determine how the system will start up
2. Those that determine how it sends messages
3. Those that control limits, timeouts and other bounds

Startup Parameters

What happens when you call IsisSystem.Start()?

How IsisSystem.Start() works

1. The library initializes itself and determines the IP address of “local host.” If the host has several IP addresses, it picks the last of the IPv4 addresses.
2. The system scans the “environment” variables to read values of the parameters. These override the default values compiled into Isis2.
   1. In Linux/bash, use “export” to set them, either in .bashrc or in a shell script. Or call setenv(3).
   2. In Windows, use the “set” command, or call Environment.SetEnvironmentVariable("something", somevalue);
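
The same thing can be done from inside the program on either platform, as long as it happens before the library starts. The sketch below assumes that Isis2 reads its parameters from the process environment when IsisSystem.Start() runs. The helper name SetIfUnset is hypothetical, and ISIS_MUTE=true (described under Logging) is the only value taken from these slides.

```csharp
using System;

static class StartupConfig
{
    // Hypothetical helper: a value already exported by the launching shell
    // wins over the fallback chosen in code.
    public static void SetIfUnset(string name, string value)
    {
        if (Environment.GetEnvironmentVariable(name) == null)
            Environment.SetEnvironmentVariable(name, value);
    }

    public static void Apply()
    {
        // Environment values are always strings, even for flags and numbers.
        SetIfUnset("ISIS_MUTE", "true");

        // IsisSystem.Start() would be called here, once every parameter is in
        // place (commented out so the sketch compiles without the library).
        // IsisSystem.Start();
    }
}
```

Calling StartupConfig.Apply() as the first statement of Main keeps all parameter handling in one place, while still letting an operator override any value from the shell.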

How IsisSystem.Start() works

3. Next, the system decides which network interfaces it should use (all of them, unless you tell it otherwise by setting ISIS_NETWORK_INTERFACES).
   1. Do this if you expect to run on machines that have a “production” network and a “management” network
   2. Otherwise leave ISIS_NETWORK_INTERFACES alone
4. Having done this, it attempts to contact the ORACLE.
   1. If the ORACLE isn’t found, it restarts the ORACLE
   2. Otherwise, it asks the ORACLE to let it join the ISISMEMBERS system group
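
Before deciding whether to set ISIS_NETWORK_INTERFACES, it can help to see which addresses a host actually has. The sketch below uses only the standard .NET Dns class. The class and method names are hypothetical, and the order in which the operating system reports addresses may not match the order Isis2 sees, so treat the "last IPv4" result as a hint rather than a guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Sockets;

static class LocalAddresses
{
    // Returns the last IPv4 address in the sequence, or null if there is none.
    public static IPAddress LastIPv4(IEnumerable<IPAddress> addresses)
    {
        return addresses.LastOrDefault(
            a => a.AddressFamily == AddressFamily.InterNetwork);
    }

    // Prints every address of this host, then the last IPv4 one.
    public static void PrintAll()
    {
        IPAddress[] all = Dns.GetHostAddresses(Dns.GetHostName());
        foreach (IPAddress a in all)
            Console.WriteLine(a);
        Console.WriteLine("Last IPv4: " + LastIPv4(all));
    }
}
```

If the printed list mixes addresses from a production network and a management network, that is the situation in which setting ISIS_NETWORK_INTERFACES is worthwhile.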

Logging

Normally, upon restart, Isis2 creates a log file for messages printed by the library. You can inhibit this by setting ISIS_MUTE=true. You can also direct that messages be echoed to the Debug stream rather than the Console when calling IsisSystem.Start().

If you allow logging and want to write to the log, call IsisSystem.Write() or IsisSystem.WriteLine(). Output goes to the log plus the Console or the Debug stream.

Fast start: But there can only be one…

For extreme speed, you can tell Isis2 not to hunt for the ORACLE (by specifying an argument to IsisSystem.Start). It will restart instantly. But if you launch two instances this way, they won’t communicate with one another.

So… do this only in the first instance that you launch.

Overwhelming the Membership Oracle

If processes start one by one, there is no issue.

But what if you try to start 50 at once, or 500?

[Figure: processes contacting the Oracle (“Hello?”) and being admitted (“Welcome!”)]

Master/Worker

If a system will be big, launching hundreds of members can overload the ORACLE.

Better performance: add many members all at the same time. In this case use the Master/Worker pattern:

1. The Master starts first and collects a list of the workers
2. Workers start after the master and register with it
3. Then the Master can add a batch of workers to the system, and to any groups that are desired

Master: Accumulates workers, tells them what to do

    // Needs: using System.Collections.Generic; using System.Threading;
    // GOAL (the number of workers wanted) and myGroup are assumed to be
    // defined elsewhere in the application.
    static void beMaster(string[] args)
    {
        IsisSystem.Start();
        Semaphore waitForWorkers = new Semaphore(0, 1);
        bool fullyStaffed = false;
        List<Address> myWorkers = new List<Address>();

        // Accumulate workers: this callback runs for each worker that registers
        IsisSystem.RegisterAsMaster((NewWorker)delegate(Address worker)
        {
            lock (myWorkers)
            {
                if (fullyStaffed)
                    IsisSystem.RejectWorker(worker);
                else
                {
                    myWorkers.Add(worker);
                    if (myWorkers.Count == GOAL)
                    {
                        fullyStaffed = true;
                        waitForWorkers.Release(1);
                    }
                }
            }
        });

        // Main thread waits until enough workers have connected,
        // then starts them all at once
        waitForWorkers.WaitOne();
        IsisSystem.BatchStart(myWorkers);

        // This delays until they have all finished their batch start
        IsisSystem.WaitForWorkerSetup(myWorkers);

        // Then adds them all to groups we may want to use
        Group.MultiJoin(myWorkers, new Group[] { myGroup });

        // In front of this next line do whatever you want this application to do
        IsisSystem.WaitForever();

        // If the master shuts down, its workers will too
        IsisSystem.Shutdown();
    }

RunAsWorker: Let Master run the show

    // Needs: using System.Threading;
    // myGroups is assumed to hold the Group handles this worker created during setup.
    static void beWorker(string[] args)
    {
        // This next line assumes that argument 0 is the master's Address.
        // You can also use new Address(mastersHost, 0) if you know the host IP
        // address of the master but don't know the master's pid.
        IsisSystem.RunAsWorker(args[0]);

        // This line blocks until the master issues the BatchStart() call.
        // Notice that in this one special case we call it AFTER RunAsWorker!
        IsisSystem.Start();

        // Before calling this next line do whatever setup this worker must do:
        // create your group handles and register callbacks, but don't call Join.
        // For example, you might call g = new Group("something"), then call
        // g.ViewHandlers += myViewHandler; ... etc: anything needed to have the
        // group ready for a Join. But you call WorkerSetupDone() INSTEAD of g.Join().
        IsisSystem.WorkerSetupDone();

        // Now, for each group the Master created using a multijoin, you wait
        // for its first view to be reported. This is one way to do that:
        foreach (Group g in myGroups)
            while (!g.HasFirstView)
                Thread.Sleep(250);

        // WaitForever would freeze the main thread, but if the worker has joined
        // groups (or gets added to groups by the master using MultiJoin()), the
        // worker could be quite active, receiving messages, sending them, etc.
        IsisSystem.WaitForever();

        // If the master shuts down the worker will throw an
        // IsisException("master termination");
        // If this next line actually executes, this particular worker will exit.
        // In effect, this worker is a normal Isis application by now, except that
        // if the master terminates, it does too. In particular, it can
        // deliberately choose to leave the system if it wishes to do so.
        IsisSystem.Shutdown();
    }

Master/Worker Timeline

[Figure: timeline with three lanes: Worker, Master, Oracle]

Worker:

1. IsisSystem.RunAsWorker(mAddress); IsisSystem.Start(); (blocks until the Master calls BatchStart)
2. Group g = new Group("myGroup"); . . . Attach handlers for g, but don’t call Join
3. IsisSystem.WorkerSetupDone();
4. foreach (Group g in myGroups) while (!g.HasFirstView) Thread.Sleep(250); (returns once the new view arrives)
5. IsisSystem.WaitForever();

Master:

1. IsisSystem.Start(); . . . Accumulate workers
2. Reached goal: IsisSystem.BatchStart(myWorkers);
3. IsisSystem.WaitForWorkerSetup(myWorkers); (returns when setup is done for all workers)
4. Group.MultiJoin(myWorkers, new Group[] { myGroup }); (a new view is reported)
5. IsisSystem.WaitForever();

Before the MultiJoin, the Master also creates its own handle: Group myGroup = new Group("myGroup"); . . . Attach handlers for myGroup, then myGroup.Join();

Why does this help?

Workers only send one message to the Master. Hence it experiences less load.

It adds them all at once, first to the system, then to whatever groups the application will use. Hence only one group view needs to be sent, and it can be sent efficiently, using a broadcast.

Overall load is much reduced.
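
A rough count shows how large the difference is. The sketch below assumes a deliberately simple cost model that is not taken from the slides: every membership change produces one new view, and that view is delivered once to each member of the new membership. Acknowledgements, retransmissions and the Oracle's own traffic are ignored, and the class name is hypothetical.

```csharp
static class ViewCost
{
    // Members join one at a time: the k-th join delivers a view to k members,
    // so the total is 1 + 2 + ... + n.
    public static long OneByOne(int n)
    {
        long total = 0;
        for (int members = 1; members <= n; members++)
            total += members;
        return total;
    }

    // All members are added in one batch: a single view reaches n members.
    public static long Batch(int n)
    {
        return n;
    }
}
```

Under this model, ViewCost.OneByOne(500) is 125250 view deliveries while ViewCost.Batch(500) is 500, and the batch case can additionally be served by one broadcast.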

Messaging Parameters

How to control what internet protocols Isis2 uses

IP multicast / ISIS_UNICAST_ONLY

Isis2 will broadcast to find the ORACLE unless you tell it not to do so.

Default: OK to use IP multicast, UDP, broadcast.

ISIS_UNICAST_ONLY: don’t use IP multicast. Still requires UDP (the older ISIS_TCP_ONLY feature was eliminated starting in Isis v2.1).

You must list the machines on which the Isis2 ORACLE will run if you put the system in ISIS_UNICAST_ONLY mode: ISIS_HOSTS=“…”

Normal versus UNICAST_ONLY

With normal IP multicast, packets are still sent directly.

With ISIS_UNICAST_ONLY, packets travel on a tree of point-to-point links and must be forwarded, perhaps log2(N) times.

[Figure: IP multicast versus a unicast tree with power-of-2 “reach”]
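
To get a feel for the log2(N) figure, the sketch below counts forwarding steps under one simple assumption: in each step every member that already holds the message passes it to one more member, so the number of informed members doubles. This is an illustration of the "power of 2 reach" idea, not a description of the tree Isis2 actually builds, and the class name is hypothetical.

```csharp
static class UnicastTree
{
    // Number of doubling steps needed before n members hold the message,
    // that is, the smallest r with 2^r >= n.
    public static int Rounds(int n)
    {
        int rounds = 0;
        for (long reached = 1; reached < n; reached *= 2)
            rounds++;
        return rounds;
    }
}
```

For example, UnicastTree.Rounds(8) is 3 and UnicastTree.Rounds(500) is 9, so even a large group adds only a handful of forwarding hops compared with a direct IP multicast.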

ISIS_HOSTS

The idea is to list the places where the ORACLE can run:

ISIS_HOSTS=c1.cs.cornell.edu,c2.cs.cornell.edu …

or

ISIS_HOSTS=192.167.54.133,192.167.54.134

Processes running on other machines can join the system but can’t restart it from scratch.

ISIS_HOSTS: numerical is best!

We have seen bugs in the Linux DNS when accessed from Mono. Sometimes it hangs.

To avoid this, use fully numerical IP addresses when you set the values in ISIS_HOSTS. Use the IPv4 addresses for the machines on which you want the ORACLE to run. In this case DNS never hangs.

The “ping” and “traceroute” commands are examples of ways you can look these up.

On Windows, string names are fine. On Linux, they work, but don’t put the DNS under heavy load.
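
If the host names are only known symbolically, they can be resolved once, ahead of time, and the numeric result stored in ISIS_HOSTS. The sketch below is one way to do that with standard .NET calls. The class name and the resolver parameter are hypothetical. The resolver is passed in so that the lookup (for example Dns.GetHostAddresses) runs in a separate step, outside the application that will later start Isis2.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Sockets;

static class NumericHosts
{
    // Turns a list of host names and/or numeric addresses into the
    // comma-separated numeric form used for ISIS_HOSTS.
    public static string Build(IEnumerable<string> hosts,
                               Func<string, IPAddress[]> resolve)
    {
        var numeric = new List<string>();
        foreach (string host in hosts)
        {
            IPAddress ip;
            // Entries that are already numeric are kept without any lookup.
            if (!IPAddress.TryParse(host, out ip))
            {
                // First() throws if the name has no IPv4 address at all.
                ip = resolve(host).First(
                    a => a.AddressFamily == AddressFamily.InterNetwork);
            }
            numeric.Add(ip.ToString());
        }
        return string.Join(",", numeric);
    }
}
```

A call such as NumericHosts.Build(new[] { "c1.cs.cornell.edu", "c2.cs.cornell.edu" }, Dns.GetHostAddresses) would produce a value that can be exported as ISIS_HOSTS on every machine.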

ISIS_PORTp

The system uses two standard IP ports:

ISIS_PORTp: for p2p messages
ISIS_PORTa: set to ISIS_PORTp+1, for acks/nacks

These ports should not be blocked by your firewall. On Linux, also check iptables, which is like a firewall.

If two instances of Isis2 use non-overlapping port ranges, they will not notice one another.
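
Because each instance occupies two consecutive ports, two instances stay isolated only if their ISIS_PORTp values differ by at least 2. The sketch below checks that condition. The class name is hypothetical and any port numbers passed to it are the caller's own choices, not Isis2 defaults.

```csharp
using System;

static class IsisPorts
{
    // ISIS_PORTa is ISIS_PORTp + 1, so an instance uses ports p and p + 1.
    public static int AckPort(int portP)
    {
        return portP + 1;
    }

    // True when two instances with these ISIS_PORTp values share a port,
    // that is, when the ranges [p1, p1 + 1] and [p2, p2 + 1] intersect.
    public static bool Overlap(int portP1, int portP2)
    {
        return Math.Abs(portP1 - portP2) <= 1;
    }
}
```

For instance, IsisPorts.Overlap(20000, 20001) is true because port 20001 would be used by both instances, while IsisPorts.Overlap(20000, 20002) is false.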

ISIS_MAXIPMCADDRS

When permitted to use IP multicast, Isis2 tries not to overuse that feature:

ISIS_MCRANGE_LOW: low end of the IPMC address range Isis2 should use. By default, CLASSD+5000, where CLASSD is 224.0.0.0/4
ISIS_MCRANGE_HIGH: high end of the IPMC range
ISIS_MAXIPMCADDRS: limit on how many multicast addresses Isis2 can use, system-wide. It is perfectly reasonable to set this to a small number, like 5 or 10. The system should work if ISIS_MAXIPMCADDRS ≥ 2.

If ISIS_UNICAST_ONLY is true, then no IPMC addresses are used at all.
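
The expression CLASSD+5000 is ordinary arithmetic on the 32-bit form of the address. The sketch below works it out, taking the class D (multicast) range to start at 224.0.0.0. The class and method names are hypothetical.

```csharp
static class McRange
{
    // 224.0.0.0, the first class D address, as a 32-bit number.
    const uint ClassD = 224u << 24;

    // Dotted-quad form of the address ClassD + offset.
    public static string FromOffset(uint offset)
    {
        uint a = ClassD + offset;
        return (a >> 24) + "." + ((a >> 16) & 255) + "."
             + ((a >> 8) & 255) + "." + (a & 255);
    }
}
```

Since 5000 = 19 × 256 + 136, McRange.FromOffset(5000) gives 224.0.19.136, which is the default low end of the range described above.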

ISIS_TTL

Broadcast and multicast messages are automatically relayed by routers.

Each “hop” causes the “time to live” field in the message to be decremented. If the TTL reaches zero, the router drops the packet.

Isis2 initializes the TTL value using ISIS_TTL. You can set this to 0 or 1 to confine the system to a single segment of your network.
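
The effect of a small TTL can be checked with a few lines of simulation. The sketch below follows the rule just stated: each router decrements the field and drops the packet if the result is zero. It models only that rule, not any real router, and the names are hypothetical.

```csharp
static class TtlDemo
{
    // Returns how many routers forward the packet before it is dropped
    // or the end of the path is reached.
    public static int RoutersCrossed(int ttl, int routersOnPath)
    {
        int crossed = 0;
        while (crossed < routersOnPath)
        {
            ttl--;                  // each router decrements the field
            if (ttl <= 0) break;    // reached zero: this router drops the packet
            crossed++;
        }
        return crossed;
    }
}
```

With a TTL of 0 or 1 the first router already drops the packet, so TtlDemo.RoutersCrossed(1, 3) is 0 and the traffic stays on the local segment. A TTL of 2 lets the packet through exactly one router.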

ISIS_MAXMSGLEN

Automatically adjusted, but you can provide a recommended value if you wish. Isis2 will override the value in some situations. Normally not something you would need to modify.

If a message is too large, Isis2 will automatically fragment it and reassemble it prior to delivery.
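
The general idea of fragmentation and reassembly can be sketched as follows. This is a generic illustration, not Isis2's wire format: real fragments also carry headers such as sequence numbers so that they can be reordered and retransmitted, which the sketch leaves out. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class Fragmenter
{
    // Cuts a message into pieces of at most maxLen bytes, in order.
    public static List<byte[]> Split(byte[] message, int maxLen)
    {
        if (maxLen <= 0) throw new ArgumentOutOfRangeException("maxLen");
        var pieces = new List<byte[]>();
        for (int offset = 0; offset < message.Length; offset += maxLen)
        {
            int len = Math.Min(maxLen, message.Length - offset);
            byte[] piece = new byte[len];
            Array.Copy(message, offset, piece, 0, len);
            pieces.Add(piece);
        }
        return pieces;
    }

    // Concatenates the pieces, in list order, back into one message.
    public static byte[] Join(List<byte[]> pieces)
    {
        int total = 0;
        foreach (byte[] p in pieces) total += p.Length;
        byte[] message = new byte[total];
        int offset = 0;
        foreach (byte[] p in pieces)
        {
            Array.Copy(p, 0, message, offset, p.Length);
            offset += p.Length;
        }
        return message;
    }
}
```

A 10-byte message split with maxLen 4 yields pieces of 4, 4 and 2 bytes, and joining them returns the original bytes, which is why the application never sees the fragments.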

Other limits and timeouts

These are less often changed

ISIS_DEFAULTTIMEOUT

Normally 45 secs. OK to reduce if you wish.

Failure detection needs twice this long, hence 90s. This applies if you kill a process “suddenly” (e.g. ^C) or if the machine on which it was running crashes.

45s is very slow, but on cloud computing systems long delays happen more often than you would expect!

On lightly loaded clusters, you can set ISIS_DEFAULTTIMEOUT much lower, but not less than 2s.

If you design a failure sensing solution of your own, call Isis.ProcessFailed(who) to tell us if a process crashes.

Help! I’ve been poisoned!

If a process throws this exception, it means that some other process thought it had failed. If a dead process reappears, live members send it a “you have been poisoned” message.

Prevents system partitioning

Rule in Isis2: Only allow a single partition to remain alive at one time. If a partition forms, immediately shut one side down (the side lacking a majority)
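
The majority rule can be written as a one-line test. The sketch below assumes that "majority" is measured against the size of the last agreed membership. The slides do not say exactly which membership Isis2 counts against, so that reading, like the names used, is an assumption.

```csharp
static class PartitionRule
{
    // True when this side of a partition can reach strictly more than half
    // of the members of the last agreed view, and so is allowed to survive.
    public static bool HasMajority(int reachable, int lastViewSize)
    {
        return 2 * reachable > lastViewSize;
    }
}
```

With five members split 3 and 2, only the side of three passes the test. With four members split 2 and 2, neither side does, which shows why at most one partition can ever remain alive under this rule.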

Speeding up failure detection

If a process will exit (rather than crash), call IsisSystem.Shutdown() first. This rapidly announces the departure, and the process will immediately be removed from groups it belongs to.

It works like a fast failure notification, as if the process had said “bye!”

You can also eliminate a group rapidly (without killing its members) using g.Terminate()

Hints for EC2 users

On EC2 we recommend using ISIS_UNICAST_ONLY

EC2 gives you a “virtual cluster” with nodes numbered from IP address xxx.xxx.xxx.0. You can use this range to set ISIS_HOSTS even before launching your application

If you use the Master/Worker startup mode, you can tell the system the master is at: new Address("xxx.xxx.xxx.0", 0);

This works because the master will run on node xxx.xxx.xxx.0 (due to ISIS_HOSTS) and the pid is ignored in the RunAsWorker call, so using 0 is fine.
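
Because the node addresses are consecutive, the ISIS_HOSTS value can be generated rather than typed. The sketch below builds the list from the first three octets and a node count. The class name is hypothetical, and any prefix passed to it stands in for the xxx.xxx.xxx part, which depends on the cluster EC2 actually assigns.

```csharp
using System;

static class Ec2Hosts
{
    // Builds "prefix.0,prefix.1,...,prefix.(count-1)" for use as ISIS_HOSTS.
    public static string Range(string prefix, int count)
    {
        if (count < 1 || count > 256)
            throw new ArgumentOutOfRangeException("count");
        var hosts = new string[count];
        for (int i = 0; i < count; i++)
            hosts[i] = prefix + "." + i;
        return string.Join(",", hosts);
    }
}
```

For example, Ec2Hosts.Range("10.0.0", 3) returns "10.0.0.0,10.0.0.1,10.0.0.2". Listing only the first few nodes is enough, since ISIS_HOSTS names the places where the ORACLE may run, not every member.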

Debugging Isis2 issues

How can it be done?

Debugging is hard…

… debugging distributed systems even harder

Useful tools

Visual Studio. Keep in mind that even an exception thrown inside Isis2 could be caused by a mistake in your code. All those upcalls will be issued from Isis2 stacks!

You can call IsisSystem.GetState() to obtain a string representing the state of the Isis system itself. But you’ll need help from Cornell experts to understand this data.

You can call IsisSystem.RunTimeStatsState() to obtain a self-explanatory string with counts of messages sent and received. The data itself is in IsisSystem.RTS, and you can access this at runtime.

Suggestions

Isis2 is multithreaded. So write thread-safe code.

Don’t block during upcalls from Isis2 into your code. The library assumes that upcalls will complete quickly and could malfunction otherwise.

Isis2 has a lot of threads. Don’t let this worry you.

We gave you the source code. If you notice a bug, post it to isis2.codeplex.com on the “issues” page

Post questions on the codeplex “discussions” page
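
One common way to follow the advice about not blocking during upcalls is to have the handler do nothing but enqueue the work, and to let a thread of your own do the slow part. The sketch below uses only standard .NET classes. The class name UpcallOffload and the string payload are hypothetical, and OnMessage stands in for whatever handler you register with Isis2.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

sealed class UpcallOffload : IDisposable
{
    private readonly BlockingCollection<string> queue =
        new BlockingCollection<string>();
    private readonly Thread worker;

    // slowWork runs on the private worker thread, never on a library thread.
    public UpcallOffload(Action<string> slowWork)
    {
        worker = new Thread(() =>
        {
            // Blocks while the queue is empty; ends after CompleteAdding().
            foreach (string item in queue.GetConsumingEnumerable())
                slowWork(item);
        });
        worker.IsBackground = true;
        worker.Start();
    }

    // Call this from the upcall: it only enqueues, so it returns quickly.
    public void OnMessage(string msg)
    {
        queue.Add(msg);
    }

    // Drains the remaining items, then stops the worker thread.
    public void Dispose()
    {
        queue.CompleteAdding();
        worker.Join();
        queue.Dispose();
    }
}
```

Because a single worker thread consumes the queue, items are processed in the order the upcalls delivered them, and the slow work itself does not need extra locking unless it touches state shared with other threads.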

