
An introduction to the Globus toolkit

Russell Lock

11 February 2002

Abstract

This report gives the reader an overview of the Globus middleware platform, emphasizing its uses, and the extent to which it provides the services required for a grid architecture. It then describes the various components that go into making the Globus Toolkit and its future as a grid building tool.


Contents

1 Introduction
1.1 The Globus toolkit
1.2 Globus and the real world
1.3 Requirements of Grid based systems

2 Basic Globus Concepts
2.1 Jobs
2.2 GRAM
2.3 An overview of Globus components

3 GASS

4 MDS
4.1 GRIS
4.2 GIIS

5 HBM

6 GSI
6.1 Grid Security Overview
6.2 The Globus Grid-map file
6.3 Encryption
6.3.1 The basic principle of encryption
6.3.2 Globus and encryption
6.3.3 Public key encryption
6.4 X.509 certificates and CAs
6.5 Mutual Authentication
6.6 Other security considerations
6.6.1 Proxies
6.6.2 Specific GASS server issues
6.7 Overall issues with the Globus security model

7 Running a simple Job

8 Extending Globus (Schedulers)

9 Conclusions

10 Glossary

11 References

12 Appendix


1 Introduction

The Globus[1] toolkit is designed to enable people to create computational grids. It has been developed over several years, chiefly at the Argonne National Laboratory, Illinois, USA. Globus is an open source initiative aimed at creating new grids capable of a scale of computing seen until now only in supercomputers. As it is an open source project, anyone can download the software, examine it, install it and hopefully improve it. Through this constant stream of comments and improvements, new versions of the software can be developed with increased functionality and reliability. In this way the Globus project itself will be ongoing, with constant evolution of the toolkit.

Grid computing has been an active research area for several years, and several systems exist that utilize functional computational grids. The most notable of these are the NASA Information Power Grid[3] (run on the Globus toolkit) and the new grid being constructed to analyze data from the Large Hadron Collider project at CERN[4] when it becomes operational. So far the leading grid middleware system has been the Globus Toolkit. The introduction of computational grids has given developers a considerable number of extra problems to overcome in order to make them work correctly and reliably.

Computational grids are designed to make better use of the resources on people's computers. They do this by harnessing resources that are not being used at the time to work on problems elsewhere. This could take the form of harvesting the processor cycles of people's machines, their disk space and so on. At this stage research is primarily aimed at creating computational grids for university departments and for small scale businesses, which may need the use of a supercomputer but cannot afford the price or the space in which to house one. It is also helping to break down barriers in how these resources are accessed, by trying to build an architecture where any machine could potentially control a virtual supercomputer at any given time.

The average desktop machine in an office is only utilized at 5% of its possible processing power. This observation has led to an upsurge of interest in methods of putting this perceived waste of resources to better use. It should be stressed that some of the earlier attempts to put this power to good use are not necessarily grid architectures in the strictest sense. For example the SETI program[2] makes use of home and business machines for its seti@home computations. These have largely been successful ventures, but they apply a rigid definition of who submits jobs. In the case of seti@home the control comes from the central server only. The Globus toolkit allows selected users to gain access to these unused resources via computer networking, potentially making every computer a request sender.


1.1 The Globus toolkit

The Globus toolkit is made from a number of components. Its design is very modular, which makes alterations and improvements easier and reduces their impact on connected components. The toolkit is written in the C programming language and the source is available for download. It is designed to work on a number of platforms, predominantly Linux, with limited support for Microsoft platforms promised in the future. Globus has also recently been ported to the Solaris platform. So far Globus has been a lead contender in the development of grid computing and is currently the only major effort with open source availability. The Toolkit is designed to work in research environments, predominantly as an impetus to be redesigned and improved upon; in theory, however, any company could install it and use it as a computing tool.

1.2 Globus and the real world

Many people suffer from a misconception of why the Globus toolkit exists and what people can realistically expect to achieve with it. The Toolkit has been developed to provide a basic level of functionality for grid based systems. The main problem stems from the fact that the system is of such a low level that it could never be considered a complete solution for a business or other major venture.

Reading the latest news articles and public announcements, people could easily believe that the era of grid computing is here right now, and that everyone either should be using grids or will be in a few years' time. Every week it seems one of the tabloid newspapers is publicizing grid computing as the new successor to the internet. Very few understand the term, however, and confusion on the issue is widespread.

To try to clear up the misunderstandings about the Globus initiative shown in the press, the following analogy is put forward. Using the Toolkit by itself is analogous to driving a number of cars simultaneously. In theory you could do this, but it would require an unfeasible amount of attention to make it work. This is the level at which Globus operates. It allows complete control of a grid environment, but at the bare Toolkit level this means the user making every decision. For example, in a finished system, if you were sending off a job to a machine it would be nice to know in advance whether the machine was busy or not. If it were, you might wish to change your mind and send the request to a different one. Globus supports this, but it does not give you the information unless a) you ask explicitly for it, and b) you have started the requisite services beforehand.

The fact that the user is prompted to set things running, and has to make every decision, does not make the system flawed in any way. Many people who install the Toolkit will find it difficult to use. This is not through bad design but follows from the fact that Globus was designed to be implemented and built upon by others.

The Globus architecture has become the de facto standard for Grid computing and provides the basic building blocks for most of the things you would wish to do when building a grid. Many companies are working on products to operate commercial grid solutions based on the Globus architecture. Therefore, although the Globus initiative is open source, the solutions used in your department or office in the next ten years will not be the basic Globus toolkit, but a properly supported, debugged (to a relative extent) version installed by people trained to do so. At its roots, however, will be the same basic architecture that is shown over the next few sections.

1.3 Requirements of Grid based systems

Fundamentally a number of services are needed in order to build a working grid based system. Because this issue could span a report all of its own, only the major areas are considered here:

You need to be able to allow a person on a given machine to submit a job to a machine elsewhere.

You require a way to monitor a job's progress and to give you back the results if necessary.

The user needs to be able to find out details about different machines, to make an informed choice as to what machine to place a given job on. At a basic level this means some sort of resource discovery, at a more advanced level the load on a given machine, its memory capacity etc.

You need to ensure that if any part of the system fails it can recover.

You need to be able to support jobs that require resources at their execution machine over and above that of the actual program itself.

You need to secure the whole package so that only the people who should get in do get in.

Now that the many facets of a typical grid have been listed, it should be clear why Globus is split into a number of components. Before delving into how the various parts work, the next section explains some of the more basic services. A basic diagram in section 2.3 then shows the names of the various components in Globus, with a key explaining the role of each in the system.


2 Basic Globus Concepts

In order to understand the way in which the Globus toolkit interacts with the various services, it is essential that the reader understands the basic principles behind what the system is actually sending between machines, and what happens when it gets there. These points are shown in the section below. Section 2.3 then shows an overall view of the Globus system with the following chapters explaining the components in more detail.

2.1 Jobs

For the purposes of the Globus system a job can be defined as one or more programs that a user wishes to execute on a known remote machine. The job itself cannot, however, simply be transmitted to the remote machine in question, because the job has to be validated and any additional resources it needs have to be noted by the remote machine. A job is therefore sent with a job request, which can specify a number of things, the main ones being listed below.

Name of program(s) to submit

Machine(s) to submit to

Method of result retrieval * (Has default)

Access to files required *

Maximum execution time *

Minimum/Maximum memory *

Optional = *
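The fields listed above can be pictured as a small record with two mandatory entries and several optional ones. The following is only an illustrative sketch in Python: the class name, field names and units (minutes, megabytes) are assumptions made for this example and are not part of Globus or its request format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobRequest:
    """Hypothetical model of the job request fields listed above."""
    programs: List[str]                           # name of program(s) to submit
    machines: List[str]                           # machine(s) to submit to
    result_retrieval: str = "screen"              # optional, has a default
    required_files: List[str] = field(default_factory=list)  # optional
    max_execution_minutes: Optional[int] = None   # optional (unit is assumed)
    min_memory_mb: Optional[int] = None           # optional (unit is assumed)
    max_memory_mb: Optional[int] = None           # optional (unit is assumed)

    def validate(self) -> None:
        """Reject a request that omits a mandatory field or has an
        inconsistent memory range."""
        if not self.programs:
            raise ValueError("at least one program is required")
        if not self.machines:
            raise ValueError("at least one machine is required")
        if (self.min_memory_mb is not None
                and self.max_memory_mb is not None
                and self.min_memory_mb > self.max_memory_mb):
            raise ValueError("minimum memory exceeds maximum memory")


request = JobRequest(programs=["/bin/echo"],
                     machines=["egcsky000015.lancs.ac.uk"])
request.validate()
print(request.result_retrieval)   # the default retrieval method
```

Only the two mandatory fields have to be supplied; every starred field falls back to a default, which mirrors the way the optional entries are described in the list.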

Globus supports a variety of ways of retrieving results once a job has completed. By default, results are sent back to the screen of the user who sent the request. However a number of alternatives are available, some of which are listed below.

Send to screen at local machine

Send to file at local machine

Store in file at remote ftp / http server

Wait till retrieve command is given from local machine

Don’t do anything with results


The request itself can be made in one of two ways. The first and simplest is to use the globus-job-run command. This method only allows a user to send off one request at a time and is intended only for simple cases of execution. A simple request is shown below.

globus-job-run egcsky000015.lancs.ac.uk /bin/echo "hello"

To make things easier to understand the above example has been spaced out a little. This statement would be typed in at the basic command prompt of the submitting machine. The first part is the name of the request program to run, in this case globus-job-run. The second part gives the full name of the machine on which the user wishes to execute the program. The third part gives the program to execute. In this case Globus was installed on a Linux system, and /bin/echo is the name of a simple Linux command. This illustrates that the program itself can be any acceptable executable. The echo command simply takes an argument and prints it back to the screen. The argument in this example is shown in the fourth part. So to summarize, this request will send the echo command to the remote machine, execute it and return the results. The results are returned automatically by default, though this can be altered in the request. For the purpose of this example the results will simply appear on the local user's terminal screen, the same one that they sent the request from.
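The four parts described above can be separated mechanically. The sketch below, in Python, only splits the example command line into its parts in the way a POSIX shell would; it does not contact any machine or run Globus, and the variable names are chosen purely for illustration.

```python
import shlex

command = 'globus-job-run egcsky000015.lancs.ac.uk /bin/echo "hello"'

# Split the command line as a POSIX shell would, honouring the quotes.
parts = shlex.split(command)

# First part: request program. Second: remote machine.
# Third: program to execute. Anything left over: its arguments.
request_program, remote_machine, executable, *arguments = parts

print("request program:", request_program)
print("remote machine: ", remote_machine)
print("executable:     ", executable)
print("arguments:      ", arguments)
```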

The second way in which the request can be made is by using a more powerful request program called globusrun. Unlike globus-job-run, it is designed to allow multiple requests to be sent to multiple machines, if necessary, in one action. To make this as straightforward as possible the program makes use of a specific language to formulate the request, called RSL (Resource Specification Language). The Globus RSL is quite complex in design and can be difficult to understand at first. Its syntax is beyond the scope of this report, as it would take many pages to get across even the basics. It is unlikely that most people would need to use this more advanced request program manually. It will be of more use once the generation of the syntax can be automated, so that actual users do not come into contact with it.

Preferably the program itself should be written in the C programming language. Though in theory any binary executable could be used, a program that needs to interact with Globus must be able to use the Globus APIs, which are written for C. This can be achieved from most programming languages, but the implementation can cause problems. C++ is supposedly compatible with C, but Globus does not guarantee complete compatibility. An example of a program that may wish to interact with Globus is one that needs access to additional files. It is also notable that support for interactive jobs is still incomplete. An interactive job is one which requires user direction at some point. A number of Globus's competitors, the most noticeable being Sun[5], have already dealt with this issue. It is important to note that although Globus provides no active support for interactive programs, it does not stop a program from contacting another machine if it sets up the relevant sockets itself. Because this facility is so useful, more active support for it will no doubt be added in the future. This will make the creation of viable programs far simpler and more standardized.

2.2 GRAM

GRAM is the name given to the Globus Resource Allocation Manager. Its job is to manage job requests and to execute and monitor them on remote machines. GRAM is in overall control of all the other services listed in the sections below. It is responsible for setting up and taking down services provided by Globus when it believes a user needs them. It also acts as the manager that all information about requests or status reports feed back into.

One of the main parts of GRAM is the gatekeeper. In order to execute a given program on another machine a number of checks need to be made, so you cannot simply take complete control of a remote machine and run the program in question. The interaction between machines has to be carefully monitored. The reasoning for this stems from the unknown nature of people joining grid networks. The whole concept of grids is that of machines coming and going, joining dynamically when they wish. In order to monitor interaction at a remote machine a gatekeeper is run on the execution machine. It is constantly waiting for new requests to come in. The gatekeeper waits on a defined, standardized port, which allows users to contact remote machines knowing only that machine's IP address. It is vital that as little information as possible is needed to communicate between machines, to enhance the usability of the system.

The gatekeeper also ensures that only valid requests are accepted and dutifully sends back results when they are produced. If a machine is already running a job it will wait till the current job has completed before running a new one. The main reason for this is the size of the jobs, which would normally prohibit more than one running at once on a resource. It also helps reduce the chance of overloading individual machines with requests from different Jobs at a given time. Generally gatekeepers are used to allow execution on remote machines from a local one. However if for some reason you wish to execute requests on the local machine yourself you can set up a personal gatekeeper which will do just this function. This may seem a somewhat bizarre notion but the main application for this type of execution is for testing the setting up of Globus components on a given machine. The overall mechanism for Jobs and Gatekeepers is shown in Fig 1.
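The behaviour described above, in which a second request waits until the running job completes, can be modelled with a simple queue. This is a toy sketch in Python and not Globus code: the class, its method names and the idea of checking a set of authorized user names are assumptions made for illustration only.

```python
from collections import deque


class Gatekeeper:
    """Toy model: one job runs at a time, later requests wait in
    arrival order. Not the real gatekeeper implementation."""

    def __init__(self, authorized_users):
        self.authorized_users = set(authorized_users)
        self.running = None          # (user, job) currently executing
        self.waiting = deque()       # requests queued behind it

    def submit(self, user, job):
        """Accept a request only from an authorized user and
        return the state the request ends up in."""
        if user not in self.authorized_users:
            return "rejected"
        if self.running is None:
            self.running = (user, job)
            return "running"
        self.waiting.append((user, job))
        return "waiting"

    def finish_current(self):
        """Mark the running job complete and start the next
        waiting request, if there is one."""
        finished = self.running
        self.running = self.waiting.popleft() if self.waiting else None
        return finished


gatekeeper = Gatekeeper(authorized_users={"A", "B"})
print(gatekeeper.submit("A", "job-a"))   # A sends its request first
print(gatekeeper.submit("B", "job-b"))   # B sends its request afterwards
print(gatekeeper.submit("X", "job-x"))   # an unknown sender
gatekeeper.finish_current()
print(gatekeeper.running)                # B's request now runs
```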


2.3 An overview of Globus components

Because the different services interact in a complex way, an overall view of the system is shown below. Though a reader will undoubtedly not understand all the services involved at this point, it does give an overall feel for the system. This should enable the reader to understand how the key services fit in as they read through the sections.

Fig 1: Jobs and gatekeepers. Machine C sets up a gatekeeper. Machine A sends its request first and Machine B sends its request afterwards. The requests are sent for execution to the gatekeeper on Machine C: request A runs while request B waits.


Figure: An overview of Globus components. The diagram shows Machine A (Globus) and Machine B (Globus), each with GSI, exchanging a job request and job results through the gatekeeper, with GASS providing file access. Information requests and updates pass between the machines and the MDS box, which contains the GIIS and has several GRIS boxes leading off it. GRAM is shown in overall control.

Key:

GSI : The GSI controls all the security arrangements (authorization etc)

MDS : The MDS is in charge of distributing information about machines

Gatekeeper : The gatekeeper controls interactions between machine and the jobs

GASS : The GASS server can be set up to provide access to other files during jobs

GRAM : Globus Resource Allocation Manager


3 GASS

The Globus system is used for more than simple executable programs. Many of the jobs it runs require additional resources, for example data files that must be transferred from a remote machine to the place where the job is executing so that the job can access them. The way in which this is done is quite simple, though some of the security precautions involved are not. If a job requires additional resources they will be listed in the job request sent to the execution machine. On receiving that request the gatekeeper will examine it and, if it determines that additional resources are required, it will set up a GASS server on that machine to deal with any requests the program makes.

The GASS server itself allows users to put files in a local cache accessible to the job running at the time. Any file can be transferred as long as it resides on an FTP or HTTP server on an accessible network. A user must be authorized to access the remote resource before files can be fetched from it. Therefore, in order to comply with requests for information, the GASS server has to make use of the GSI to make sure the machines involved can be authenticated. At present it is not possible for jobs to share the same cache. Sharing would obviously lead to security problems, as only certain people should have access to a given file. As such, once the running program finishes, the GASS server will delete the cache and shut down. It is important to note that the use of GASS and the cache is the only way to access files for use in execution. Local files at present cannot be accessed in any other way.

As with most programs there are ways to make life easier. Many grid enabled applications make use of huge datasets which would be somewhat cumbersome to copy into the cache each time. Such files can be stored in the cache permanently if necessary. The GASS server will only delete what it has itself placed into the cache when it shuts down, and it will only delete the cache directory if it is empty. This works because the GASS server will always set up the cache in the same place unless instructed otherwise by the resource owner. The overall interaction for this part of the system is shown in Fig 2.

Fig 2: GASS interaction. Machine A sends a request to the gatekeeper on Machine B. The gatekeeper starts GASS and the job executes. File requests are made to the FTP server, and GASS accesses the cache.
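The clean-up rule described above (delete only what the server placed in the cache, and remove the directory only if it ends up empty) can be sketched as follows. This is a toy model in Python, not the GASS implementation: the class and method names are hypothetical and the "transfer" is simulated by writing bytes to a local file.

```python
import os
import tempfile


class CacheSession:
    """Toy model of a cache that cleans up only its own files."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self.placed = []                      # files this session copied in
        os.makedirs(cache_dir, exist_ok=True)

    def place(self, name, data):
        """Write a file into the cache and record that this session owns it."""
        path = os.path.join(self.cache_dir, name)
        with open(path, "wb") as handle:
            handle.write(data)
        self.placed.append(path)
        return path

    def shut_down(self):
        """Delete owned files, then remove the directory only if
        nothing else remains in it."""
        for path in self.placed:
            if os.path.exists(path):
                os.remove(path)
        self.placed.clear()
        if not os.listdir(self.cache_dir):
            os.rmdir(self.cache_dir)


with tempfile.TemporaryDirectory() as root:
    cache_dir = os.path.join(root, "cache")
    os.makedirs(cache_dir)
    # A large dataset stored permanently by the resource owner.
    with open(os.path.join(cache_dir, "dataset.bin"), "wb") as handle:
        handle.write(b"permanent")

    session = CacheSession(cache_dir)
    session.place("input.txt", b"temporary")
    session.shut_down()
    print(sorted(os.listdir(cache_dir)))      # only the permanent file is left
```

Because the permanent dataset was never recorded as belonging to the session, it survives the shutdown and keeps the cache directory in place for the next job.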


4 MDS

The Meta Data Service controls all information pertaining to the different machines on a grid. It holds information of both a dynamic and a static nature. Examples of the sort of things held include machine ID, average load, memory capacity etc. Though the MDS is designed to hold particular types of information, it is extensively modifiable. This makes it easier for developers to build applications on Globus. For example, to build an effective scheduler on top of Globus it may be helpful for a request about a machine to indicate who has authorization to use it.

The MDS consists of two different types of server. The servers GRIS and GIIS are outlined in sections 4.1 and 4.2 respectively.

4.1 GRIS

GRIS servers can be located at various points across a grid. They are designed to hold information about any machine that has been registered with them. The information in question can be either static or dynamic, and the architecture of the GRIS server is designed to be easily extendable to provide a holding space for data of any kind about individual machines. Information is uploaded to GRIS servers manually by the user unless more advanced support is built on top. No single GRIS contains the details of every machine on a given network. This gives limited protection against the failure of a given server and allows faster retrieval times with less load at any one point of a network. Because different machines are listed with different GRIS servers, it could be difficult for a user to find out about a given machine. In theory they would need to know the location of every GRIS server in order to poll them individually. Luckily Globus provides a second type of server to deal with this eventuality, which is explained in the next section.

4.2 GIIS

All GRIS servers register with a separate Grid Index Information Service (GIIS) server when they are activated. All the GIIS server has to store is the location of each GRIS, making the request load considerably more manageable. The GIIS server can also be programmed to know the name of each machine registered with each GRIS. In this way a user can find out information without the hassle of contacting every GRIS server on the grid. The diagram in section 2.3 showed the relationship between GRIS and GIIS servers, with the main box marked MDS representing the main aggregate directory GIIS and the boxes leading off from it representing the GRIS servers. Note that the diagram showed the two machines A and B linked off one of the GRIS server boxes. This indicates that information pertaining to those machines could be found on that particular GRIS. GIIS servers can be programmed to store only the details of GRIS locations, or to hold any piece of data already held by the individual GRIS servers.
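The two-level lookup described above can be sketched with two small classes. This is a toy model in Python intended only to show the idea of an index pointing at the servers that hold the details; the class names mirror the server types, but every method name, server location and stored attribute is an assumption for this example and is unrelated to the real MDS interfaces.

```python
class GRIS:
    """Toy GRIS: holds details for the machines registered with it."""

    def __init__(self, location):
        self.location = location
        self.machines = {}            # machine name -> details

    def register_machine(self, name, **details):
        self.machines[name] = details

    def query(self, name):
        return self.machines.get(name)


class GIIS:
    """Toy GIIS: an index of GRIS servers, able to say which
    GRIS lists a given machine."""

    def __init__(self):
        self.index = {}               # GRIS location -> GRIS server

    def register_gris(self, gris):
        self.index[gris.location] = gris

    def locate(self, machine_name):
        """Return the location of the GRIS that lists the machine, or None."""
        for location, gris in self.index.items():
            if machine_name in gris.machines:
                return location
        return None

    def lookup(self, machine_name):
        """Find the right GRIS through the index, then ask it for the details."""
        location = self.locate(machine_name)
        if location is None:
            return None
        return self.index[location].query(machine_name)


giis = GIIS()
gris_one = GRIS("gris-1.example")     # hypothetical server locations
gris_two = GRIS("gris-2.example")
giis.register_gris(gris_one)
giis.register_gris(gris_two)

gris_one.register_machine("machine-a", memory_mb=512, average_load=0.2)
gris_two.register_machine("machine-c", memory_mb=256, average_load=0.9)

print(giis.locate("machine-c"))
print(giis.lookup("machine-a"))
```

The user only needs to know where the GIIS is; the individual GRIS locations are discovered through it rather than polled one by one.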

GIIS servers obviously could represent a centralized point of failure within a grid environment. Therefore in order to help alleviate this problem a number of shadow GIIS servers can be set up, which can take over if for any reason the main GIIS is inaccessible.


5 HBM

The Heart Beat Monitor (HBM) is designed to provide simple fault monitoring for remote machines. Reporting faults is a difficult task, as failures can occur for many different reasons. As with all grid services, the HBM has to be set running explicitly by the user. It is primitive in nature, consisting of a very simple polling mechanism to detect failure. Of course this could be very processor intensive for both machines, depending on the time between polls. The HBM monitors processes on remote machines but is also used to show network failure. In this case the monitor registers the lack of a response rather than a report of actual failure. So if, for example, a network breakdown stopped a remote machine from sending a signal back, it would be assumed that the machine in question had gone down. As with most parts of the Globus Toolkit the HBM requires substantial user intervention to set running, and in most cases this would deter anyone from using it manually. At present the future of the HBM is in doubt, as it has been deprecated as a service. It has not been removed, mainly because of the customer base already using it. As development has now almost completely ceased on this part of the system, its presence in future releases is not guaranteed. The HBM itself contains three main components, which are described below.

Heart Beat Monitor Client Library (HBM-CL)

The main function of the HBM-CL is to provide a way to register processes for monitoring. The request generated by the HBM-CL's globus_hbm_client_register() function is passed on to the HBM-LM, which is described in the paragraph below.

Heart Beat Monitor Local Monitor (HBM-LM)

The HBM-LM is run on any machine that monitors processes. Its job is to accept requests for jobs to be monitored from the HBM-CL, and to monitor them based on a simple timer mechanism. It then reports back all pertinent information gained to the HBM-DC which is described below. For example if it received a HBM-CL request for the stoppage of monitoring of a specific job it would report this fact back to the HBM-DC the next time it transmitted.

Heart Beat Monitor Data Collector (HBM-DC)

The HBM-DC is a centrally located server responsible for collecting information from individual HBM-LMs around a network and for providing information on request about the status of the monitored jobs. It is ultimately responsible for monitoring the frequency of replies from the various remote HBM-LMs. If for any reason one stopped reporting, the machine would be assumed inaccessible.
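The rule that a silent monitor is assumed inaccessible amounts to a timeout on the last report received. The sketch below illustrates that rule in Python; it is not the HBM implementation, and the class name, method names and the 30 second timeout are assumptions chosen for the example. Times are passed in explicitly so the behaviour is easy to follow.

```python
class DataCollector:
    """Toy data collector: a local monitor is presumed down once it
    has been silent for longer than the timeout."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_report = {}     # local monitor name -> time of last report

    def receive_report(self, monitor, now):
        """Record that a report arrived from the given local monitor."""
        self.last_report[monitor] = now

    def presumed_down(self, now):
        """Names of monitors whose last report is older than the timeout."""
        return sorted(name for name, seen in self.last_report.items()
                      if now - seen > self.timeout_seconds)


collector = DataCollector(timeout_seconds=30)
collector.receive_report("machine-a", now=0)
collector.receive_report("machine-b", now=0)
collector.receive_report("machine-a", now=25)   # machine-b stays silent
print(collector.presumed_down(now=40))          # prints ['machine-b']
```

Note that the collector cannot tell a crashed machine from a broken network link: in both cases it simply sees no report, which is exactly the limitation described in the text.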


6 GSI

The Globus toolkit contains a sophisticated security architecture designed to make the software as secure as possible. The following sections discuss the methods Globus employs to do this.

6.1 Grid Security Overview

Throughout the development of computing many problems have emerged in the seemingly simple task of making a system secure. Despite progress on this over the last thirty years there is not, and is never likely to be, a perfect security system. The arrival of Grid based systems opens up new problems because of their extensive use of networks. The physical networks themselves are normally out of the hands of the people using them and as such cannot be secured by user actions. Of course the level of risk entailed by this depends on the size of the network itself and the area it encompasses. For example this would be less of a problem in an office building on a military base than it would be for a firm with two sites at opposite ends of the country making use of civilian fiber optic networks. Therefore, in order to provide any degree of security, the information travelling along the network has itself to be secured in some way. Unfortunately at this point confusion can occur. Some companies simply ignore the possibility or consider it too costly to fix. Others consider the situation and say the answer is encryption.

Unfortunately encryption is a word used somewhat liberally, with little thought for how it should be implemented. For example, how do you stop a person from masquerading as someone else and making use of grid resources? How do you make sure that a person using the system legally is not accessing or doing things they should not? These could be more broadly termed insider jobs. The problems do not even end there, and many books have been written solely on the subject of security[6,7]. As such it should be clear that the scope and problems associated with grid security are significant, and need to be addressed if this technology is ever to be used extensively. The following sections outline some of the technologies that have been used to make the Globus system more secure. It is important to note, however, that the subject of security is forever ongoing, and the solutions outlined below do not represent a secure system for grid technologies in the future, due to the many unresolved issues in the area.


6.2 The Globus Grid-map file

One of the most important considerations in the security model of Globus is that only users who are authorized to use a machine can do so. The first line of defense that Globus wields is therefore the grid-map file. This file, created by the owner of each grid machine, specifies which users they allow requests to come from. Anyone without an entry is not allowed access. An example entry in a grid-map file is shown below with an explanation of each section.

"/O=Grid/O=Globus/OU=lancs.ac.uk/CN=John Smith" jons

/O=Grid/ A standard introductory part specifying that the entry represents a grid. Entries of this type are used by many types of software, so this part is necessary.

/O=Globus/ This specifies that the grid runs Globus software.

/OU=lancs.ac.uk/ The domain name under which the computer operates.

/CN=John Smith The name of the person who is authorized to make requests.

jons The person's local user name. This is encoded into the certificate to stop people using it without being logged in as the right person.

This first part may seem surplus to requirements, but the entry itself is derived from and tested against a part of the system which is covered in section 6.4.
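The structure of the example entry can be made concrete by splitting it into its quoted subject and the local user name, and then splitting the subject at each slash. The Python sketch below works on the example entry only; the function name is hypothetical, and the simple slash-based split is an assumption that would not cope with a component value that itself contains a slash.

```python
import shlex


def parse_gridmap_entry(line):
    """Split one grid-map line into (key, value) subject components
    and the local user name. Illustrative only."""
    subject, local_user = shlex.split(line)    # honours the double quotes
    components = []
    for part in subject.strip().split("/"):
        if part:                               # skip the empty piece before the first "/"
            key, _, value = part.partition("=")
            components.append((key, value))
    return components, local_user


entry = '"/O=Grid/O=Globus/OU=lancs.ac.uk/CN=John Smith" jons'
components, local_user = parse_gridmap_entry(entry)

for key, value in components:
    print(key, "=", value)
print("local user:", local_user)
```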

This allows a user to be validated, i.e. unless they gave the correct information they would not be permitted to request resources. In a perfect world only this would be required. However, the information itself is secured only by the local machine's own security precautions, so the list of users who are allowed in is only secure if nobody ever finds out what it is. Clearly this is inadequate given what is at stake. At a more basic level there is also the problem of someone lying about who they claim to be. It is therefore essential that a person be able to prove who they are to other users. There are many ways that this could be accomplished, but all have drawbacks of some nature. For example you could use personal key cards or photographic identification, but both of these would suffer from the amount of hardware required to make them work and are by no means perfect. The method that the Globus platform makes use of is encryption, which is commonly used to secure data but can also be used to prove identities with a method called mutual authentication (see section 6.5).


6.3 Encryption

The Globus system makes use of encryption to ensure authentication of users. The sections below outline first the basic principles of encryption, and then how Globus makes use of it within its security architecture. Readers who know the basics of simple encryption can skip section 6.3.1.

6.3.1 The Basic Principle of Encryption

Encryption is used extensively in the Globus system to authenticate users and requests. Encryption is the taking of some piece of information, for example your medical files and applying some form of cipher to them so that they are no longer readable. They can then be sent to their destination in the knowledge that anybody who intercepts them would not be able to read them. They can then be decrypted with the cipher at the other end. The cipher is in fact a mathematical algorithm designed purely to encrypt and decrypt messages.

As designing ciphers is no easy business the same cipher may be used by many people for encrypting just about anything. To make the cipher work you must enter a key, (the normal level of protection currently being a key 1024 bits long). That same key is then entered into the cipher at the other end to decrypt the message. This means that though many people use the same cipher, they cannot all decrypt each others messages unless they hold the key it was encrypted with.

The basic premise outlined above is that of private key encryption (also known as symmetric or secret key encryption), where the same key is used to encrypt and decrypt information. There are many varieties of encryption but all boil down to the above explanation at some level or another. What many people do not grasp is that encryption is not absolute. Depending on the length of the keys used a system could be more or less secure. Another consideration is the quality of the encryption algorithms themselves, which all have a relative strength based on how flawed they are (there is no such thing as a perfect algorithm). A diagram explaining the basic principle is shown in Fig 3.

Fig 3: The plain text and the key are fed into the algorithm, which produces the encrypted text.
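The principle in Fig 3 can be sketched in a few lines of Python. The fragment below is only an illustration and uses a toy XOR cipher, chosen because it is short; it is not a cipher that Globus uses and it offers no real protection. The names `xor_cipher`, `plain_text` and `key` are made up for this example.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR every byte of data with the repeating key.

    Applying the function twice with the same key restores the input,
    so the same routine both encrypts and decrypts.
    """
    if not key:
        raise ValueError("the key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


plain_text = b"medical file: patient 42"
key = b"shared secret"

encrypted = xor_cipher(plain_text, key)            # plain text + key -> encrypted text
decrypted = xor_cipher(encrypted, key)             # same key at the other end
wrong_key = xor_cipher(encrypted, b"another key")  # same cipher, different key
```

Everybody here shares the same algorithm, but only the holder of `key` recovers the original text; decrypting with a different key yields different bytes.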


6.3.2 Globus and Encryption

The Globus system makes use of encryption in order to validate users to each other when making requests. The actual information that is passed between machines after this point, however (the raw data the computers are working on), is not encrypted in any way. This means that although you can validate who you are dealing with, you cannot stop the work you are sending from being intercepted. The assumption is that someone intercepting the traffic could not gain any meaningful content from it; whether this is true remains to be seen. This compromise has been made because of the time that it takes to encrypt large files. The data sets used by grid applications can be gigabytes in size, so the overhead of encrypting every piece of information would to a large extent negate the advantage of using the remote machines in the first place. To show how fast technology is moving, the CERN Large Hadron Collider will generate data sets that are petabytes in size, which would make the problem insurmountable. Globus therefore concentrates on making sure that the person making a request is the person that they say they are, and that they are authorized to do so. Improvements in encryption speeds may one day close this security loophole, but until that time most grids will probably not be totally encrypted, for logistical reasons.

6.3.3 Public key encryption

Globus makes use of public key encryption. This is slightly more complex than the private key encryption explained above but the principle is the same. Public key encryption removes the need to distribute the same key to two people in order for them to encrypt data. Distributing a shared key is tedious and potentially opens security holes, mainly because a secret becomes harder to keep as more and more people know it. Public key encryption uses asymmetric keys. These are slightly different from the type of key discussed above. They are based on algorithms which only work one way with a given key, i.e. a different key is needed to decrypt the data. The two keys are termed the public and the private key. A public key can be distributed without fear, so that anybody can send an encrypted message to someone, but only the intended person can decrypt messages sent to them, using their private key. This can in fact work in reverse, meaning that something encrypted with a private key can be decrypted with that person's public key by someone else, thus proving who sent the message. (Nobody else has their private key, and only that key could have been used to encrypt the message.)

It is important to bear in mind that the way in which this works in reality is a little more complex than has been made out here. However it should give the gist of what is meant by public key encryption.
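As a rough illustration of both directions, the fragment below uses textbook RSA with deliberately tiny numbers. RSA is picked here only as a familiar asymmetric scheme; the primes, the exponent and the message value are assumptions made for the example, and real keys are vastly larger and are used through vetted libraries rather than raw arithmetic.

```python
# Toy RSA parameters, assumed for illustration only.
p, q = 61, 53
n = p * q                  # modulus shared by both keys (3233)
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent: (n, e) is the public key
d = pow(e, -1, phi)        # private exponent: (n, d) is the private key (Python 3.8+)

message = 65               # a message encoded as a number smaller than n

# Direction 1: anybody encrypts with the public key,
# only the holder of the private key can decrypt.
ciphertext = pow(message, e, n)
decrypted = pow(ciphertext, d, n)

# Direction 2: the owner encrypts with the private key,
# anybody can undo it with the public key, which proves who sent it.
proof = pow(message, d, n)
recovered = pow(proof, e, n)
```

Both `decrypted` and `recovered` equal the original `message`, which is the property that the mutual authentication scheme in section 6.5 relies on.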


6.4 X.509 Certificates & CA’s

Globus makes use of public key encryption through certificates. These certificates were not invented for Globus; they were originally designed by the ITU[8] (International Telecommunication Union). They are widely used on the internet at large and go some way towards providing secure authentication. In order to understand how Globus makes use of these certificates it is necessary to explain the role of the CA (Certification Authority).

Certification authorities are used to try to mitigate the problem of people lying about who they are. This would be a major problem if only the grid-map file were used to secure the system: by finding out the contents of that file, an attacker could easily impersonate the people listed in it.

Therefore a body that could vouch for the person in question would be advantageous. This third party is called a CA. At this point something mentioned earlier can be made clearer. Recall that an entry in a grid-map file has the structure seen below.

"/O=Grid/O=Globus/OU=lancs.ac.uk/CN=John Smith" jons

The fields in the quoted part of this entry are set out in this way because they are a direct mapping of the information stored in an X.509 certificate. An X.509 certificate therefore contains (amongst other things) the name, domain and organization of the person using the certificate, while the final field of the entry is the local username that the certificate is mapped to. Obviously anybody could make one of these up, so in order for a certificate to be valid you have to send a request to a CA, submitting your details in much the same way as they are listed in the grid-map file. These details can then be checked and the returned certificate signed by the CA.
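To make the structure of the entry concrete, the following sketch splits a grid-map line into the quoted certificate subject, the local username, and the individual subject fields. It is a minimal parser written for this report's example entry only; the function name `parse_grid_map_line` is hypothetical and this is not code taken from Globus.

```python
import shlex


def parse_grid_map_line(line: str):
    """Split one grid-map entry into (subject, username, fields).

    The subject is the quoted certificate name; fields is a list of
    (attribute, value) pairs, kept as a list because an attribute
    such as O can appear more than once.
    """
    subject, username = shlex.split(line)
    fields = [tuple(part.split("=", 1)) for part in subject.strip("/").split("/")]
    return subject, username, fields


entry = '"/O=Grid/O=Globus/OU=lancs.ac.uk/CN=John Smith" jons'
subject, username, fields = parse_grid_map_line(entry)

for attribute, value in fields:
    print(f"{attribute:>2} = {value}")
print("local user:", username)
```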

Very careful consideration has to be given to who you would trust to be a CA. If your computer knows to trust a certain CA it will trust all the certificates that are issued by it. It is therefore vital that any CA you trust has checked sufficiently that each person is who they say they are. If, for example, a company had 50 machines running Globus it would probably set up its own CA for security reasons. Another option would be to use a commercial CA from outside the company. For many people, however, Globus is just an experimental system that they will be evaluating in some way. To aid in this the Globus team have set up a simple test CA from which certificates can be obtained. It is important to note that the only thing this test CA checks is that the domain from which the request was sent matches the domain listed in the request. Beyond this, as long as the request is correctly formatted it will be certified. In order to do serious work you would therefore need to set up your own CA, and one of the main drawbacks of this system is that setting up a new CA is not a simple process. As most people testing the system only require rudimentary security, the test CA suffices in most cases.

Whilst the system creates a request for a certificate it also creates a private 1024 bit key for your future use when dealing with authentication. The certificate you receive back will be one of two types (depending on what you ask the CA for). It could be a host certificate, enabling a computer to be used by others, or a user certificate, which enables you to send jobs to others. To do both you would therefore need both certificates. The main difference between the two is that in order to get a host certificate you also need to send your machine's full name, for example js.lancs.ac.uk, amongst the other details sent to the CA. This ensures that the more security conscious role of executing jobs is tied down to definite machines. Whichever certificates you have, they contain your public key, which can be used by other people to communicate with you and to provide mutual authentication.

It is very important to realize that the private keys created all depend on the level of local security on the machine to keep them secure. Therefore careful consideration has to be given in order to make the system as secure as possible.

6.5 Mutual Authentication

In order to send jobs between computers it is essential that both are mutually authenticated so that both know who they are dealing with. Globus completes mutual authentication for every job request it receives. The way in which they authenticate is listed as a series of points below.

1) Machine A sends its certificate to Machine B

2) Machine B responds by sending its certificate to Machine A

At this point both know who they are supposedly talking to and both certificates are examined to make sure that the CA that signed them can be trusted. Unfortunately so far it has only been proven that those people had the certificates and that the certificates are valid. There are no guarantees yet that the people sending those certificates are not bogus.

3) Machine A creates a message for B, encrypted with A’s own private key, asking some question. For example: add 50 and 30.

4) Machine B decrypts the message from A using A’s public key.

Fig 4: Machine A sends its certificate (A Cert) to Machine B, and Machine B sends its certificate (B Cert) to Machine A.

Fig 5: Machine A sends Private(A) + Question to Machine B; B decrypts using Public(A).


Machine B now knows that Machine A is telling the truth about its identity. The reason for this is simple: the message had to have been created using A’s private key, which is known only to A. Unfortunately Machine A has no such guarantee about Machine B.

5) Machine B answers the question, encrypts the answer using its own private key and sends it to A.

6) Machine A decrypts the message using B’s public key and examines the answer.

Machine A now knows that machine B is genuine because the message had to have been created using B’s private key. Thus both machines have authenticated each other and can start sending jobs between each other.
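Steps 3 to 6 can be walked through with a small simulation. This is a sketch under stated assumptions, not the Globus implementation: it uses textbook RSA on single bytes with tiny made-up primes, the helper names (`ToyKey`, `make_key`, `encrypt_private`, `decrypt_public`) are hypothetical, and the certificate exchange and CA check of steps 1 and 2 are taken as already done.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyKey:
    n: int  # modulus (public)
    e: int  # public exponent
    d: int  # private exponent (kept secret by the owner)


def make_key(p: int, q: int, e: int) -> ToyKey:
    """Build a toy RSA key from two small primes (illustration only)."""
    return ToyKey(n=p * q, e=e, d=pow(e, -1, (p - 1) * (q - 1)))


def encrypt_private(text: str, key: ToyKey) -> list:
    """Encrypt each ASCII byte with the owner's private exponent."""
    return [pow(byte, key.d, key.n) for byte in text.encode("ascii")]


def decrypt_public(blocks: list, n: int, e: int):
    """Decrypt with a public key; return None if the result is not ASCII."""
    values = [pow(block, e, n) for block in blocks]
    if any(value > 127 for value in values):
        return None
    return bytes(values).decode("ascii")


key_a = make_key(61, 53, 17)   # Machine A
key_b = make_key(89, 97, 5)    # Machine B

# Step 3: A encrypts a question with its own private key.
question = "add 50 and 30"
challenge = encrypt_private(question, key_a)

# Step 4: B decrypts with A's public key (taken from A's certificate).
received = decrypt_public(challenge, key_a.n, key_a.e)

# Step 5: B answers and encrypts the answer with its own private key.
numbers = re.findall(r"\d+", received)
answer = f"{' + '.join(numbers)} = {sum(int(x) for x in numbers)}"
reply = encrypt_private(answer, key_b)

# Step 6: A decrypts with B's public key and examines the answer.
seen_by_a = decrypt_public(reply, key_b.n, key_b.e)

# An impostor without B's private key cannot produce a reply
# that decrypts correctly under B's public key.
impostor = make_key(67, 71, 17)
forged = decrypt_public(encrypt_private(answer, impostor), key_b.n, key_b.e)
```

The readable question proves A's identity to B, and the readable answer proves B's identity to A, while the forged reply comes out as unreadable or wrong.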

An important thing to note about this form of mutual authentication is that even these security measures can be foiled, for example if the CA were not checking details correctly or were indeed bogus, or if the security of a given machine's private key were in question. However, as with all security mechanisms, the question is what level of security you can afford to implement. In the case of Globus, with the other parts of the security architecture in place, this level of security should be sufficient for most applications.

6.6 Other Security Considerations

Though not strictly speaking a security consideration, usability is important within a system. Any user who regularly sends off hundreds of jobs will quickly tire of having to enter their pass phrase for every single one. It was therefore important that methods were developed to ease this problem. Section 6.6.1 shows one such method which, if used with care, can still retain a degree of security within the system. Other problems occur when attempting to send more complex jobs which require added security precautions. Section 6.6.2 outlines Globus’s response to these problems.

Fig 6: Machine B sends Private(B) + Answer to Machine A; A decrypts using Public(B).


6.6.1 Proxies

Depending on how you use a system it can seem tedious constantly having to retype your password in order to send a request. However it is very important that the machine you are using is not left open to misuse. One solution would be to log in to a session within which you could send all the requests you want. There is one important drawback with this approach: once logged in, a user would probably never bother to log out. Security is breached mainly due to these sorts of events. It is not the fault of the system that the users do not use it correctly. However it is the fault of the designer if they fail to take into account the way in which people use these systems. The logging in approach is one way of implementing what is technically known as “single sign on”. For grid computing to be successful a workable approach to this problem is needed.

The solution to this problem, or at least one of them, is to use a proxy. A proxy is created in the same way that you validate yourself for a request, i.e. by using your pass phrase. The request creates a new proxy certificate which is then used during authentication sessions. The proxy certificate can be traced back to the original user who created it, to help identification. The basic premise is that the user's certificate carries the digital signature of the CA, indicating that the user can be trusted, and the user in turn signs the proxy certificate to make clear that it belongs to them. From this point the proxy operates in much the same way as if you had logged in, but with the critical difference that the proxy has a limited lifespan. By default this is 12 hours. The user therefore does not have to remember to log out and can leave the security to the computer. Proxy certificates rely solely on a machine's security precautions, and removing the user from the final stage of verification is a security risk, but this can be mitigated at least in part by allowing users to create proxies of different sizes. It is worth pointing out that although a proxy certificate is now being used, the mutual authentication procedure is not affected at all. The process of gaining a proxy is shown in Fig 7 below.

Fig 7: The machine sends a request for a proxy to the proxy service; the passphrase is verified, a new certificate is generated, and the proxy certificate is returned.
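The key property of a proxy, a limited lifespan, can be modelled in a few lines. The sketch below is not how Globus stores or checks proxies; `ProxyCredential` and `create_proxy` are hypothetical names, and comparing the passphrase as plain text is a simplification made purely for the example. Only the 12 hour default comes from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DEFAULT_LIFETIME = timedelta(hours=12)  # default lifespan quoted in the text


@dataclass(frozen=True)
class ProxyCredential:
    owner: str            # the user the proxy can be traced back to
    issued_at: datetime
    lifetime: timedelta = DEFAULT_LIFETIME

    @property
    def expires_at(self) -> datetime:
        return self.issued_at + self.lifetime

    def is_valid(self, now: datetime) -> bool:
        """A proxy can be used only between issue and expiry."""
        return self.issued_at <= now < self.expires_at


def create_proxy(owner, passphrase, expected_passphrase, now,
                 lifetime=DEFAULT_LIFETIME) -> ProxyCredential:
    """Issue a proxy once the passphrase has been verified (simplified)."""
    if passphrase != expected_passphrase:
        raise PermissionError("passphrase rejected")
    return ProxyCredential(owner=owner, issued_at=now, lifetime=lifetime)


start = datetime(2002, 2, 11, 9, 0)
proxy = create_proxy("jons", "correct passphrase", "correct passphrase", start)

usable_in_afternoon = proxy.is_valid(start + timedelta(hours=6))
usable_next_day = proxy.is_valid(start + timedelta(hours=24))
```

Because expiry is checked on every use, the user never has to remember to log out: the proxy simply stops being accepted.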


6.6.2 Specific GASS server issues

As was discussed in section 3, GASS servers play an important role in allowing access to files held at remote locations. GASS has very specific security issues, and the level of access allowed by the GASS system is very much dependent on the limited security imposed.

Currently GASS servers do not allow multiple users access to the same cache. The reason for this is straightforward. Anyone who is authorized to execute on a given machine is allowed to set things up in the cache, and the cache itself is by default always stored in the same location. As such it would be a security problem if that cache were not deleted each time it was used. It would also be a security risk if more than one process were allowed to run on a remote machine at the same time. Neither of these things is ever allowed to occur. In theory you could add things permanently to a cache, as was suggested in section 3. However this security limitation means that if you choose to do this the file has to be available to everyone. No active way of restricting access is available within the Globus security mechanism. So though the system is secure in this respect, it limits the way in which access can occur.

The current level of access, based on caches and ftp servers, does not allow free access to a local machine's files. Though free access would obviously be a security risk, this could be mitigated to a large extent by a list of users allowed access to certain directories. At present, however, Globus does not support this. Secure ftp servers can be utilized only if they accept Globus certificates, as the GSI mechanism will only perform mutual authentication using them; this could conceivably cause a problem in some environments.

Overall the GASS system causes substantial security problems and the level of support for file access is severely restricted because of it.

6.7 Overall Issues with Globus security

The security model of Globus has proven adequate during small scale testing. However a number of issues have arisen during development. It relies too heavily on the level of security provided by end users. Essential items like the grid-map file and trusted CA lists could be vulnerable. The manual way in which the grid-map file and others are manipulated makes system wide security management very difficult. For example, if a certain person had to be removed from the grid-map file of every machine, this would quite literally have to be done by hand on every machine. The official documentation on the Globus website does not lay sufficient emphasis on the level of security a user would need to provide to use the security model effectively. The installation of Globus itself is disturbingly complicated, and small errors could, for example, completely disable parts of the security mechanisms. In its quest to allow machines to be individually set to accept job requests from specific machines, it lacks any ability for centrally allocated security provisions.

While a proxy facility may be essential it is also one of the biggest security loopholes in the system. The use of public key encryption and X.509 certificates provides a flexible if somewhat slow authentication system. It is believed that with large jobs this overhead will cease to be an issue, but that remains to be seen. Most of the security within Globus relies on mutual authentication, though once authenticated Globus does not stop the eavesdropping of raw data passing between machines. This unprotected transfer of data could also be a cause for concern in some situations, especially where the content of the data is of a sensitive nature, be that commercially or militarily. Given the increases in computer processing power and the speed with which modern cryptography can be carried out, in a few years' time all communications may end up with some form of encryption. Work therefore needs to be completed in this area soon. The level of support for more complex jobs which require local resources is an ongoing area of research. At present support for this is somewhat patchy and little evidence is available to show any security changes made to accommodate them. For example a useful feature such as the ability of many users to access the same cache is planned, though no work on how this mechanism is going to be secured is available.

7 Running a simple Job

Now that all the major sections of the system have been covered it is possible to understand the steps needed to make a job run, and the meaning of those steps. To make clear exactly what happens when a job is run, and how many actions it takes, the following example is given.

In order to run a job

1) Both parties require Globus certificates (see section 6.4).

2) The execution host must have authorized use of their machine by the user in question by an entry in the grid-map file. (See section 6.2)

3) The user must have made a request to the execution machine, logging in using their passphrase at the time of submission, or by setting up a proxy beforehand. (see section 6.6.1)

4) The execution machine must be running a gatekeeper to receive the request

5) Mutual authentication using the Globus certificates must then take place. (see section 6.5)

6) The job can now run

If access to additional resources is needed by the program it may also be necessary to set up a GASS server (see section 3).
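The checks that the execution machine applies in steps 1, 2 and 5 can be summarised as one decision function. This is a sketch of the logic only, under the assumption that authentication (trusted CA, proof of the private key) comes before authorization (the grid-map lookup); the function name, the argument names and the CA name "Example CA" are hypothetical and do not correspond to real Globus interfaces.

```python
def gatekeeper_check(subject, issuer, key_proven, trusted_cas, grid_map):
    """Return the local username for an accepted request.

    subject     - name taken from the user's certificate
    issuer      - the CA that signed the certificate
    key_proven  - True if mutual authentication succeeded (section 6.5)
    trusted_cas - set of CAs the execution host trusts
    grid_map    - mapping of certificate subject to local username
    """
    if issuer not in trusted_cas:
        raise PermissionError("certificate not signed by a trusted CA")
    if not key_proven:
        raise PermissionError("mutual authentication failed")
    if subject not in grid_map:
        raise PermissionError("subject not listed in the grid-map file")
    return grid_map[subject]


grid_map = {"/O=Grid/O=Globus/OU=lancs.ac.uk/CN=John Smith": "jons"}
trusted_cas = {"Example CA"}

local_user = gatekeeper_check(
    "/O=Grid/O=Globus/OU=lancs.ac.uk/CN=John Smith",
    "Example CA",
    True,
    trusted_cas,
    grid_map,
)
```

A valid certificate alone is not enough: a user who authenticates correctly but has no grid-map entry is still refused.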

It is easy to see that setting up the system to do a simple job is by no means a simple feat. Assuming a user was prepared to do these steps there are also a number of optional services they could run. A few of these are listed below with some of the major steps required to set them up.

If an information server was to be used

1) A machine for a GIIS would need to be picked and set up, plus any number of shadow GIIS machines.

2) At least one GRIS server would need to be set up

3) The GIIS may need to be adapted for any additional information that needs storing

4) The GRIS server would have to register with the GIIS server and any other shadow GIIS.

5) The machine running the execution gatekeeper needs to register with the GRIS after finding an appropriate one from the known GIIS host.

6) At this point a local machine could request information about the location of a remote machine.
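The registration chain in steps 4 to 6 can be pictured with two small classes. This is a toy in-memory model written for this report, assuming a GRIS simply holds details of machines and a GIIS simply holds a list of GRIS servers; the class and method names and the `cpus` detail are invented, and the real services are directory servers rather than Python objects.

```python
class GRIS:
    """Holds details of the machines that have registered with it."""

    def __init__(self, name):
        self.name = name
        self.resources = {}

    def register_resource(self, host, details):
        self.resources[host] = details


class GIIS:
    """Aggregated directory of GRIS servers."""

    def __init__(self):
        self.gris_servers = []

    def register_gris(self, gris):
        self.gris_servers.append(gris)

    def lookup(self, host):
        """Ask each known GRIS in turn for details of a host."""
        for gris in self.gris_servers:
            if host in gris.resources:
                return gris.resources[host]
        return None


giis = GIIS()
gris = GRIS("gris-1")
giis.register_gris(gris)                                   # step 4

chosen = giis.gris_servers[0]                              # step 5: find a GRIS via the GIIS
chosen.register_resource("js.lancs.ac.uk", {"cpus": 4})    # ...and register with it

found = giis.lookup("js.lancs.ac.uk")                      # step 6
missing = giis.lookup("unknown.example.org")
```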

If a HBM were to be used

1) First a HBM-LM would have to be picked and started up

2) A HBM-DC would also have to be picked and started up

3) The HBM-LM would have to register with the HBM-DC

4) The user could then register a job with its nearest HBM-LM

5) In order to find out about a given job the user would then have to request information from the HBM-DC
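The same five steps can be sketched as a heartbeat model. The code below assumes that HBM-LM stands for a local monitor and HBM-DC for a data collector, and that a job is presumed failed when no heartbeat has been seen within a timeout; the class names, the job identifier and the 30 second timeout are all assumptions made for illustration, not details taken from the HBM itself.

```python
class DataCollector:
    """Stands in for the HBM-DC: remembers when each job was last seen."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def record(self, job_id, timestamp):
        self.last_seen[job_id] = timestamp

    def status(self, job_id, now):
        if job_id not in self.last_seen:
            return "unknown"
        age = now - self.last_seen[job_id]
        return "alive" if age <= self.timeout else "presumed failed"


class LocalMonitor:
    """Stands in for the HBM-LM: forwards heartbeats for its jobs."""

    def __init__(self):
        self.collector = None
        self.jobs = set()

    def register_with(self, collector):
        self.collector = collector

    def register_job(self, job_id):
        self.jobs.add(job_id)

    def beat(self, now):
        for job_id in sorted(self.jobs):
            self.collector.record(job_id, now)


monitor = LocalMonitor()                      # step 1
collector = DataCollector(timeout=30)         # step 2
monitor.register_with(collector)              # step 3
monitor.register_job("job-1")                 # step 4

monitor.beat(now=100)                         # heartbeat sent at t = 100 s
soon_after = collector.status("job-1", now=110)    # step 5
much_later = collector.status("job-1", now=200)
never_registered = collector.status("job-2", now=110)
```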

As can be seen from these simple examples, using Globus requires a number of user actions in order to accomplish any given task. This is the reason that Globus is rarely considered workable without a further level of abstraction above it. This issue is covered in depth in the next section.


8 Extending Globus (schedulers)

From what has been said over the last few sections it should be clear that although Globus can achieve a number of things, it does have its weaknesses. It requires too many user actions to perform a given task. This is the point at which other companies and products come into play. Globus provides the roots required to build computational grids but allows many different services to be built above it. For example, a particularly useful feature would be a GUI with the ability to see other users and their resources at the click of a button. Couple this with the ability to send requests and you get a scheduler. By creating a new level above Globus many of the tasks that were difficult and tedious can be automated. One example of a simple scheduler is Condor[9]. Condor however does not utilize a GUI and offers only the most basic scheduling functions. It also needs requests submitted to it in its own language, which is different from that of Globus. This stems from Condor’s history as a cluster scheduler before being extended to work with Globus. A new breed of add-ons for Globus will no doubt emerge in the next few years, however, and they should address many of the issues that people have with Globus at the moment. The Appendix contains more details on the Condor scheduler.

9 Conclusions

Globus has tremendous potential for building useful computational grids. It is by no means a closed, finished product, however, and that must be taken into account when using it. Its flexibility is hampered especially by its security mechanisms, which need to take into account the varied activities that people are using grid technology for.

In the next few years different companies will use the Globus platform to build more powerful tools which will have proper support mechanisms. It is at this point that grid computing will finally take off properly. In the meantime the Globus Toolkit is available to download and examine for those with development interests or simple curiosity about grids and their nature.


10 Glossary

CA - A Certificate Authority. The trusted party required to vouch for X.509 certificates.

Certificate - A file containing a user's public key and their details.

Cipher - A mathematical algorithm designed to encrypt data given a key.

FTP server - A server designed to allow access to files hosted on it.

HBM - Heart Beat Monitor. Polls machines for failure.

GASS - Global Access to Secondary Storage.

Gatekeeper - A service running on a receiving machine capable of handling requests.

GIIS - Grid Index Information Service. The aggregated directory holding details of GRIS servers.

Globus - A middleware grid solution designed at Argonne National Laboratory and the University of Chicago.

grid-map file - A file containing a list of users permitted to use that machine once authentication has taken place.

GRIS - Grid Resource Information Service. Holds details of machines.

GSI - Grid Security Infrastructure.

Key - A unique string of bits (typically 1024) used in conjunction with the cipher to encrypt data.

MDS - Metacomputing Directory Service (later renamed the Monitoring and Discovery Service). This is used to keep track of details about machines.

Middleware - Software designed to work between client and server levels to provide additional functionality.

Passphrase - A simple password used to validate a user to create a proxy or send a request.

private key - A key held only by a single individual used to encrypt messages to prove who they are and to decrypt messages encrypted using its public key.

Proxy - A method of automatically authorizing the sending of requests without having to re-enter the passphrase every time.

public key - A key held by potentially many people used to encrypt private messages to a person for decryption using their private key.

RSL - Resource Specification Language. Globus uses this to formulate job requests.


11 References

1) Globus - http://www.globus.org

2) SETI@home - http://setiathome.ssl.berkeley.edu/

3) NASA IPG - http://www.nas.nasa.gov/About/IPG/ipg.html

4) CERN - http://wwwlhc.cern.ch/

5) Sun - http://www.sun.com

6) Schneier, B. Secrets and Lies: Digital Security in a Networked World, John Wiley & Sons; ISBN: 0471253111

7) Schneier, B. Applied Cryptography, 2nd Edition, John Wiley & Sons; ISBN: 0471117099

8) ITU - http://www.itu.int/home/index.html

9) Condor - http://www.cs.wisc.edu/condor/



12 Appendix

Condor-g: A Brief Overview

What is Condor-g?

Condor-g is a task scheduler designed to work with the Globus middleware platform. It was designed to act as a first generation brokering system for grid computing. As such it works at a very low level, in much the same way that Globus does. Condor-g is also designed to extend the functionality of Globus by integrating DAG scheduling and a better grasp of which machines are running at a given time. One of the problems with Globus is that jobs are submitted and then run immediately; Condor-g gives greater flexibility by running programs when it can and keeping track of the state of execution on the remote machines.

How does Globus fit in with this?

The purpose of Globus is to provide the basic software to enable inter-domain communications, security, job execution and so on. The Condor-g broker then provides the added functionality for scheduling jobs on machines. In this way a complete system can be offered which handles the submission and security of jobs while maximizing the level of feedback to the user.

So how does it all fit together?

Condor-g interfaces with the basic Globus services, predominantly with the Globus gatekeepers on the remote machines. It also tries to ensure that the grid proxies, which the Globus system makes extensive use of, are going to be valid for the duration of the job. The Condor-g broker is represented by a “personal Condor”, the job of which is to handle the scheduling and to interface with the Globus gatekeeper. Jobs themselves are submitted via a submission file that has to be written before the submission takes place. Principally this lists the machine that the program is to be run on and the details of where the program resides. There are also options to interpret the file as a DAG (Directed Acyclic Graph) request. There are of course other options but the main ones have been listed above.

Problems with the Condor-g System

Condor-g is a research program which is solely supported and run from the University of Wisconsin, USA. Its code is not open source and it only represents a side interest for the university. The development of Condor-g has therefore been slow, and many important issues need to be dealt with. The most important of these would appear to be that of having to write a submission file in order to make a request of the Condor-g system. The submission language is different from that of Globus requests, due mainly to Condor's history as a cluster scheduler before being extended for use with Globus. The programs for finding out information are also all textual command line interfaces. As Condor-g is designed to extend the functionality of Globus it would be prudent to do something about its lack of usability and user-friendliness.

This in itself is not a big enough problem to stop people using the program, but it would appear there are other concerns. Though the scheduler has been used in the field for a few years now, so far the testing has been on a small scale. With products such as the Sun Grid Engine also trying to gain acceptance in the marketplace, it is unlikely that Condor-g, at least in its present form, will survive as a viable brokering client. Other issues which Condor-g may have to address include the fact that it will not choose a machine for a job on the user's behalf. This surely represents one of the most important issues concerning a scheduler. It would appear that although the system will provide users with more information, it also expects them to make the decisions as to what to do with it. Though Condor-g supports DAG requests, it does so in a very low level way. A different submission file has to be written for each node, making the submission of this type of request to Condor-g very tedious.

How well Condor-g copes with heavy workloads is unknown. However, it is a fact that if a mistake is made, for example submitting a job to the wrong machine, then stopping the scheduler from resubmitting the request when it sees fit, or canceling the job entirely, is needlessly complicated. For example, even when jobs are marked for deletion they do not disappear from the scheduler interface. The overhead when using Globus makes small executables pointless to send. It is presumed that given much larger and longer programs this overhead will cease to be a real problem, though it should not be forgotten. A typical program with less than a second of running time could take 5-6 seconds to run even on the local machine. On a remote machine this is slightly, though not a great deal, longer.

In Conclusion

Condor-g is a useful add-on to the Globus platform. It provides basic scheduling functionality on top of the basic grid functionality of Globus. However it is only useful at a very low level and could rapidly become overshadowed by new schedulers given the current Grid climate.
