Page 1: Proceedings of the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis, Fukushima, Japan, 15-17 March 1995 (IEEE Computer Society Press)

Logical Timing (Global Synchronization of Asynchronous Arrays)

Victor Varshavsky (The University of Aizu, Japan)
Vyacheslav Marakhovsky (The University of Aizu, Japan)
Tam-Anh Chu (Cirrus Logic Inc., USA)

Abstract

The problem of global synchronization is solved for asynchronous processor arrays and multiprocessor systems with an arbitrary interconnection graph. Global synchronization of asynchronous systems is treated as a homomorphic mapping of an asynchronous system behavior in logical time onto the behavior of the corresponding synchronous system with a common clock functioning in physical time. The solution is based on decomposing the system into the processor stratum and the synchro-stratum; the latter plays the role of a global asynchronous clock. For the case of a synchronous system with two-phase master-slave synchronization, a simple implementation of the synchro-stratum for the corresponding asynchronous system is proposed. It is shown that, depending on the local behavior of the processors, the synchro-stratum is able to perform two types of global synchronization: parallel synchronization and synchronization that uses a system of synchro-waves.

“Tasks are divided into unsolved and trivial ones.” Mathematical folklore.

1 Introduction

The relationship between the synchronous and the asynchronous in computing system organization has been discussed continually for many years. Even though the proponents of the asynchronous keep finding new advantages of it, the synchronous methodology prevails in modern computing system architecture. On the other hand, when discussing the prospects of massively-parallel and multiprocessor computing systems, most authors place their aspirations in the asynchronous approach. It seems, however, that experts in different fields treat the concepts of synchrony and asynchrony somewhat differently.

Consequently, in order to be able to discuss those issues we have to define the basic terms of the subject matter: parallelism, concurrency, synchrony, asynchrony and, finally, time. To do so is extraordinarily difficult without being drawn into an endless debate on the primary meaning of well-used terms. Anticipating adverse criticism, we shall nevertheless have to provide some definitions, if only to serve the purposes of this article.

The concept of time is one of the most complicated and sophisticated in modern science. There are two basic definitions of time:

physical time defined as an independent physical variable and

logical time defined as the partial order on some events that results from their cause-and-effect relationships.

All the definitions of physical time use the concepts of “simultaneity” or “the same calibrating function of the clocks”. Obviously, either can be attained only with a certain degree of accuracy. These concepts, as well as the concept of physical time itself, can be effective only if the measuring capacity of our systems is sufficient to ensure the required accuracy of the measurement.

The concept of logical time formulated in ancient Greece by Aristotle (“If nothing happens, no time”) can be usefully exploited in systems design only if we deal with events that have overt cause-and-effect relationships. Meta-stability is a typical example of the problems that arise when trying to coordinate events without explicit cause-and-effect links.

According to the definition, “Synchronization is the process of maintaining one operation in step with another” (McGraw-Hill Encyclopedia of Science & Technology, 7th Ed., 1992, p.662). This definition has apparently nothing to do with the concept of time, so the concept “synchronization” can be applied to physical and logical time alike. However, when using the term “synchrony”, we usually read into it the sense of simultaneity with all the restrictions on this term in the concept of physical time. On the other hand, we all realize that there is more to the concept “asynchrony” than just “non-simultaneity.”

0-8186-7038-X/95 $4.00 © 1995 IEEE

Finite automata theory was apparently the first to encounter these difficulties. Most textbooks do not offer a precise definition of an asynchronous automaton. The attempts to give such a definition do not go much further than stating the fact that a transition in an asynchronous automaton is initiated by an input symbol at an arbitrary moment of time (which time?) and any transition terminates in a steady state. In the abstract theory of finite automata, an automaton is a model of an algorithm with finite memory in which time is discrete and is marked by the ordinal numbers of symbols in the input and output sequences. In this sense, the abstract model of a finite automaton corresponds to the model of an asynchronous automaton that evolves in logical time; the conversion to the synchronous model is an arbitrary engineering solution. It is important to decompose (i.e. factorize) the input alphabet code, introducing a special synchronizing variable. Such decomposition allows us to significantly simplify the synchronous implementation, but presents a number of extra problems.

It is exactly in a synchronous implementation that we encounter real physical asynchrony, if that were to mean unpredictable variation of the delays introduced by real physical components, which are measured, with the permissible accuracy, in physical time units. The chosen duration of the synchronization step must exceed (i.e. mask) possible delay variations, which solves the synchronization problem at the level of hardware components (gates, wires, etc.). The “responsibility” for maintaining the step value and ensuring the correctness of synchronization rests with the environment that contains the clock. A so-called “asynchronous implementation” in its classical sense does not much affect the interaction scheme “environment - automaton”, while causing extra difficulties associated with hazard-free and race-free coding. These difficulties can, to a certain extent, be eliminated by a matched (i.e., handshaked or self-timed) implementation in which the automaton sets the rhythm for the environment and the environment for the automaton.

The understanding of the term “asynchrony” by experts in algorithms and architectures is not always and not completely the same as the understanding by experts in low-level hardware. An algorithm (and a program that represents it) consists of a sequence of steps which perform some actions (commands). Asynchrony is usually treated as the dependence of the number of steps required to obtain the result on the input data. In the case of a fully sequential algorithm (program), such treatment of asynchrony is important only for performance evaluation. Parallel algorithms and programs present new and challenging tasks. Steps of an algorithm are performed (or can be performed) concurrently. Many authors assume that the semantics of the word “concurrency” is clear at least to native English speakers, and they usually offer little more than examples from daily life (such as brick-works, services in a bank, etc.). In fact, representing an algorithm (program) in a form suitable for concurrent implementation boils down to the exposition of the cause-and-effect relationships between the operations (processes, commands) in the algorithm and deriving the relation of partial order. Hence, the derivation of a parallel specification is a procedure of introducing logical time into the algorithm.
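The derivation of a partial order from cause-and-effect relationships can be sketched as follows. This is a minimal illustration, not from the paper: the operation names and dependency sets are hypothetical. Each operation's logical time is one more than the latest of its causes; operations with equal logical time may run concurrently.

```python
# Sketch: deriving logical time from cause-and-effect relationships.
# The operations and dependency sets below are hypothetical illustrations.
from graphlib import TopologicalSorter  # Python 3.9+

# Each operation maps to the set of operations whose results it needs.
deps = {
    "load_a": set(), "load_b": set(),
    "mul": {"load_a", "load_b"},
    "add": {"load_a"},
    "store": {"mul", "add"},
}

def logical_times(deps):
    """Logical time of an operation = 1 + max logical time of its causes."""
    t = {}
    for op in TopologicalSorter(deps).static_order():
        t[op] = 1 + max((t[d] for d in deps[op]), default=0)
    return t

print(logical_times(deps))
```

Operations with the same logical time ("mul" and "add" here) have no cause-and-effect link and may execute concurrently, which is exactly the sense in which a parallel specification introduces logical time.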

If the length of logical time is measured in general physical time units (ticks) generated by a common clock, then the operation wait and the well-known synchronization primitives (synchro-primitives for short), along with the common external clock, provide simultaneity and solve the problem of timed behavior in principle, irrespective of local data-dependent asynchrony. We will call this approach “synchronous concurrency.” The major difficulties in implementing this approach are delivering the signal from the common clock to all the processors (processes) of the system and the hardware implementation of indivisible synchro-primitive operations. These difficulties increase significantly as the complexity of VLSI and VLSI-based systems grows.

Giving up a common clock, we arrive at systems that are asynchronous in physical time. The methodology of self-timing is naturally the basis of all such implementations. However, self-timing has a number of disadvantages, too. First, the complexity of such implementations is higher than that of conventional synchronous ones. Second, the performance may not be very good, because of the need to provide local handshakes at all structural levels and to gather synchronization signals at the bit level. And finally, when several processes have a shared resource, non-synchronized arbitration may occur. The first two disadvantages are the price paid for giving up a common clock. The last one cannot be alleviated within pure self-timing.

The basic idea of synchronous design for a master-slave implementation that eliminates the problems of physical asynchrony is as follows: every module can be in two phases, active and passive; the current phase is determined by the value of the clock signal; in the passive phase, all the signals inside and at the outputs of the module keep their values; in the active phase, the module changes its state depending on the external signals; it is assumed that the sources of the external signals are in the passive state. The major problem, as we mentioned above, is choosing a clock cycle duration that guarantees the completion of the transition processes in every module.
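A minimal sketch of this phase rule, assuming hypothetical 1-bit modules in a linear array (the update rule and values are illustrative, not from the paper): an active module recomputes its output from the frozen outputs of its passive neighbours, while passive modules hold their values.

```python
# Sketch of the two-phase master-slave rule: modules listed in `active`
# recompute from a frozen snapshot; passive modules keep their outputs.
def step(outputs, active, update):
    """One clock phase; `update(left, self, right)` is a hypothetical rule."""
    frozen = list(outputs)                     # passive sources keep their values
    for i in active:
        left = frozen[i - 1] if i > 0 else 0
        right = frozen[i + 1] if i < len(outputs) - 1 else 0
        outputs[i] = update(left, outputs[i], right)
    return outputs

# Toy update rule: each active module copies its left neighbour's value.
outs = [1, 0, 0, 0]
evens, odds = [0, 2], [1, 3]
step(outs, odds, lambda l, s, r: l)    # phase T2: odd modules active
step(outs, evens, lambda l, s, r: l)   # phase T1: even modules active
print(outs)
```

Because an active module only ever reads frozen (passive) sources, no value changes while someone is reading it, which is the property the master-slave discipline relies on.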

Let us consider a one-dimensional processor array. Its synchronous implementation is presented in Fig.1,a. In such an array, information and/or instruction flows can propagate in opposite directions and the processes can be initiated at both ends (ports) of the array without causing any undesirable collisions. In other words, one can, quite easily, implement wave-propagation algorithms, which assume that the wave fronts propagate in opposite directions, being initiated independently at opposite ports. Clock signals T1 and T2 control the flow of information and/or instructions through the array and provide correct master-slave interaction between adjacent processors.

Figure 1: One-dimensional processor array: a) synchronous two-phase implementation, b) asynchronous pipeline implementation, c) asynchronous implementation with an allotted synchro-stratum.

In Fig.1,b, the structure of a one-dimensional self-timed pipeline is presented. In the simplest one-dimensional pipeline, which of the ports is the initiator depends on the point of view and initial state.

The structure can be complicated to provide two-dimensional flow, such as, for example, the one in the Counterflow Pipeline Processor Architecture [1] where two opposite-directed pipelines are used. However, a direct attempt to organize lateral interaction between the stages of the pipelines causes arbitration. Since we are interested in more complicated processor arrays, in which the interaction of opposite waves may be an important feature of the algorithm or part of the “breathing room capture” procedure by initiators placed in different ports of the array, the question of avoiding arbitration arises. Another important question is whether the algorithms developed for synchronous arrays can be used without change in arrays with local asynchronous interaction.

C.D.Nielsen answered the former question positively. He found that arbiter-free interaction between opposite pipelines can be organized. Examples of delay-insensitive designs with constant response time were presented [2]. In this paper, we will try to suggest a general solution for the problem of synchronizing asynchronous arrays, which decomposes the original array and extracts a special synchro-structure (synchro-stratum) from it, as is usually done when designing synchronous circuits. In a sense, it is like an asynchronous clock providing synchronization in logical time. An example of the structure of a one-dimensional processor array with an extracted synchro-stratum is presented in Fig.1,c. Any magic trick, however, is based on an illusion. Decomposing an array into a synchronization subsystem (synchro-stratum) and a processor subsystem solves no problems by itself. It only significantly simplifies the general solution and allows one to single out the synchronization problem while disregarding processor behavior.

2 Two-phase master-slave synchronization

Let us return to Fig.1,a. A processor array is to perform some algorithm. The algorithm is defined as a sequence of steps, i.e. in logical time. At every moment t in this time, all the processors of the array should perform step number t of the algorithm. When the algorithm is being performed, the internal state Si(t) of every i-th processor and the states of its right Xi(t) and left Yi(t) outputs at the moment t are determined by its internal state at the previous moment t-1 and by the information received at the moment t from its left and right neighbors, i.e. the behavior of the i-th processor is described by a system of automaton equations:

S_i(t) = F_i{X_{i-1}(t), S_i(t-1), Y_{i+1}(t)},
X_i(t) = f_{i1}{X_{i-1}(t), S_i(t-1), Y_{i+1}(t)},
Y_i(t) = f_{i2}{X_{i-1}(t), S_i(t-1), Y_{i+1}(t)}.   (1)

We are now interested neither in the structure nor in the contents of these equations. The only important thing is that, if some algorithm is put into the array, the equation system (1) is fully defined. Since we are interested only in time invariants of the behavior, at the model level it is sufficient to consider the problem of synchronizing a cellular automata array.
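As an illustration only, one step of such a cellular automata array can be sketched as follows. System (1) is abstract, so the transition functions F, f1, f2 and the toy rule below are hypothetical; for simplicity each automaton here reads its neighbours' outputs from the previous step, one common discrete-time reading of the same-moment arguments in (1).

```python
# Sketch: one synchronous step of a 1D cellular automata array in the
# shape of system (1). F, f1, f2 are hypothetical transition functions.
def sync_step(S, X, Y, F, f1, f2):
    n = len(S)
    bX = [0] + X[:-1]          # X_{i-1}: left neighbour's right output (0 at boundary)
    bY = Y[1:] + [0]           # Y_{i+1}: right neighbour's left output (0 at boundary)
    Sn = [F(bX[i], S[i], bY[i]) for i in range(n)]
    Xn = [f1(bX[i], S[i], bY[i]) for i in range(n)]
    Yn = [f2(bX[i], S[i], bY[i]) for i in range(n)]
    return Sn, Xn, Yn

# Toy rule: every signal accumulates neighbour activity modulo 2.
S, X, Y = [0, 1, 0], [0, 1, 0], [0, 1, 0]
f = lambda l, s, r: (l + s + r) % 2
print(sync_step(S, X, Y, f, f, f))
```

Only the shape of the update matters here: every automaton is a function of its own previous state and its two neighbours' outputs, which is all the synchronization problem below depends on.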

Synchronous implementation of the processor array requires further refinement of the synchronization system that controls the processors' operations. The structure in Fig.1,a assumes that the processors interact as master and slave. Two sequences of clock signals T1 and T2 are used. To make it more definite, let the pair of clock signals T1T2 change as follows:

{0,0} → {1,0} → {0,0} → {0,1} → {0,0} → ...   (2)

When T1 = 0 (T2 = 0), the even (odd) processors transmit information to their neighbors; when T1 = 1 (T2 = 1), the even (odd) processors receive information from their neighbors.
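The discipline (2) and the resulting alternation of receiving subsets can be sketched as follows (illustrative only; the helper names are not from the paper):

```python
# Sketch: the clock discipline (2) as a generator, and which processor
# subset is receiving in each phase (even on T1 = 1, odd on T2 = 1).
from itertools import islice

def clock_pairs():
    """Yield (T1, T2) following {0,0} -> {1,0} -> {0,0} -> {0,1} -> ..."""
    while True:
        for pair in [(0, 0), (1, 0), (0, 0), (0, 1)]:
            yield pair

def receiver(t1, t2):
    return "even" if t1 else ("odd" if t2 else None)

seq = list(islice(clock_pairs(), 8))
print([receiver(a, b) for a, b in seq])
```

Note that the two clocks are never high at the same time, so at any moment at most one of the two subsets is changing state.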

As we have mentioned above, with the adopted synchronization discipline (2) there is no arbitration in a synchronous array. The lack of arbitration is due to the following: at the moment when automaton Pi receives information from its neighbors and changes its state, its neighbors do not react to the change of its output signals and do not change their own output signals. In fact, this is the only condition to be satisfied when designing an asynchronous synchronizer (synchro-stratum).

Note: in the synchronization structure mentioned above, every step k in logical time consists of two consecutive steps in physical time. This kind of structuring of logical time may lead to a change of system (1) that describes the behavior of every automaton.

Fig.2 presents the signal graph of parallel two-phase synchronization in physical time for an 8-automaton one-dimensional array with the synchronization discipline (2). In this graph, signals ±T1 and ±T2 represent the changes of clock signals T1 and T2; events +Ai and -Ai have the meaning of transition processes of limited duration in automata Pi. Synchronization at the graph nodes is provided by extra time. From the graph one can see that, in step k of the logical time structured in this way, the behavior of the even automata is described by an equation system, denoted (3), that coincides with system (1).

Figure 2: Signal graph of parallel two-phase synchronization of an 8-automata one-dimensional array.

For the odd automata, the analogous equations form system (4).

The transition from system (1) to systems (3) and (4) is purely formal and can be performed for any system (1).

3 Globally distributed synchronization

Let us now consider the problem of designing a distributed synchronizer for arrays of matched asynchronous automata.

In an asynchronous array, the durations of phase transitions (transition processes) in the automata are undefined in physical time. Hence, their interaction with the synchro-stratum should be organized as a handshake. It does not matter what principles form the basis of the automaton implementation: self-timing, a start-stop local clock, incorporated delays, etc. Only the matching is important, i.e. the presence of signals Ai that acknowledge the completion of transition processes in response to the clock signal.

In [3], we suggested a solution to this problem which introduced a system of logical-time synchronization waves (synchro-waves for short) that propagate through the synchro-stratum, always in one direction only. Fig.3 presents a fragment of the signal graph unfolding for such a structure (see Fig.1,c).

Figure 3: A fragment of the unfolding of the signal graph describing wave logical synchronization of a one-dimensional array.

In this graph, ±Tij(k) denotes an event that consists in the k-th transition of the clock signal Ti, i ∈ {1,2} (logical time step k), that comes to the j-th automaton of the array. The equation system (1) becomes as follows:

S_i(k) = F_i{X_{i-1}(k), S_i(k-1), Y_{i+1}(k-1)},
X_i(k) = f_{i1}{X_{i-1}(k), S_i(k-1), Y_{i+1}(k-1)},
Y_i(k) = f_{i2}{X_{i-1}(k), S_i(k-1), Y_{i+1}(k-1)}.   (5)

Incidentally, it is easy to see from the graph in Fig.3 that the same logical time in the array exists in different automata at different physical times, and at the same physical time there are different logical times.

Fig.3 depicts only the general relationships between the signals; it is an initial specification for synchro-stratum design. The implementation requires extra variables and extra design. One of the possible solutions [3] is as follows.

Let aj be an event that consists in a full cycle of the j-th automaton, i.e. aj = +Tij → +Aj → -Tij → -Aj. Then, proceeding from the graph in Fig.3, the necessary event coordination is described by a labeled Petri net (see Fig.4). It is easy to see that this is a specification of a simple pipeline.

Figure 4: Labeled Petri net describing pipeline interaction of the automata in a one-dimensional array.

A direct transition from the specification to the implementation using simple distributor cells [4-6] provides a sufficiently simple synchro-stratum circuit, see Fig.5. This transition is enabled by the fact that, unlike the situation in a C-element pipeline, in a distributor-cell pipeline the output signal of every j-th cell fully simulates the sequence (event) aj, and only after that does the next, (j+1)-th, cell begin to change its output signal. However, this solution has a drawback: low speed of operation, since the pipeline in Fig.5 attains its maximum speed when filled by 1/3, and adding buffer cells to the synchro-stratum is fraught with extra circuit control problems.

Figure 5: Synchro-stratum on distributor cells.
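The pipelined partial order specified by a Fig.4-style net can be sketched as a small simulation. This is illustrative only: the firing policy and the slack bound below are assumptions, not the paper's exact net or the distributor-cell circuit; it only shows that event a_{j+1} always lags a_j while a_j is throttled by limited slack ahead.

```python
# Sketch: a marked-graph-like pipeline of events a_1 .. a_n (illustrative).
# a_{j+1} may fire only if it has fired fewer times than a_j, and a_j may
# not run more than `slack + 1` firings ahead of a_{j+1}.
def simulate(n_events, total_fires, slack=1):
    fires = [0] * n_events            # how many times each a_j has fired
    order = []                        # global firing order observed
    while len(order) < total_fires:
        for j in range(n_events):
            behind_pred = (j == 0 or fires[j] < fires[j - 1])
            has_slack = (j == n_events - 1 or fires[j] - fires[j + 1] <= slack)
            if behind_pred and has_slack:
                fires[j] += 1
                order.append(j + 1)
                if len(order) == total_fires:
                    break
    return order

print(simulate(4, 8))
```

With this (deterministic) scan order the events fire in repeating waves 1, 2, 3, 4, which is the pipelined distribution of synchro-waves the text describes.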

The signal graph unfolding in Fig.3 allows one to derive a Petri net of another kind, which is presented in Fig.6. From this net, using direct translation [4,5], a synchro-stratum can be built which uses three distributor cells per two automata (correct operation of a circuit based on distributors requires that every cycle has no fewer than three cells).




Figure 6: Another variant of the specification of one-dimensional array automata interaction.

The described solutions are based on the standard pipeline methodology: organizing a pipelined distribution of logical-time synchro-waves. This methodology prevails at the architecture level in modern self-timing and asynchronous design. It was this stereotype of thinking that prompted the search direction and the solution itself in [3]. Let us now try to overcome it and look at the problem of array synchronization without reference to any pipeline structures.

In Fig.7, a fragment of a signal graph unfolding is given for the case of parallel rather than wave synchronization.

Figure 7: Signal graph unfolding fragment describing parallel synchronization of a 1D array.

Unlike Fig.3, all the parallel clock signals (clock signals of the same unfolding tier) correspond to the same, rather than consecutive, moments in logical time. Note that for a sufficiently large array (i.e. a sufficiently wide signal graph) different logical times can exist at the same moment of physical time. Note also that in the case of parallel synchronization the algorithm is represented by equations (3) and (4).

To synthesize a circuit of the synchro-stratum, one should derive a correct signal graph by adding extra variables that resolve the contradictions in the initial specification. It appears that there is a surprisingly simple solution, presented in Fig.8; in this graph, signals ±Xj and ±Yj are the additional ones. The derived signal graph leads to an amazingly simple circuit presented in Fig.9.

Figure 8: Correct signal graph for parallel synchronization of a one-dimensional array.



Figure 9: Synchro-stratum for parallel synchronization of an 8-automata one-dimensional array.

Even a passing glance at Fig.3 and Fig.7 is enough to come to a conclusion which seems trivial but which was completely unexpected to us: the graphs differ only in the logical time marking. Hence, for the same synchro-stratum circuit, depending on the agreement on logical time mark-up (i.e. which equations define the algorithm), we obtain either a parallel or a wave system of global synchronization.

Now let us consider a synchro-stratum for a system of automata placed at the vertices of an arbitrary graph, the arcs of which correspond to the connections between the automata. With the accepted synchronization discipline, i.e. two synchronization systems T1j and T2j, the connection graph should be a König graph (i.e. bichromatic, bipartite) [7].

The reduction of an arbitrary interconnection graph to a König graph is trivial; for example, it can be effected by inserting buffer registers into all the connections. Note that this would not affect the general synchronization strategy since, as we mentioned above, the synchronization strategy itself is invariant to the semantics of the synchronized automaton behavior. The insertion of buffer registers may be useful irrespective of the type of graph interconnection, especially if the cycle of the processor operation is significantly longer than the write cycle of the buffer registers. In such a case, signals T1j = 1 initiate the activity of the processors and signals T2j = 1 initiate the write cycle of the buffer registers. A more sophisticated internal organization of the processors can be used to achieve a higher level of concurrency. However, these problems have to do with the strategy of forming the acknowledgement signals and the methods of internal processor synchronization, and so are beyond the scope of this paper. We would like to repeat that the approach we have chosen, i.e. decomposing a system into the processor and synchronization strata, allows us to separate out the problem of global synchronization and that of processor behavior.
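The buffer-register reduction can be sketched as follows (node names and the edge list are illustrative): subdividing every connection with a buffer vertex makes any interconnection graph bipartite, since processors and buffers form the two colour classes and every cycle doubles in length.

```python
# Sketch: reducing an arbitrary interconnection graph to a König (bipartite)
# graph by inserting a buffer-register vertex into every connection.
from collections import defaultdict, deque

def insert_buffers(edges):
    """Subdivide each edge with a fresh buffer vertex (names illustrative)."""
    new_edges = []
    for k, (u, v) in enumerate(edges):
        buf = f"buf{k}"
        new_edges += [(u, buf), (buf, v)]
    return new_edges

def is_bipartite(edges):
    """2-colour check by BFS over the undirected graph."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v); adj[v].append(u)
    colour = {}
    for start in adj:
        if start in colour:
            continue
        colour[start] = 0
        q = deque([start])
        while q:
            x = q.popleft()
            for y in adj[x]:
                if y not in colour:
                    colour[y] = 1 - colour[x]; q.append(y)
                elif colour[y] == colour[x]:
                    return False    # odd cycle found
    return True

triangle = [("P1", "P2"), ("P2", "P3"), ("P3", "P1")]   # odd cycle: not König
print(is_bipartite(triangle), is_bipartite(insert_buffers(triangle)))
```

The processors then form one colour class (clocked by T1j) and the buffers the other (clocked by T2j), matching the two-phase discipline.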

Let us return to Fig.9. What local properties of the circuit guarantee the correctness of the synchro-stratum behavior?

Firstly, in the layer of gates whose outputs are the signals T1j and T2j, the connections between neighboring gates (through automata Aj) ensure that the output of a gate passes from state 0 to state 1 iff the outputs of all the neighboring gates are equal to 0.

Secondly, the transition of a gate with output Tij from state 0 to state 1 should be deterministic. Therefore, memory is needed to hold information about the previous state; this function is performed by two layers of gates with outputs Xj and Yj. For these outputs to appear no sooner than the neighboring lateral gates have switched to 0, gates Xj and Yj are connected similarly to the connections between gates Tij.
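A minimal sketch of the first local rule, for the clock-signal layer of a linear array (acknowledgement inputs and the memory layers Xj, Yj are omitted; purely illustrative):

```python
# Sketch: a clock-layer gate may rise only when it is at 0 and all of its
# lateral neighbours' outputs are 0 (boundary neighbours treated as 0).
def can_rise(outputs, i):
    left = outputs[i - 1] if i > 0 else 0
    right = outputs[i + 1] if i < len(outputs) - 1 else 0
    return outputs[i] == 0 and left == 0 and right == 0

T = [0, 1, 0, 0]
print([i for i in range(len(T)) if can_rise(T, i)])
```

This is the local condition that prevents two adjacent automata from being clocked simultaneously, which is what rules out arbitration between neighbours.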

Similar requirements on local interaction, augmented with the interaction with all the neighbors in the graph, allow one to build a cell of the synchro-stratum for an arbitrary König graph (Fig.10). Depending on the degree of the graph vertices, the required number of inputs per gate may exceed the technological restrictions. Of course, in this case the cell circuit will have to change somewhat, but that can be done by standard methods of self-timed design, for example, using intermediate C-elements.

Figure 10: A cell to build a synchro-stratum for an arbitrary König interconnection graph.

4 More precise definition of global synchronization

Until now, we have assumed that the concept of global synchronization is intuitive. Now we shall try to define it more clearly. As an initial specification, we have a synchronous implementation with a two-phase master-slave synchronization system which consists of two clock signal sequences T1 and T2 with discipline (2). Such a synchronization system breaks down the set of the processors into two subsets: processors {Ai} clocked by T1 and {Bj} clocked by T2. For a synchronous implementation, the unfolding of a signal graph similar to the one displayed in Fig.2 is as follows:

+T1(k) → {+ai(k)} → -T1(k) → {-ai(k)} → +T2(k) → {+bj(k)} → -T2(k) → {-bj(k)} → +T1(k+1) → {+ai(k+1)} → -T1(k+1) → {-ai(k+1)} → +T2(k+1) → ...   (6)

Signals ±ai(k) and ±bj(k) signify the completion of transition processes in physical time in the corresponding processors.

The algorithm of system behavior is determined by the automaton equations (3), (4) or by equations (5) that set the cause-and-effect relationships, in logical time (i.e., as a partial order of events), between the events in the automaton representing a processor and the events in its nearest neighbors in the interconnection graph.

Let us denote the request signals for processors Ai and Bj as T1i and Tzj, respectively; the request sig- nals for the subset of processors Bj that are the nearest neighbors of processor A; in the interconnection graph as {T2j[Ai]); the request signals for the subset of pro- cessors Aj that are the nearest neighbors of processor Bj in the interconnection graph as {Tl;[Bj]} . Also, let a; and bj be the acknowledgement signals of processors A; and Bj respectively. We say that the signal graph of synchro-stratum behavior specifies global parallel synchronization of an asynchronous system that cor- responds to the synchronous prototype if the signal graph of the synchronous prototype is homomorphic to the signal graph of the synchro-stratum with respect to the mapping { f T l j ( k ) } * fTl(k),Z E {1,2}, and

if the signal graph of the synchro-stratum conforms to the succession relation between the events in every processor and its nearest neighbors, as defined by the automata systems (3) and (4):
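The parallel-synchronization rule just stated, that a processor may advance only once all its nearest neighbors have kept pace, can be sketched behaviorally. The model below is our own toy abstraction (the graph `g`, the function `run_parallel_sync`, and the single phase counter per processor are assumptions, not the paper's construction):

```python
import random

def run_parallel_sync(neighbors, steps, seed=0):
    """Parallel synchronization on a bipartite (Koenig) interconnection graph.

    Each processor holds a local phase counter; the synchro-stratum lets a
    processor enter phase k+1 only when every nearest neighbour has completed
    phase k, so processors may skew in physical time but not in logical time.
    """
    rng = random.Random(seed)
    phase = {v: 0 for v in neighbors}
    for _ in range(steps):
        for v in neighbors:
            stalled = rng.random() < 0.5  # model variable processor speed
            if not stalled and all(phase[u] >= phase[v] for u in neighbors[v]):
                phase[v] += 1
        # synchro-stratum invariant: neighbours never drift more than 1 phase
        assert all(abs(phase[v] - phase[u]) <= 1
                   for v in neighbors for u in neighbors[v])
    return phase

# A-processors interact only with B-processors, as in the two-phase prototype
g = {"A0": ["B0", "B1"], "A1": ["B0", "B1"],
     "B0": ["A0", "A1"], "B1": ["A0", "A1"]}
final = run_parallel_sync(g, steps=8)
```

Despite the random stalls, the asserted invariant holds throughout the run: collapsing all local phase-k events onto a single clock event recovers the order of the synchronous prototype, which is the homomorphism required above.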

For wave synchronization defined by a system such as (5), the direction of synchro-wave propagation should be set at every point of the synchro-stratum. To do this, for every processor (Ai and Bj), the set of its nearest neighbors in the interconnection graph should be broken down into two subsets: the sources of the synchro-wave front (Bsj(Ai) and Asi(Bj)) and its receivers (Brj(Ai) and Ari(Bj)). Besides, a vertex of the synchro-stratum should be used as a rhythm driver. All the nearest neighbors of this vertex in the interconnection graph are receivers of the synchro-wave front¹. Such a partitioning leads us to a four-color graph. It can be performed in various ways depending on which rhythm-driver point is chosen. In homogeneous arrays it can be performed rather easily, whereas an arbitrary interconnection graph requires collision detection and the partitioning may be rather sophisticated. Fig.11 presents examples of organizing synchro-wave propagation in a two-dimensional array (Fig.11,a) and in a heterogeneous interconnection graph (Fig.11,b). In any case, the synchro-stratum should provide the following order of the events:

and for the rhythm driver:

The choice between wave and parallel synchronization strongly depends on the task at hand, although, in the general case, parallel synchronization seems preferable.
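The wave discipline can likewise be sketched for the simplest case, a one-dimensional array with cell 0 as the rhythm driver. The model below is our own toy abstraction, not the synchro-stratum circuit itself; for every cell other than the driver, its left neighbor is the source of the synchro-wave front and its right neighbor a receiver:

```python
def wave_sync(n_cells, waves):
    """Wave synchronization on a 1-D array; cell 0 is the rhythm driver.

    A cell enters wave k only after its source (left neighbour) has entered
    wave k, so each synchro-wave sweeps left to right across the array.
    """
    phase = [0] * n_cells
    order = []  # (cell, wave-number) in firing order
    while min(phase) < waves:
        for i in range(n_cells):
            source_ahead = True if i == 0 else phase[i - 1] > phase[i]
            if phase[i] < waves and source_ahead:
                phase[i] += 1
                order.append((i, phase[i]))
    return order

order = wave_sync(n_cells=4, waves=2)
```

Note that nothing prevents the driver from launching wave k+1 before wave k has reached the far end of the array, so successive waves may pipeline; the only guarantee is that each cell's wave-k event follows its source's wave-k event.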

¹Any cycle in the graph can be used as a source of synchro-waves.



Figure 11: Examples of organizing the propagation of synchro-waves: a) in a two-dimensional array; b) in a heterogeneous graph.

5 Conclusion

Strange as it might sound, when we had finished this work, we felt deep disappointment rather than satisfaction. Why should we have been racking our brains over the puzzle of organizing global synchronization of asynchronous arrays for several years, inventing sophisticated examples and trying to find solutions to them, reading adverse reviews and planning the work on this problem for some years into the future? A trivial solution was on the surface!

Of course, the issue is not closed. The approach suggested here, like any general solution, is good for all cases and bad in every particular one. For example, using this approach one can implement a bit pipeline, but it will not be as good as the traditional implementation. Therefore, it is very important to understand what question we have obtained the answer to.

Following this work, we can claim that for any processor array of any dimension, for any multiprocessor system with a König interconnection graph, and for any distributed synchronous (in the sense of using a clock common to the whole system) algorithm or system of algorithms, one can uniformly design a system of "asynchronous synchronization." To this end, it is sufficient to use matched processors and a synchro-stratum of the cells shown in Fig.10.

The results on global synchronization of asynchronous processor arrays and multiprocessor systems stated here correspond to a synchronous prototype with two-phase master-slave synchronization. Other possible synchronization disciplines may be of interest, too.

References

[1] R.F. Sproull, I.E. Sutherland, C.E. Molnar, Counterflow Pipeline Processor Architecture, Technical Report SMLI TR-94-25, Sun Microsystems Laboratories, Inc., CA 94043, April 1994.

[2] C.D. Nielsen, Delay-Insensitive Design with Constant Response Time, Technical Report ID-TR 1994-134, Technical University of Denmark, DK-2800 Lyngby, Denmark, Jan. 1994.

[3] V.I. Varshavsky, T.-A. Chu, "Self-Timing - Tools for Hardware Support of Parallel, Concurrent and Event-Driven Process Control," Proceedings of the Conference on Massively Parallel Computing Systems (MPCS), May 1994, pp. 510-515.

[4] V. Varshavsky, Hardware Support of Parallel Asynchronous Processes, Helsinki University of Technology, Digital Systems Laboratory, Series A: Research Reports, No. 2, Sept. 1987.

[5] V. Varshavsky, M. Kishinevsky, V. Marakhovsky et al., Self-Timed Control of Concurrent Processes, Ed. by V. Varshavsky, Kluwer Academic Publishers, 1990 (Russian edition, 1986).

[6] V. Varshavsky, M. Kishinevsky, V. Marakhovsky et al., "Asynchronous Distributer," USSR Inventory Certificate No. 1064461, The Inventions Bulletin, No. 48, 1983.

[7] D. König, Theorie der Endlichen und Unendlichen Graphen, Leipzig, Akad. Verlag M.B.H., 1936; N.Y., Chelsea, 1950.