Date posted: 21-Dec-2015
Written by
027382977מאיר בכור
32033946אביתר שרעבי
Model
The model we are talking about is a computer with multiple processors but only one memory unit. All the processors are synchronized by the same clock. The processors are all connected to each other and to the memory.
If more than one processor writes the same value to the same memory address at the same time, the value is written correctly. If the values are not the same, any one of them may be written.
More than one processor can read the same memory address at the same time.
Other models: the processors are on different computers, there is no memory shared by all the processors, and the processors do not use the same clock.
Array Maximum Problem
On a computer with one processor: Time: O(N). Algorithm: go over the array and keep the maximum seen so far.
On a computer with K processors: Time: O(N/K), plus the time to combine the K partial results. Algorithm: each processor handles N/K elements of the array and finds their maximum. The maxima of the parts are then combined into the maximum of the array.
On a computer with O(N) processors: Time: O(log(N)). Algorithm: in the first round every processor compares 2 items and keeps the larger, so after the first round we have N/2 numbers. In the next round N/4 processors each take 2 of these numbers and keep the larger, so we have only N/4 results after the second round. After log(N) rounds we have the maximum of the array.
Example: 8 elements (1 2 3 4 5 6 7 8) take 3 = log(8) rounds.
The number of computations performed is 7 (4 in the first round, 2 in the second and 1 in the last). This is the same number of computations as in the serial algorithm, but they are done in less time.
This algorithm works for many other functions besides Max, such as Min and Sum (and Avg, computed as Sum divided by N). It works for every associative function.
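The round structure described above can be simulated on one processor. The following is only a sketch: the function name `tree_reduce` and the returned counters are illustrative assumptions, and the inner loop stands in for processors that would all work in the same clock cycle.

```python
from typing import Callable, List, Tuple


def tree_reduce(items: List[int], op: Callable[[int, int], int]) -> Tuple[int, int, int]:
    """Combine items pairwise, round by round.

    Returns (result, number of rounds, number of operations).
    `op` is assumed to be associative (max, min, addition, ...).
    """
    if not items:
        raise ValueError("items must not be empty")
    level = list(items)
    rounds = 0
    operations = 0
    while len(level) > 1:
        next_level = []
        # The pairs are independent of each other, so in the parallel
        # model each pair is handled by its own processor in one step.
        for i in range(0, len(level) - 1, 2):
            next_level.append(op(level[i], level[i + 1]))
            operations += 1
        if len(level) % 2 == 1:
            # An unpaired last item moves on to the next round unchanged.
            next_level.append(level[-1])
        level = next_level
        rounds += 1
    return level[0], rounds, operations
```

For the 8 elements 1..8 and `max`, this gives the result 8 after 3 rounds and 7 operations, matching the counts in the text. Passing addition instead of `max` gives the sum in the same number of rounds.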
Finding The Two Greatest Numbers
Simple solution for O(N) processors. Algorithm: find the maximum, remove it from the array and find the maximum again. Time: 2 log(N).
Smart algorithm for O(N) processors. Algorithm:
First round: each processor handles 2 items, finds the max and puts the other item in a candidate group attached to that max.
Rounds 2..log(N): each processor handles 2 of the results of the previous round. It compares the 2 max values and takes the larger as the new max. It then takes the candidate group of the new max and adds the max of the other group to it; this is the new candidate group.
In the last round, the max that remains is the maximum of the array, and the second maximum is the maximum of its candidate group.
Sample array: 7, 10, 1, 3, 100, 8, 55, 6.
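Input:   7  10  1  3  100  8  55  6
Round 1: max 10 (candidates 7), max 3 (candidates 1), max 100 (candidates 8), max 55 (candidates 6)
Round 2: max 10 (candidates 7, 3), max 100 (candidates 8, 55)
Round 3: max 100 (candidates 8, 55, 10)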
Results: The maximum is the maximum of the array (100) and the second maximum is the maximum of the candidate group (55).
Time: log(N) + log(log(N)).
log(N) to find the first maximum and the candidate group.
log(log(N)) to find the maximum in the candidate group.
The candidate group grows by 1 in each round (the maximum of the other group), so at the end its size is log(N).
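A sequential sketch of the candidate-group idea follows. The name `two_largest` and the representation of each partial result as a `(max, candidates)` pair are assumptions made for illustration. The final `max(candidates)` call stands in for the second, shorter parallel maximum search.

```python
from typing import List, Tuple


def two_largest(items: List[int]) -> Tuple[int, int]:
    """Return (maximum, second maximum) using a tournament with candidate groups."""
    if len(items) < 2:
        raise ValueError("need at least two items")
    # Each entry is (current max, items that lost directly to that max).
    level = [(x, []) for x in items]
    while len(level) > 1:
        next_level = []
        # Each pair would be handled by a separate processor in one round.
        for i in range(0, len(level) - 1, 2):
            (a, a_cands), (b, b_cands) = level[i], level[i + 1]
            if a >= b:
                # a wins: keep a's candidates and add the loser b.
                next_level.append((a, a_cands + [b]))
            else:
                next_level.append((b, b_cands + [a]))
        if len(level) % 2 == 1:
            next_level.append(level[-1])
        level = next_level
    best, candidates = level[0]
    # The second maximum must have lost directly to the overall maximum.
    return best, max(candidates)
```

On the sample array 7, 10, 1, 3, 100, 8, 55, 6 the winner is 100 with the candidate group 8, 55, 10, so the function returns (100, 55).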
Merge problem
Description: We have 2 sorted arrays B and C of size N, and we need to divide their items into 2 new arrays A1 and A2 of size N, so that the N largest items from B and C together are in A1 and the N smallest are in A2.
Simple solution: We can merge B and C into one sorted array A, then copy the N largest elements to A1 and the N smallest elements to A2. But this algorithm cannot make use of multiple processors; the cost is still O(N).
Smart algorithm for O(N) processors. Processor i compares B[i] with C[n+1-i]; the larger of the two goes to A1 and the other to A2.
Correctness proof. If B[i] > C[n+1-i], then B[i] >= B[1..i-1] and C[n+1-i] >= C[1..n-i], so B[i] is at least as large as N other elements (i - 1 from B and n - i + 1 from C), so B[i] needs to be in A1.
If C[n+1-i] > B[i], then C[n+1-i] is at least as large as N other elements (n - i from C and i from B), so C[n+1-i] needs to be in A1.
Example:
B: 1, 8, 10, 17
C: 9, 12, 67, 100
Pairs compared: (B[1], C[n]), (B[2], C[n-1]), (B[3], C[n-2]), (B[4], C[n-3]).
A1: 100, 67, 12, 17.
A2: 1, 8, 10, 9.
Time: We can do all the comparisons at the same time, so the cost is O(1).
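The pairing of each item of B with the mirrored item of C can be sketched as follows. The function name `split_sorted` is an assumption, and Python's 0-based indexing turns the pair (B[i], C[n+1-i]) into `b[i]` and `c[n - 1 - i]`.

```python
from typing import List, Tuple


def split_sorted(b: List[int], c: List[int]) -> Tuple[List[int], List[int]]:
    """Split two ascending arrays of equal size into (N largest, N smallest)."""
    n = len(b)
    if len(c) != n:
        raise ValueError("both arrays must have the same size")
    a1 = [0] * n
    a2 = [0] * n
    # Every iteration touches its own cells only, so in the parallel
    # model processor i performs iteration i and all run in one step.
    for i in range(n):
        x, y = b[i], c[n - 1 - i]
        a1[i] = max(x, y)
        a2[i] = min(x, y)
    return a1, a2
```

With B = 1, 8, 10, 17 and C = 9, 12, 67, 100 this produces A1 = 100, 67, 12, 17 and A2 = 1, 8, 10, 9, as in the example. Note that neither output array is sorted; only the division into large and small halves is guaranteed.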
Prefix Problem
Description: Find the sums of all the prefixes of the elements X1..Xn:
S11 = X1
S12 = X1 + X2
...
S1n = X1 + X2 + ... + Xn-1 + Xn
Simple solution: Compute the N sums one after the other with N processors, where each sum takes O(log(N)). Time: O(N log(N)).
Algorithm:
for i = 0 to n-1 doip
    S[i] = X[i]
for j = 0 to log n do
    for i = 2^j to n-1 doip
        S[i] = S[i] + S[i-2^j]
"doip" means "do in parallel" on the different processors.
At the end the results are in the array S.
Example: With 8 numbers X1..X8, where Sij is Xi + Xi+1 + ... + Xj.
Start:   X1  X2  X3  X4  X5  X6  X7  X8
Round 1: S11 S12 S23 S34 S45 S56 S67 S78
Round 2: S11 S12 S13 S14 S25 S36 S47 S58
Round 3: S11 S12 S13 S14 S15 S16 S17 S18
Time: in each round the number of finished results S1i doubles, so after log(n) rounds we have all the results.
In order to use this algorithm each processor needs to be connected to log(n) other processors.
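The doubling scheme can be simulated sequentially as below. This is a sketch with an assumed function name, `parallel_prefix_sums`. The one detail that matters is the snapshot taken at the start of each round: in the synchronized model all processors read the old values before any of them writes, and the copy reproduces that behaviour.

```python
from typing import List, Tuple


def parallel_prefix_sums(x: List[int]) -> Tuple[List[int], int]:
    """Return (prefix sums of x, number of rounds used)."""
    s = list(x)
    n = len(s)
    rounds = 0
    step = 1  # step plays the role of 2^j
    while step < n:
        # Snapshot of the previous round: every processor reads old values.
        prev = list(s)
        # Processors step..n-1 each add the value `step` places to the left.
        for i in range(step, n):
            s[i] = prev[i] + prev[i - step]
        step *= 2
        rounds += 1
    return s, rounds
```

For the input 1, 2, ..., 8 the result is 1, 3, 6, 10, 15, 21, 28, 36 after 3 rounds, which is log(8).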
Usage example
Problem: we have an arithmetic expression and we need to test whether its brackets are arranged legally.
Algorithm: we create an array X with 1 for each "(" and -1 for each ")", and run the prefix algorithm on it. The results need to satisfy: S11 = 1, S11..S1n-1 >= 0 and S1n = 0.
Time with N processors: O(log(N)): O(log(N)) for the prefix algorithm and O(1) for the test.
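A small sketch of the bracket test follows. The function name `brackets_are_legal` is an assumption, and the standard library's `itertools.accumulate` is used here as a sequential stand-in for the parallel prefix computation. Characters other than brackets are skipped, which is also an assumption about how the array X is built.

```python
from itertools import accumulate


def brackets_are_legal(expression: str) -> bool:
    """Check bracket arrangement with prefix sums of +1 / -1 values."""
    # +1 for "(" and -1 for ")"; other characters are ignored.
    x = [1 if ch == "(" else -1 for ch in expression if ch in "()"]
    if not x:
        return True  # no brackets at all
    s = list(accumulate(x))  # s[i] = x[0] + ... + x[i]
    # No prefix may close more brackets than it opened, and the
    # total must return to zero. Each test is O(1) per processor.
    return all(v >= 0 for v in s) and s[-1] == 0
```

Since every entry of X is 1 or -1, the condition S11 = 1 follows from S11 >= 0, so it is not tested separately.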
Partition Problem
Description: We have an array X in which some of the elements are signed (marked). We need to move all the signed elements to one array and the non-signed ones to another array.
Simple solution: We take 2 stacks, and push the signed elements into one stack and the non-signed ones into the other. It takes O(N) time.
Simple solution 2: We take two indexes, one at the start of the array and one at the end. The first searches for a signed element and the second for a non-signed one; when both have found one, they exchange the items they point to and move on, until they meet. This takes O(N) time too, but it is more parallel.
Smart algorithm for O(N) processors:
Create a new array B: if element i is signed then B[i] = 1, else B[i] = 0.
Create an array C with the prefix sums of B, that is C[i] = B[1] + B[2] + ... + B[i].
If X[i] is signed then Y1[C[i]] = X[i]. If X[i] is not signed then Y2[i-C[i]] = X[i].
Example:
X = 2, 4, 7, 8, 1, 3, 10, 12, 15
B = 0, 1, 0, 0, 0, 1, 1, 0, 1
C = 0, 1, 1, 1, 1, 2, 3, 3, 4
Y1 = 4, 3, 10, 15
Y2 = 2, 7, 8, 1, 12
Time with O(N) processors:
Computing B: O(1).
Computing C: O(log(n)) using the prefix algorithm.
Computing Y1 and Y2: O(1).
Total: O(log(n)).
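The three steps (B, C, then the scattered writes) can be sketched as below. The function name `partition_by_flags` and the idea of passing the signed/non-signed status as a separate list of flags are assumptions. With 0-based Python indices the targets become `c[i] - 1` for Y1 and `i - c[i]` for Y2.

```python
from itertools import accumulate
from typing import List, Tuple


def partition_by_flags(x: List[int], flags: List[int]) -> Tuple[List[int], List[int]]:
    """Split x into (signed elements, non-signed elements), keeping order."""
    if len(x) != len(flags):
        raise ValueError("x and flags must have the same length")
    b = [1 if f else 0 for f in flags]
    # Sequential stand-in for the parallel prefix sums of B.
    c = list(accumulate(b))
    signed_count = c[-1] if c else 0
    y1 = [0] * signed_count
    y2 = [0] * (len(x) - signed_count)
    # Each element computes its own target cell, so all writes are
    # independent and take O(1) time with one processor per element.
    for i in range(len(x)):
        if b[i]:
            y1[c[i] - 1] = x[i]
        else:
            y2[i - c[i]] = x[i]
    return y1, y2
```

Calling it with X = 2, 4, 7, 8, 1, 3, 10, 12, 15 and B = 0, 1, 0, 0, 0, 1, 1, 0, 1 gives Y1 = 4, 3, 10, 15 and Y2 = 2, 7, 8, 1, 12, as in the example.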
Sorting Algorithm
Description: Sort array A using O(N^2) processors and put the result into array C.
Simple algorithm: A serial comparison-based algorithm for sorting an array takes at least on the order of N log(N) time.
Smart algorithm: Create a matrix B of size N*N and initialize all its cells with zeroes.
We look at the N^2 processors as a matrix of processors. Processor P[i,j] computes A[i] >= A[j]; if true then B[i,j] = 1.
For each i from 1 to N, C[Sum(i)] = A[i], where Sum(i) is the sum of B[i,1] to B[i,N]. (This assumes the values in A are distinct.)
Example: A = 3, 5, 2, 9, 1
Matrix B:
     1  2  3  4  5
1    1  0  1  0  1
2    1  1  1  0  1
3    0  0  1  0  1
4    1  1  1  1  1
5    0  0  0  0  1
C = 1, 2, 3, 5, 9.
Time: Using O(N^2) processors, finding the matrix B takes O(1) and finding C costs O(log(N)), so the total cost of the algorithm is O(log(N)).
Using O(N) processors, finding B takes O(N) time and finding C takes O(N) time, so the total is O(N).
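The comparison matrix and the row sums can be written out directly. This sketch uses an assumed function name, `rank_sort`, returns the matrix as well so it can be inspected, and assumes the values are distinct (with duplicates two elements would get the same rank).

```python
from typing import List, Tuple


def rank_sort(a: List[int]) -> Tuple[List[int], List[List[int]]]:
    """Sort distinct values by counting, for each item, how many items it is >= to."""
    n = len(a)
    # Cell (i, j) would be computed by processor P[i, j] in one step.
    b = [[1 if a[i] >= a[j] else 0 for j in range(n)] for i in range(n)]
    c = [0] * n
    for i in range(n):
        # The row sum is the 1-based rank of a[i]; a parallel machine
        # would compute it with a log(N) tree of additions.
        rank = sum(b[i])
        c[rank - 1] = a[i]
    return c, b
```

For A = 3, 5, 2, 9, 1 the row sums are 3, 4, 2, 5, 1, so the result is C = 1, 2, 3, 5, 9.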
Sorting Algorithm
Description: Sort array A using O(N^2) processors and put the result into array C.
Algorithm: Merge sort. The largest cost in the merge sort algorithm is the cost of the merge. Using a serial algorithm, the cost of merging 2 sorted arrays is O(N) and the cost of the merge sort algorithm is O(N log(N)). We will use the regular algorithm but with a smarter merge algorithm.
Smart merge algorithm
Description: We need to merge two sorted arrays A, B into a sorted array R.
Algorithm: We describe a recursive algorithm Merge.
C = Merge(Odd(A), Even(B)).
D = Merge(Even(A), Odd(B)).
Where Odd(A) is all the items in A with an odd index, and Even(A) is all the items in A with an even index.
With C = C0, C1, C2, ..., Cn-1 and D = D0, D1, D2, ..., Dn-1, let E = C0, D0, C1, D1, ..., Cn-1, Dn-1.
Compare each Ci, Di, and if Ci > Di then exchange Ci and Di in array E.
Array E is then the sorted merge of C and D, and therefore of A and B.
Example:
A = 3, 5, 8, 10
B = 4, 7, 9, 12
Even(A) = 5, 10    Odd(A) = 3, 8
Even(B) = 7, 12    Odd(B) = 4, 9
C = 3, 7, 8, 12
D = 4, 5, 9, 10
E = 3, 4, 7, 5, 8, 9, 12, 10
After exchanging in E:
E = 3, 4, 5, 7, 8, 9, 10, 12
Time: Using O(N) processors the merge takes O(log(N)) time. The merge sort runs log(N) levels of merging, so the total cost of the merge sort is O(log^2(N)).
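A recursive sketch of the smart merge is given below. The name `smart_merge` is an assumption, and so is the restriction to two arrays of the same size that is a power of two, which keeps the odd and even halves balanced at every level. Positions are counted from 1 as in the example, so the "odd" items are the Python slices `[0::2]`. Here C is taken as the merge of Odd(A) with Even(B), which is what the example values show.

```python
from typing import List


def smart_merge(a: List[int], b: List[int]) -> List[int]:
    """Merge two ascending arrays of equal power-of-two size."""
    n = len(a)
    if len(b) != n or n == 0 or n & (n - 1) != 0:
        raise ValueError("arrays must have the same power-of-two size")
    if n == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    # The two recursive merges are independent and can run in parallel.
    c = smart_merge(a[0::2], b[1::2])  # Odd(A) with Even(B)
    d = smart_merge(a[1::2], b[0::2])  # Even(A) with Odd(B)
    e = []
    # Interleave C and D, exchanging each pair that is out of order.
    # All pairs are independent: one parallel step.
    for ci, di in zip(c, d):
        if ci <= di:
            e.extend((ci, di))
        else:
            e.extend((di, ci))
    return e
```

For A = 3, 5, 8, 10 and B = 4, 7, 9, 12 the intermediate arrays are C = 3, 7, 8, 12 and D = 4, 5, 9, 10, and the result is 3, 4, 5, 7, 8, 9, 10, 12.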
Find Algorithm
Description: If array X contains the value Val then Res needs to be True, else Res needs to be False.
Simple Algorithm: Using a serial algorithm it takes O(N) time.
Smart Algorithm: Using O(N) processors. Res = False. Each processor i tests whether X[i] = Val; if true, it sets Res = True. Every processor that writes to Res writes the same value, which the model allows.
Time: O(1).
Model Description
Many processors.
Processors can send messages to each other through communication.
We want each processor to have a unique identification.
Since we have O(n) processors we need O(log n) bits to represent the Id.
Clean Net: a processor doesn't know anything about its neighbors, not even their Ids. It only knows how many neighbors it has.
We will explicitly mention when we are dealing with a Clean Net; otherwise every processor has a unique Id.
A message should include the sender Id, the receiver Id and some information: O(log n) bits in total.
If X wants to send a message to Y through Z, it costs 2 steps to send the message:
X -> Z -> Y
Local computation doesn't take time.
We will analyze:
time complexity - the number of steps the algorithm takes in the worst case.
communication complexity - the total number of messages sent during the execution of the algorithm in the worst case.
Distributed vs. Sequential
Communication - we need it in the distributed model but not in the sequential one.
Partial knowledge - together the processors know everything, but no single processor necessarily knows everything.
There can be processors or communication channels that are down.
Synchronization - we need to synchronize the processors.
Synchronous Model
There is a global clock.
In every clock cycle each of the processors can:
- send messages to its neighbors.
- receive messages from its neighbors.
- make local computation in 0 time.
- change state.
Asynchronous Model
There is no global clock.
If a message was sent it will eventually arrive at its destination (with no failures), but we can't assume anything about the arrival time.
We measure time from the beginning of the execution until the last processor has stopped.
For time complexity calculations we assume that every message arrives within one time unit in the worst case.
Model Representation
We can represent the network of processors with a graph.
Each node in the graph is a processor.
There is an edge between two nodes if there is a direct communication channel between the two processors they represent.
Complexity
C(Π, G, I) - communication complexity: the total number of messages sent in the execution in the worst case.
T(Π, G, I) - time complexity: the number of clock cycles the execution takes in the worst case.
Where Π is the protocol, G is the graph and I is the input.
Complexity - examples
The following examples are in a complete graph with the nodes 1, 2, ..., n.
Complexity - example 1
Protocol A: node 1 sends the message m to node 2.
C(A, G, I) = 1.
T(A, G, I) = 1.
Complexity - example 2
Protocol B: node 1 sends the message mi to node i, for every node i in G.
C(B, G, I) = n.
T(B, G, I) = 1.
Complexity - example 3
Protocol C: every node i in G sends the message mi to node i+1.
C(C, G, I) = n.
T(C, G, I) = 1.
Complexity - example 4
Protocol D: node i sends the message m to node i+1 in cycle i (node 1 sends m to node 2, then node 2 sends m to node 3, and so on).
C(D, G, I) = n.
T(D, G, I) = n.
Transmission Problem
Input: there is a message m in the node V0.
Output: the message m is written in all the nodes in the graph.
dG(x,y) - the length of the shortest path from x to y in graph G.
D = Diameter(G) = max { dG(x,y) : x,y ∈ V }.
Algorithms for the Transmission Problem:
Direct Delivery.
Spanning Tree.
DFS.
Flooding.
Direct Delivery
Based on the assumptions:
- there is a routing system, such that messages are sent along the shortest path.
- V0 knows the addresses of all the other nodes in the graph.
V0 sends the message m n-1 times, each time to a different node.
DD Communication Complexity
V0 sends n-1 messages.
Each message takes O(D) steps.
C(DD, G, I) = O(n*D).
DD Time Complexity
Under the assumptions:
1. synchronous model.
2. V0 sends one new message in every clock cycle.
There won't be collisions between messages, because messages travel along shortest paths, and therefore we can't have more than one message at a given distance from V0.
The last message is sent in cycle n-1.
It will take O(D) steps for the last message to arrive.
T(DD, G, I) = O( n+D ).
We can show the same time complexity even without assumption 2.
If two messages in a node compete for the same edge, we first send the message destined for the node with the smaller Id.
The message for node i, at time t, must be at distance t-i+1 from V0 (or already in Vi).
Spanning Tree
Assumptions: We have a spanning tree in the graph that all the nodes are aware of (each node knows which of its edges are part of the spanning tree).
Each node that receives the message sends it on along its spanning tree edges.
Spanning Tree Complexity
We send the message once for each spanning tree edge.
C(ST) = n-1.
We need as many rounds as the depth of the tree until the last node receives the message.
T(ST) = O( Depth( tree, V0 ) ).
If we choose a BFS tree: T(ST) = O(D).
Building a Spanning Tree
If we don't have a spanning tree, we can build one using any algorithm A for Transmission.
Execute algorithm A.
Each node V chooses as its parent the node W from which it received the message for the first time.
V informs W that W is its parent.
The edge E(W,V) is marked as a spanning tree edge.
Since the transmission algorithm delivers the message to all the nodes, we know that all the nodes are in the spanning tree.
We have no cycles since each node V chooses only one parent.
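The parent-selection rule can be sketched with a round-by-round simulation. Here the transmission algorithm A is assumed to be synchronous flooding, the graph is assumed to be given as a dictionary mapping each node to its list of neighbors, and the name `build_spanning_tree` is illustrative. When several neighbors deliver the message in the same round, this sketch simply takes the one that is processed first.

```python
from typing import Dict, Hashable, List, Optional


def build_spanning_tree(graph: Dict[Hashable, List[Hashable]],
                        root: Hashable) -> Dict[Hashable, Optional[Hashable]]:
    """Return a parent map: parent[v] is the node v first heard the message from."""
    parent: Dict[Hashable, Optional[Hashable]] = {root: None}
    frontier = [root]  # nodes that received the message in the last round
    while frontier:
        next_frontier = []
        for w in frontier:
            for v in sorted(graph[w]):
                if v not in parent:
                    # First time v receives the message: W becomes its parent,
                    # and the edge (w, v) is a spanning tree edge.
                    parent[v] = w
                    next_frontier.append(v)
        frontier = next_frontier
    return parent
```

Every node other than the root gets exactly one parent, so a connected graph with n nodes ends up with n-1 tree edges and no cycles.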
DFS
We traverse the graph in DFS order.
When we reach a new node we leave a copy of the message, mark the node and continue the traversal.
When we reach a marked node we go back.
DFS Complexity
In the DFS algorithm we move on each edge exactly twice.
C(DFS) = T(DFS) = O(E).
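The claim that each edge is crossed exactly twice can be checked with a small simulation of a single message travelling through the graph. The function name `dfs_broadcast`, the dictionary-of-neighbors graph format and the rule that an edge already crossed is never tried again from its other end are assumptions of this sketch.

```python
from typing import Dict, Hashable, List, Tuple


def dfs_broadcast(graph: Dict[Hashable, List[Hashable]],
                  root: Hashable) -> Tuple[List[Hashable], int]:
    """Return (order in which nodes got the message, number of moves)."""
    marked = {root}
    order = [root]
    used = set()  # undirected edges the message has already crossed
    moves = 0

    def visit(v: Hashable) -> None:
        nonlocal moves
        for w in sorted(graph[v]):
            edge = frozenset((v, w))
            if edge in used:
                continue
            used.add(edge)
            moves += 1  # the message moves from v to w
            if w not in marked:
                # New node: leave a copy, mark it, continue from there.
                marked.add(w)
                order.append(w)
                visit(w)
            moves += 1  # the message goes back from w to v

    visit(root)
    return order, moves
```

On a connected graph with E edges the function reports 2E moves, and since only one message is in transit at any time, the number of moves is also the running time.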
Flooding
Each node that receives the message for the first time sends it to all of its neighbors.
When a node receives the message again later, it just discards it.
Flooding is also effective in a Clean Net.
Flooding Complexity
The message passes on each edge twice, once in each direction.
C(Flood) = O(E).
After t time units the message has reached all the nodes whose distance from V0 is smaller than or equal to t.
T(Flood) = O(D).
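Both bounds can be observed in a round-by-round simulation of the synchronous case. The sketch below assumes the graph is a dictionary from each node to its list of neighbors; the name `flood` and the returned values are illustrative.

```python
from typing import Dict, Hashable, List, Tuple


def flood(graph: Dict[Hashable, List[Hashable]],
          source: Hashable) -> Tuple[Dict[Hashable, int], int]:
    """Return (round in which each node first got the message, total messages)."""
    arrival = {source: 0}
    frontier = [source]  # nodes that got the message for the first time last round
    messages = 0
    t = 0
    while frontier:
        t += 1
        next_frontier = []
        for v in frontier:
            for w in graph[v]:
                messages += 1  # v sends the message to every neighbor
                if w not in arrival:
                    arrival[w] = t
                    next_frontier.append(w)
                # Otherwise w already has the message and discards this copy.
        frontier = next_frontier
    return arrival, messages
```

Each node sends once to each of its neighbors, so the message count equals the sum of the degrees, which is 2E. The arrival round of a node equals its distance from the source, so the largest arrival round is at most the diameter D.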