CS 8625 High Performance and Parallel, Dr. Hoganson
Copyright © 2005, 2006 Dr. Ken Hoganson
CS8625-June-22-2006
Class Will Start Momentarily…
Homework & Midterm Review
CS8625 High Performance and Parallel Computing
Dr. Ken Hoganson
Balance Point
• The basis for the argument against “putting all your (speedup) eggs in one basket”: Amdahl’s Law
• Note the balance point in the denominator where both parts are equal.
• Increasing N (number of processors) beyond this point can at best halve the denominator, and double the speedup.
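$\text{Speedup} = \dfrac{1}{(1-\alpha) + \dfrac{\alpha}{N}}$, where $\alpha$ is the parallel fraction of the workload
Balance point, where $1-\alpha = \dfrac{\alpha}{N}$
When $\dfrac{\alpha}{N} < 1-\alpha$, very little additional speedup is possible through increasing $N$.
When $\dfrac{\alpha}{N} > 1-\alpha$, significant additional speedup may be possible through increasing $N$.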
Balance Point Heuristic
• Increasing N (number of processors) beyond this point can at best halve the denominator, and double the speedup.
$\text{Speedup} = \dfrac{1}{(1-\alpha) + \dfrac{\alpha}{N}}$
Balance point, where $1-\alpha = \dfrac{\alpha}{N}$
Solved for $N$: $N = \dfrac{\alpha}{1-\alpha}$
Solved for $\alpha$: $\alpha = \dfrac{N}{N+1}$
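The balance point relations can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function names `amdahl_speedup`, `balance_point_n`, and `balance_point_alpha` are hypothetical.

```python
def amdahl_speedup(alpha: float, n: float) -> float:
    """Amdahl's Law: speedup for parallel fraction alpha on n processors."""
    return 1.0 / ((1.0 - alpha) + alpha / n)


def balance_point_n(alpha: float) -> float:
    """Number of processors at which (1 - alpha) equals alpha / N."""
    return alpha / (1.0 - alpha)


def balance_point_alpha(n: float) -> float:
    """Parallel fraction at which (1 - alpha) equals alpha / N."""
    return n / (n + 1.0)


if __name__ == "__main__":
    alpha = 0.90
    n = balance_point_n(alpha)
    print(f"alpha={alpha}: balance point N={n:.2f}, "
          f"speedup there={amdahl_speedup(alpha, n):.2f}")
```

Note that the balance point is generally not an integer, so in practice it would be rounded to a whole number of processors.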
Balance Point
• Example
• Parallel Fraction = 90%
• (10% in serial)

N          α/N         1-α     Speedup
1          0.90        0.10    1/(0.1+0.90)      = 1.00
2          0.45        0.10    1/(0.1+0.45)      = 1.82
4          0.225       0.10    1/(0.1+0.225)     = 3.08
8          0.1125      0.10    1/(0.1+0.1125)    = 4.71
16         0.05625     0.10    1/(0.1+0.05625)   = 6.40
32         0.028125    0.10    1/(0.1+0.028125)  = 7.80
64         0.0140625   0.10    1/(0.1+0.0140625) = 8.77
infinity   0.0         0.10    1/(0.1+0.0)       = 10

Solved for $N$: $N = \dfrac{\alpha}{1-\alpha} = \dfrac{0.90}{0.10} = 9$, where Speedup = 5
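The table above can be regenerated with a short script. This is a sketch for checking the arithmetic, assuming the same parallel fraction of 0.90; the helper name `speedup_table` is hypothetical.

```python
def speedup_table(alpha, processor_counts):
    """Return (N, alpha/N, speedup) rows from Amdahl's Law."""
    rows = []
    for n in processor_counts:
        parallel_term = alpha / n
        speedup = 1.0 / ((1.0 - alpha) + parallel_term)
        rows.append((n, parallel_term, speedup))
    return rows


if __name__ == "__main__":
    alpha = 0.90
    print(f"{'N':>4} {'alpha/N':>10} {'1-alpha':>8} {'Speedup':>8}")
    for n, parallel_term, speedup in speedup_table(alpha, [1, 2, 4, 8, 16, 32, 64]):
        print(f"{n:>4} {parallel_term:>10.6f} {1.0 - alpha:>8.2f} {speedup:>8.2f}")
    # Upper bound as N grows without limit: only the serial term remains.
    print(f"limit as N -> infinity: {1.0 / (1.0 - alpha):.2f}")
```

Past the balance point (N = 9 here), each doubling of N adds progressively less speedup as the serial term dominates the denominator.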
Example
• Example: Workload has an average alpha of 94%. How many processors can reasonably be applied to speed up this workload?
Solved for $N$: $N = \dfrac{\alpha}{1-\alpha}$
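Substituting the stated alpha into the balance point formula gives the following worked step. The slide does not show the answer, so the rounding to a whole number of processors is an interpretation rather than the author's stated result.

```latex
N = \frac{\alpha}{1-\alpha} = \frac{0.94}{0.06} \approx 15.67
\quad\Rightarrow\quad \text{roughly 15 to 16 processors}
```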
Example
• Example: An architecture has 32 processors. What is the minimum workload parallel fraction needed to make reasonably efficient use of the processors?
Solved for $\alpha$: $\alpha = \dfrac{N}{N+1}$
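Substituting N = 32 into the formula gives the following worked step. It is a direct evaluation of the balance point heuristic and is not shown on the original slide.

```latex
\alpha = \frac{N}{N+1} = \frac{32}{33} \approx 0.970
\quad\Rightarrow\quad \text{a parallel fraction of about } 97\%
```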
Multi-Bus Multiprocessors
• Shared-Memory Multiprocessors are very fast
– Low latency to memory on bus
– Low communication overhead through shared memory
• Scalability problems
– Length of bus slows signals (0.75 speed of light)
– Contention for the bus reduces performance
– Requires cache to reduce contention
[Figure: diagram labeled CPU, CPU, CPU, MEM]
Bus Contention
Multiple devices – processors, etc, compete for access to a bus
Only one device can use a bus at a time, limiting performance and scalability
$r$ = probability of a processor requesting a bus
$n$ = number of processors
$1-r$ = probability of not requesting a bus
$(1-r)^n$ = probability that none will request the bus
$1-(1-r)^n$ = probability that at least one will request the bus
$\dfrac{n!}{1!\,(n-1)!}\, r\,(1-r)^{n-1} = n\,r\,(1-r)^{n-1}$ = probability that exactly one processor will request the bus
$1-(1-r)^n - n\,r\,(1-r)^{n-1}$ = probability of more than zero or one request (at least one request is blocked)
1 – zero requests – exactly one request = probability of 2 or more (at least one blocked request)
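The three probabilities can be computed directly from these formulas. The sketch below assumes, as the formulas do, that each processor requests the bus independently with the same probability r; the function name `bus_request_probabilities` is hypothetical.

```python
def bus_request_probabilities(n: int, r: float):
    """Return (P(no request), P(exactly one request), P(two or more requests)).

    Assumes n processors, each independently requesting the bus with
    probability r in a given cycle. Two or more simultaneous requests
    means at least one request is blocked.
    """
    none = (1.0 - r) ** n
    exactly_one = n * r * (1.0 - r) ** (n - 1)
    blocked = 1.0 - none - exactly_one
    return none, exactly_one, blocked


if __name__ == "__main__":
    for n, r in [(4, 0.1), (8, 0.1), (16, 0.2)]:
        none, one, blocked = bus_request_probabilities(n, r)
        print(f"n={n:>2} r={r}: none={none:.4f} one={one:.4f} blocked={blocked:.4f}")
```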
• Performance degrades as requests are blocked
• Resubmitted blocked requests degrade performance even further than shown below

                 N=4      N=8      N=16
r                0.1      0.1      0.2
1-r              0.9      0.9      0.8
(1-r)^n          0.6561   0.4305   0.0281
n·r·(1-r)^(n-1)  0.2916   0.3826   0.1126
Blocked          0.0523   0.1869   0.8593
Clearly, the probability that a processor’s access to a shared bus will be denied will increase with both:
• The number of processors sharing a bus
• The probability a processor will need access to the bus.
• What can be done? What is the “universal band-aid” for performance problems?
• If cache greatly reduces access to memory, then the blocking rate on the bus is much lower.

                 N=4      N=8      N=16     N=16
r                0.1      0.1      0.2      0.01
1-r              0.9      0.9      0.8      0.99
(1-r)^n          0.6561   0.4305   0.0281   0.8515
n·r·(1-r)^(n-1)  0.2916   0.3826   0.1126   0.1376
Blocked          0.0523   0.1869   0.8593   0.0109
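To see how strongly the blocking rate depends on r for a fixed number of processors, the request probability can be swept while N stays at 16. This is an illustrative sketch; the function name `blocked_probability` and the intermediate r values are assumptions chosen for the demonstration, with only r = 0.2 and r = 0.01 taken from the table.

```python
def blocked_probability(n: int, r: float) -> float:
    """Probability that two or more of n processors request the bus at once."""
    none = (1.0 - r) ** n
    exactly_one = n * r * (1.0 - r) ** (n - 1)
    return 1.0 - none - exactly_one


if __name__ == "__main__":
    n = 16
    # r = 0.2 and r = 0.01 come from the table; the values between are illustrative.
    for r in (0.2, 0.1, 0.05, 0.02, 0.01):
        print(f"n={n} r={r:<5} blocked={blocked_probability(n, r):.4f}")
```

A lower r corresponds to a higher cache hit rate, since fewer memory accesses need to go out over the bus.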
Two approaches to improving shared memory/bus machine performance:
• Invest in large amounts, and multiple levels, of cache
– and a connection network to allow caches to synchronize contents.
• Invest in multiple buses and independently accessible blocks of memory
• Combining both may be the best strategy.
Homework
• Your project is to explore the effect of interconnection network contention on the performance of a shared-memory, bus-based multiprocessor.
• You will do some calculations, use the HPPAS simulator, and write a couple-page report to turn in.
Task 1
• For a machine whose processors include on-chip cache that yields a cache hit rate of 90%, determine the maximum number of processors that can go on a single shared bus and still maintain at least a 98% acceptance of requests.
• Use the calculations shown in the lecture to zero in on the correct answer, recording your calculations in a table for your report. Show each step of the calculation as was done in the lecture/ppt.
• Your results should “bracket” the maximum.
Task 1
• Task 1: Use the formula in the table to find
                 N=4    N=8    N=16   N=?    N=?
r = 10%          0.10   0.10   0.10   0.10   0.10
1-r              0.90   0.90   0.90   0.90   0.90
(1-r)^n
n·r·(1-r)^(n-1)
Blocked = 1 - (0 requests) - (1 request)
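The bracketing procedure can also be automated as a check on the hand calculations. The sketch below assumes that "acceptance of requests" means one minus the blocked probability from the table, which is an interpretation of the task; the names `blocked_probability` and `max_processors` are hypothetical, and the example values are the r = 0.01 case from the lecture rather than the homework parameters.

```python
def blocked_probability(n: int, r: float) -> float:
    """Probability that two or more of n processors request the bus at once."""
    return 1.0 - (1.0 - r) ** n - n * r * (1.0 - r) ** (n - 1)


def max_processors(r: float, max_blocked: float, n_limit: int = 1024) -> int:
    """Largest n (up to n_limit) whose blocked probability stays <= max_blocked.

    Relies on the blocked probability growing with n, so the search
    can stop at the first n that exceeds the threshold.
    """
    best = 0
    for n in range(1, n_limit + 1):
        if blocked_probability(n, r) <= max_blocked:
            best = n
        else:
            break
    return best


if __name__ == "__main__":
    # Lecture example: r = 0.01 with a blocked-probability ceiling of 0.011.
    n = max_processors(0.01, 0.011)
    print(n, blocked_probability(n, 0.01), blocked_probability(n + 1, 0.01))
```

Printing the blocked probability for both n and n + 1 shows the two values that bracket the maximum.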
Task 2
• Use the maximum number of processors (Task 1) and Amdahl’s law at the balance point, to figure out what workload parallel fraction yields a balance in the denominator.
• Determine the theoretical speedup that will be obtained.
Solved for $\alpha$: $\alpha = \dfrac{N}{N+1}$
$\text{Speedup} = \dfrac{1}{(1-\alpha) + \dfrac{\alpha}{N}}$
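Combining the two formulas gives a compact expression for the speedup at the balance point. This derivation is not on the original slide; it follows by substituting the balance point value of alpha into Amdahl's Law, and it agrees with the lecture example where N = 9 yields a speedup of 5.

```latex
\alpha = \frac{N}{N+1}
\;\Rightarrow\;
1-\alpha = \frac{1}{N+1},
\qquad
\frac{\alpha}{N} = \frac{1}{N+1}

\text{Speedup}
= \frac{1}{(1-\alpha) + \dfrac{\alpha}{N}}
= \frac{1}{\dfrac{2}{N+1}}
= \frac{N+1}{2}
```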
Task 3
• Use the data values developed so far, to run the HPPAS simulation system. Record the speedup obtained from this system.
• If it differs markedly from the theoretical value, check all the settings, and rerun the simulation, and explain any variation from the theoretical expected value.
• Record your results in your report, showing each step of the calculation as was done in the lecture/ppt.
Dates
• The current plan:
– Make the midterm available on Friday June 23.
– Due date will be July 10 (after the conference and after the July 4th weekend).
• Conference week:
– Complete homework: Due on July 3 by email.
– Work on Midterm exam.
• No class lecture on June 27 and 29.
• No class on July 4.
• Next live class is Wed July 6.
Topic Overview
Overview of topics for the exam:
• Five parallel levels
• Problems to be solved for parallelism
• Limitations to parallel speedup
• Amdahl’s Law: theory, implications
• Limiting factors in realizing parallel performance
• Pipelines and their performance issues
• Flynn’s classification
• SIMD architectures
• SIMD algorithms
• Elementary analysis of algorithms
• MIMD: Multiprocessors and Multicomputers
• Balance point and heuristic (from Amdahl’s Law)
• Bus contention and analysis of single shared bus.
• Use of the online HPPAS tool.
• Specific multiprocessor clustered architectures:
– Compaq
– DASH
– Dell Blade Cluster
End of Lecture
End of Today’s Lecture.