Systolic Arrays: Presentation at UCF by Jason HandUber, February 12, 2003
Page 1: Jason HandUber by Presentation at UCF Systolic Arraysweb.cecs.pdx.edu/~mperkows/temp/May22/jhanduber2.pdfY values goes left, X values go right, A values fan in. Systolic Arrays ...

Systolic Arrays
Presentation at UCF

by Jason HandUber

February 12, 2003


Presentation Overview

• Introduction
  • Abstract Intro to Systolic Arrays
  • Importance of Systolic Arrays
  • Necessary Review – VLSI, definitions, matrix multiplication
• Systolic Arrays
  • Hardware & Network Interconnections
  • Matrix-Vector Multiplication
  • Beyond M.V. Multiplication
  • Applications and Extensions of covered topics
• Summary


Introduction - Scenario

• Your boss approaches you at work and notifies you that the company has a chance at landing an obscenely lucrative government contract.
• He asks you to put together a proposal and indicates that, for you to keep making the $130,000 / year that you make, you should be able to secure the contract.
• Lastly, he informs you that the government contract is concerned with one of the topics on the following slide, that you have essentially limitless funding, and that the contract specifies that the final run-time of the algorithm must be linear.


Introduction – “Why?”

• What is the main commercial point of Computer Architecture?
  Essentially → Moore’s Law
• To that end, what are two main points computer architects have been focusing on in recent years?
  Pipelining & Parallelism


Moore’s Law


Introduction – Pipelining & Parallelism

Processor Pipelining
Ideally, at least one new instruction completes every clock cycle.

Parallelism
Multiple jobs are allowed to execute simultaneously.
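The ideal case described above can be made concrete with a small calculation. This is only a sketch under simplifying assumptions that are not stated on the slide: every stage takes exactly one clock cycle, and there are no stalls or hazards. The function names and the 5-stage, 100-instruction figures are illustrative.

```python
def unpipelined_cycles(num_instructions, num_stages):
    """Each instruction passes through every stage before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages):
    """The first instruction fills the pipeline; after that, one instruction
    completes per cycle (assumed ideal: no stalls, one cycle per stage)."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


n, k = 100, 5  # hypothetical workload: 100 instructions, 5 stages
print(unpipelined_cycles(n, k))  # 500
print(pipelined_cycles(n, k))    # 104
```

Once the pipeline is full, the cost per additional instruction drops from `num_stages` cycles to one cycle, which is the "one new instruction completes every cycle" ideal.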


Introduction – “Systolic”


Introduction – Systolic Definition

• “Imagine n simple processors arranged in a row or an array and connected in such a manner that each processor may exchange information with only its neighbors to the right and left. The processors at either end of the row are used for input and output. Such a machine constitutes the simplest example of a systolic array.” [1]


Introduction – Systolic Definition (2)

• “Systolic Arrays are regular arrays of simple finite state machines, where each finite state machine in the array is identical…A systolic algorithm relies on data from different directions arriving at cells in the array at regular intervals and being combined.” [2]


Systolic Arrays

• “By pipelining, processing may proceed concurrently with input and output, and consequently overall execution time is minimized. Pipelining plus multiprocessing at each stage of a pipeline should lead to the best-possible performance.” [3]


Introduction – Review - VLSI

• VLSI – Very Large Scale Integration
• VLSI is low-cost, high-density, high-speed.
• “VLSI technology is especially suitable for designs which are regular, repeatable, and with high localized communications.”
• “A systolic array is a design style for VLSI.” [3]


Introduction – Review - Matrix Multiplication

• Consider multiplying a 3x2 matrix by a 2x1 matrix:
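The slide's worked example did not survive in this transcript, so the following is a stand-in sketch rather than the original figure. The matrix entries are hypothetical values chosen for illustration; the point is that each of the three output entries is the dot product of one row of the 3x2 matrix with the 2x1 column.

```python
def matvec(A, x):
    """Multiply matrix A (list of rows) by column vector x (list of numbers)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Hypothetical values, not taken from the presentation.
A = [[1, 2],
     [3, 4],
     [5, 6]]   # 3x2
x = [7, 8]     # 2x1

y = matvec(A, x)
print(y)  # [23, 53, 83], e.g. y[0] = 1*7 + 2*8
```

The result is a 3x1 column: the inner dimensions (2 and 2) must agree, and the outer dimensions (3 and 1) give the shape of the product.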


Introduction – Review

Systolic Cell – basic workhorse (processor) of a systolic array.
  • Few Fast Registers
  • ALU
  • Simple I/O

• Multiple CPUs on one machine
• Parallel Execution


Systolic Advantages
How they work

A systolic array has multiple cells networked together to form an array.

Speed – register to register transfer of data. Data is not destroyed until it has been completely used.

Synchronization – All cells run off of a central clock.

Host Data Entry – All cells (including boundary cells) are I/O capable.


Example of Linear Systolic Array

• Breakdown of data into 3 parts
  • Input matrix 1
  • Input matrix 2
  • Output matrix
• What are the different parts to an array?
• What is bandwidth?


Systolic Arrays


[Figure: snapshots of the linear systolic array at time steps T0 through T7]

Y values go left, X values go right, A values fan in.
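The data movement in the caption can be sketched as a small register-level simulation. This is not the presenter's code. It assumes the usual arrangement for a band matrix with `lower` sub-diagonals and `upper` super-diagonals: one cell per diagonal, x and y values injected two time steps apart, and each cell computing y = y + a * x. The function name, the injection timing, and the example matrix are all assumptions made for illustration.

```python
def systolic_band_matvec(A, x, lower, upper):
    """Simulate y = A x on a bidirectional linear systolic array (a sketch).

    A[i][j] is assumed to be zero unless -upper <= i - j <= lower.
    Returns the result vector and the number of clock steps simulated.
    """
    n = len(x)
    w = lower + upper + 1           # band width = number of cells
    t0 = max(0, lower - upper)      # delay so that no value enters before t = 0

    # A values fan in from above: a_ij must reach cell (i - j + upper)
    # at time (i + j + upper + t0), exactly when x_j and y_i meet there.
    a_feed = {}
    for i in range(n):
        for j in range(max(0, i - lower), min(n, i + upper + 1)):
            a_feed[(i + j + upper + t0, i - j + upper)] = A[i][j]

    def entering(t, start):
        """Index of the value injected at time t (values are 2 steps apart)."""
        k, odd = divmod(t - start, 2)
        return k if odd == 0 and 0 <= k < n else None

    x_reg = [None] * w              # x values, moving right
    y_reg = [None] * w              # (row, partial sum) pairs, moving left
    y = [None] * n
    collected = 0
    t = 0
    while collected < n:
        # A value about to leave cell 0 on the left is a finished y_i.
        if y_reg[0] is not None:
            row, total = y_reg[0]
            y[row] = total
            collected += 1
        # Shift: x enters at the left end, y enters at the right end.
        j = entering(t, t0)
        i = entering(t, t0 + upper - lower)
        x_reg = [None if j is None else x[j]] + x_reg[:-1]
        y_reg = y_reg[1:] + [None if i is None else (i, 0)]
        # Compute: every cell holding both an x and a y does one multiply-add.
        for c in range(w):
            if x_reg[c] is not None and y_reg[c] is not None:
                row, total = y_reg[c]
                y_reg[c] = (row, total + a_feed.get((t, c), 0) * x_reg[c])
        t += 1
    return y, t


# Hypothetical tridiagonal example (lower = upper = 1, so w = 3 cells).
A = [[1, 2, 0, 0],
     [3, 4, 5, 0],
     [0, 6, 7, 8],
     [0, 0, 9, 1]]
x = [1, 2, 3, 4]
y, steps = systolic_band_matvec(A, x, lower=1, upper=1)
print(y, steps)  # [5, 26, 65, 31] 10
```

With n = 4 and w = 3 the simulation finishes in 10 steps, in line with the roughly 2n + w step count associated with this kind of array. The exact constant depends on how the first and last steps are counted.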


Systolic Arrays
Linear Systolic Arrays

• PIPELINING
• Multiple CPUs Pipelined Together
• Basic Architecture
• Speed Up
  • O(wn) sequential steps → Quadratic for a dense matrix (w proportional to n)
  • 2n + w systolic steps → Linear!!
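The gap between the two step counts can be tabulated directly. This is a rough sketch, assuming n is the vector length, w is the band width of the matrix, the sequential cost is counted as w multiply-adds per output row (an upper bound), and a dense n x n matrix is treated as a band matrix with w = 2n - 1. The function names are illustrative.

```python
def sequential_steps(n, w):
    """Upper bound on multiply-add steps on one processor: w per output row."""
    return n * w


def systolic_steps(n, w):
    """Clock steps on a linear systolic array of w cells, per the 2n + w figure."""
    return 2 * n + w


for n in (10, 100, 1000):
    w = 2 * n - 1  # assumed band width of a dense n x n matrix
    print(n, sequential_steps(n, w), systolic_steps(n, w))
```

For n = 100 this gives 19,900 sequential steps against 399 systolic steps. The array gets there by using w cells at once, so the total work is not reduced, only the elapsed time.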


Systolic Arrays

• The concepts used in Matrix-Vector multiplication can be easily extended to compute more complex functions.
  • Some of these functions were introduced in the introduction during the flash presentation and include the multiplication of multiple matrices and n-dimensional applications.
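As one illustration of such an extension, matrix-matrix multiplication can be sketched on a two-dimensional mesh of cells. The arrangement below is one common design and is an assumption here, not necessarily the one shown in the presentation: rows of A flow right, columns of B flow down, each input stream is delayed by one step per row or column, and cell (i, j) accumulates C[i][j] in place. The function name and test values are hypothetical.

```python
def systolic_matmul(A, B):
    """Simulate C = A B on an n x p mesh of multiply-accumulate cells (a sketch)."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    a_reg = [[0] * p for _ in range(n)]   # A values, moving right
    b_reg = [[0] * p for _ in range(n)]   # B values, moving down

    for t in range(n + m + p - 2):
        # Shift A values one cell to the right; row i is fed with a delay of i.
        for i in range(n):
            for j in range(p - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < m else 0
        # Shift B values one cell down; column j is fed with a delay of j.
        for j in range(p):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < m else 0
        # Every cell does one multiply-accumulate per clock step.
        for i in range(n):
            for j in range(p):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The loop runs for n + m + p - 2 clock steps, so the elapsed time again grows linearly with the matrix dimensions, at the cost of n * p cells.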


Systolic Arrays


Pipelining Vs. Systolic Array

• Input data is not consumed
• Input data streams can flow in different directions
• Modules may be organized in a two dimensional (or higher) configuration
• Configurable – Different array configurations available for different processing purposes.


Systolic Advantages and
How they work.

• Systolic Array – A network of systolic Cells.
  • Systolic Cell – An independent operating environment with processor, registers and ALU.
• Scalable – Easily extend the architecture to many more processors.
• Capable of supporting SIMD organizations for vector operations and MIMD for non-homogeneous parallelism.
• Allow extremely high throughput w/ multi-dimensional arrays.


Systolic Disadvantages

• Complicated – Both in Hardware and Software.
  • In fact entire volumes exist outlining systolic array verification.
• Expensive in comparison to uni-processor systems, although much faster.


Presentation Summary

• Systolic Arrays offer a way to take certain polynomial-time algorithms, such as O(wn) matrix-vector multiplication, and use hardware to make their run-time linear.
• They are expensive and complex but yield enormous throughput.
• Any Questions?


References

[1] Bayoumi, Magdy. Ling, Nam. Specification and Verification of Systolic Arrays. World Scientific Publishing Co. Pte. Ltd. Singapore. 1999.
[2] Brown, Andrew. VLSI Circuits and Systems in Silicon. McGraw-Hill Book Company. London. 1991.
[3] Dewdney, A.K. The (New) Turing Omnibus. Henry Holt and Company. New York. 1993.
[4] Fisher, J. “Very long instruction word architectures and the ELI-512”. In Proc. 10th International Symposium on Computer Architecture. June 1983, pp. 140-150.
[5] Hennessy, J. Patterson, D. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers. San Francisco.
[6] Kung, H.T. Leiserson, C.E. “Algorithms for VLSI processor arrays.” C. Mead and L. Conway, editors, Introduction to VLSI Systems, Ch. 8.3. Addison-Wesley. 1980.
[7] McCanny, John. McWhirter, John. Swartzlander, Earl. Systolic Array Processors. Prentice Hall International Ltd. Hertfordshire. 1989.
[8] Moore, Will. McCabe, Andrew. Urquhart, Roddy. Systolic Arrays. J W Arrowsmith Ltd. Bristol. 1987.


Post-Presentation Thoughts

The comment made in class that it is difficult for the programmer to know when to implement parallelism isn’t really a fair statement. Certain newer architectures and compilers determine the code dependencies at compile time, which makes it easy for a CPU scheduler to divide such tasks efficiently with help from the code itself. For more information on some of these “newer” architectures, look up VLIW (Very Long Instruction Word) to see how smart compilers, in conjunction with specific schedulers, can package instruction words so that they take full advantage of multiple processors with minimal or no run-time delay spent computing the division of such tasks.

