CISC 879 : Software Support for Multicore Architectures

Presented By: Kanik Sem
Dept of Computer & Information Sciences

University of Delaware

Porting Financial Market Applications to the Cell Broadband Engine Architecture

John Easton, Ingo Meents, Olaf Stephen, Horst Zisgen, Sei Kato

Outline

• Why Cell B.E. for financial markets?
• Porting strategies for the Cell B.E. platform
• Performance results
• Mixed-precision workloads
• Tying it all together
• Conclusions

Why Cell B.E. for financial markets?

• Potential for dramatic impact on financial applications
  • Application codes ported to the Cell
  • Optimized codes to fully exploit Cell
  • Performance improvements of almost 40x

A description of the application

• Code used to price a European option.
• Model based on the Monte Carlo simulation technique.
• Need to generate a large number (200,000,000 in this case) of uniform, pseudo-random numbers.
• Using the random numbers generated, execute the financial model.

Porting strategies for Cell

• Recompilation of existing code for Cell
  • XLC better than gcc
• Make some structural changes
  • Framework to start separate threads on each SPU.
  • Splitting RNG across all cores.
• Make functional changes to the code.
  • Re-engineered functions to exploit vectorization on SPU cores.
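
One way to picture "splitting RNG across all cores" is an even partition of the evaluations over the workers. The sketch below is an assumption about how such a split could look, not the authors' framework; `work_range` and `split_work` are hypothetical names.

```c
/* Work assigned to one worker: a contiguous range [first, first + count). */
typedef struct {
    long first;
    long count;
} work_range;

/* Split `total` evaluations across `workers` as evenly as possible.
   The first (total % workers) workers receive one extra evaluation. */
work_range split_work(long total, int workers, int worker_id) {
    long base = total / workers;
    long extra = total % workers;
    work_range r;
    r.count = base + (worker_id < extra ? 1 : 0);
    r.first = worker_id * base + (worker_id < extra ? worker_id : extra);
    return r;
}
```

With 200,000,000 evaluations and 16 workers, each worker in this scheme receives 12,500,000 evaluations.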

Analysis of the original code

%time   Seconds   Calls       Function name
62.70   118.32    200000000   getRandom()
37.18   70.16     1           simulateEuropeanOptionValue()
0.14    0.27      1           hpcMonteCarlo::random()
0.00    0.00      2           hpcBlackScholes()

• The SDK for Cell provides an optimized RNG.
• Can run 64 random-number generators at once on a Cell blade.
• Use the gettimeofday() function for timing.
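
The slide only names gettimeofday(); how it was used is not shown. A minimal sketch of wall-clock timing with it, using the hypothetical helpers `wall_seconds` and `elapsed_since`, might look like this:

```c
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds, read with gettimeofday(). */
double wall_seconds(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1.0e6;
}

/* Seconds elapsed since an earlier wall_seconds() reading. */
double elapsed_since(double start) {
    return wall_seconds() - start;
}
```

A caller would take one reading before the section of interest and pass it to `elapsed_since` afterwards.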

Initial performance results

To run the performance tests, the following parameters were used:

• Compiler used: spuxlc, ppuxlc

• Compiler optimization setting: -O3 -qstrict

• Random-number generation method: sdk

• Precision: single

• Number of evaluations: 200,000,000

Initial performance results

Performance by number of SPUs (single precision)

Number    Elapsed time (seconds)   Elapsed time (seconds)   Speedup
of SPUs   2.4 GHz Cell/B.E.        3.2 GHz Cell/B.E.
          processor (measured)     processor (estimated)
1         65.7                     49.27                    1
2         32.9                     24.6                     1.99
3         21.9                     16.42                    3
4         16.4                     12.3                     4
5         13.18                    9.88                     4.98
6         10.9                     8.17                     6.02
7         9.4                      7.05                     6.98
8         8.2                      6.15                     8.01
9         7.3                      5.4                      9
10        6.6                      4.95                     9.95
11        6                        4.5                      10.95
12        5.5                      4.12                     11.94
13        5.1                      3.8                      12.88
14        4.7                      3.52                     13.97
15        4.4                      3.3                      14.93
16        4.1                      3.07                     16.02
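
The slides do not say how the derived columns were computed. The figures appear consistent with speedup taken as the 1-SPU time divided by the N-SPU time, and with the 3.2 GHz estimate obtained by scaling the measured time by 2.4/3.2 (for example, 65.7 × 0.75 ≈ 49.27). The sketch below encodes that reading; both formulas and both function names are assumptions.

```c
/* Speedup relative to the single-SPU run. */
double speedup(double t_one_spu, double t_n_spus) {
    return t_one_spu / t_n_spus;
}

/* Estimate elapsed time at a different clock rate, assuming run time
   scales inversely with frequency (an assumption of this sketch). */
double scale_to_clock(double measured_seconds, double measured_ghz,
                      double target_ghz) {
    return measured_seconds * measured_ghz / target_ghz;
}
```

For instance, `speedup(65.7, 32.9)` gives roughly 1.997, in line with the 1.99 listed for 2 SPUs.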

Double Precision

Organizations in financial markets require double-precision calculations.

Initial target marketplace for Cell does not need this.

Initial implementation of Cell provides limited double-precision support in hardware:

Single-precision: fully pipelined
Double-precision: partially pipelined

Performance results

Performance by number of SPUs (double precision)

Number    Elapsed time (seconds)   Elapsed time (seconds)   Speedup
of SPUs   2.4 GHz Cell/B.E.        3.2 GHz Cell/B.E.
          processor (measured)     processor (estimated)
1         157.3                    117.9                    1
2         78.6                     58.9                     2
3         52.4                     39.3                     3
4         39.3                     29.47                    4
5         31.49                    23.61                    4.99
6         26.25                    19.68                    5.99
7         22.5                     16.8                     6.99
8         19.7                     14.7                     7.98
9         17.5                     13.12                    8.98
10        15.78                    11.8                     9.96
11        14.3                     10.7                     11
12        13.1                     9.82                     12
13        12.1                     9.1                      13
14        11.3                     8.47                     13.92
15        10.5                     7.87                     14.98
16        9.9                      7.42                     15.89

Mersenne-Twister

• Run time with Mersenne-Twister (without optimization): 5 sec
• Run time with the Cell/B.E. SDK: 4.1 sec

Mechanisms to improve the performance still further:
• Optimize Mersenne-Twister code for threading framework.
• Rewrite the code to utilize the SIMD capabilities of SPUs.

Performance comparison between Cell/B.E. SDK and Mersenne-Twister random-number generators

Precision   Runtime (seconds)   Runtime (seconds)      Runtime (seconds)
            SDK RNG             Mersenne-Twister RNG   Mersenne-Twister RNG
            (2.4 GHz)           (2.4 GHz)              3.2 GHz (estimated)
Single      4.1                 1.02                   0.76
Double      9.9                 2.47                   1.85

Mixed-precision workloads

Mixed precision: Only those parts that actually need double precision are calculated using double precision.

Disadvantage: Makes for a slight increase in the programming effort needed:
• Identify the parts of the code that need this sort of precision.
• Make the appropriate changes to the code.

Advantage: Performance improvement.

Mixed-precision workloads

The two methods of applying mixed precision to our code are:

(1) Concatenating two single-precision random variables.

(2) Generating one single-precision random variable and then doing a double-precision division.
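
The slides do not show how either method was coded. The sketch below is one possible reading, assuming the inputs are raw 32-bit generator outputs rather than floats: method (1) packs bits from two draws into a 53-bit mantissa, and method (2) divides a single draw in double precision. The function names are hypothetical.

```c
#include <stdint.h>

/* Method (1), as read here: concatenate the bits of two 32-bit draws into
   one 53-bit value, giving a double-precision uniform in [0, 1). */
double uniform_concat(uint32_t hi, uint32_t lo) {
    /* 27 bits from hi, 26 bits from lo: 53 bits in total. */
    uint64_t bits = ((uint64_t)(hi >> 5) << 26) | (uint64_t)(lo >> 6);
    return (double)bits / 9007199254740992.0;   /* divide by 2^53 */
}

/* Method (2), as read here: take one 32-bit draw and do the division in
   double precision, giving 32 random bits in [0, 1). */
double uniform_divide(uint32_t draw) {
    return (double)draw / 4294967296.0;         /* divide by 2^32 */
}
```

Under this reading, method (1) costs two draws per result but fills the whole double-precision mantissa, while method (2) costs one draw and leaves the low mantissa bits at zero.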

Mixed-precision workloads

# SPU   CC_DP_MT   CC_DP_SDK   M_DP_MT   SP_MT   SP_SDK
1       40.33      40.33       45.76     12.01   11.16
2       20.33      20.33       22.88     6.06    5.70
3       13.56      13.56       15.26     4.05    3.80
4       10.17      10.17       11.44     3.04    2.85
5       8.13       8.13        9.16      2.43    2.29
6       6.78       6.78        7.64      2.03    1.91
7       5.82       5.82        6.55      1.75    1.64
8       5.09       5.09        5.75      1.53    1.44
9       4.53       4.52        5.11      1.36    1.28
10      4.08       4.08        4.60      1.22    1.15
11      3.70       3.70        4.18      1.11    1.05
12      3.40       3.39        3.84      1.02    0.96
13      3.14       3.14        3.54      0.94    0.89
14      2.92       2.92        3.29      0.88    0.83
15      2.72       2.72        3.07      0.82    0.78
16      2.52       2.53        2.88      0.77    0.73

• CC_DP_MT = Concatenation Double-Precision Mersenne-Twister
• CC_DP_SDK = Concatenation Double-Precision SDK
• M_DP_MT = Division Double-Precision Mersenne-Twister
• SP_MT = Single-Precision Mersenne-Twister
• SP_SDK = Single-Precision SDK

Mixed-precision workloads

Additional optimization techniques :

• Unrolling more parts of Mersenne-Twister RNG.

• Additional software pipelining by parallelizing computation.

• Introducing new variables to eliminate dependencies.

• Pre-calculating some items:

    a[0] = <something>;
    for (i = 0; i < N; i++) {
        sinf4(a[0]);
        sinf4(a[i+1]);
        ......
    }
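
To illustrate "introducing new variables to eliminate dependencies" together with unrolling, here is a generic sketch that is not taken from the authors' code: a sum is unrolled by four and split over four independent accumulators, so consecutive additions no longer wait on one another. The function name is hypothetical.

```c
#include <stddef.h>

/* Sum an array with four independent accumulators so that consecutive
   additions do not depend on each other. */
float sum_four_accumulators(const float *x, size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {      /* main loop, unrolled by four */
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; i++) s0 += x[i];    /* leftover elements */
    return (s0 + s1) + (s2 + s3);
}
```

Because floating-point addition is not associative, the result can differ in the last bits from a strictly sequential sum.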

Intel optimizations

• A “master” thread forks “slave” threads to perform RNG.
• The “master” thread corresponds to the part of the Cell/B.E. code that runs on the PPU.
• The “slave” threads correspond to the parts that run on the SPUs.

Difference:
• Work scheduled by the OpenMP runtime shares the same cores as the OS threads.
• The SPUs on the Cell/B.E. version are not running the operating system. This enables them to be used entirely to run the application code.
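
The fork-join pattern described for the Intel version can be sketched with an OpenMP parallel loop. This is an illustration only: the per-index integer hash below is a hypothetical stand-in for a real RNG, and `fill_parallel` is not a function from the original code.

```c
/* Fill `out` with n values using an OpenMP parallel loop. Each iteration
   derives its value from its own index, so iterations are independent.
   Without OpenMP support the pragma is ignored and the loop runs serially. */
void fill_parallel(unsigned int *out, long n, unsigned int seed) {
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        /* Simple integer mixing step; a placeholder, not a quality RNG. */
        unsigned int x = seed + (unsigned int)i * 2654435761u;
        x ^= x >> 16;
        x *= 0x45d9f3bu;
        x ^= x >> 16;
        out[i] = x;
    }
}
```

The thread that enters the loop plays the "master" role and the OpenMP runtime forks the workers, all of which share cores with operating-system threads, as the slide notes.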

Intel optimizations

System / CPU     Operating       Compiler     No. of Threads (Cores)
Speed (GHz)      System                       1       2       4       8
x3550 / 3.0      Red Hat Linux   Intel ICPC   31.76   15.9    8.46    -
x336 / 2.8       Red Hat Linux   Intel ICPC   43.27   30.02   22.62   -
HS21 / 2.33      Fedora Core 6   gcc          43.38   21.74   10.88   8.26

Tying it all together

Future Work

Results achieved so far are on a system that many view as being unsuitable for Financial Markets users.

• “Enhanced Double-Precision” version of the Cell Broadband Engine technology.

• Systems based on Cell/B.E. technology are an excellent platform for Financial Markets applications.

Getting the most performance out of Cell/B.E. technology

Offload as much of the computation onto the SPUs as possible.

Write the SIMD code yourself rather than relying on the compiler to do it. XLC provides “auto-SIMDize”, but this may not be a good approximation of hand-written SIMD code.
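
As a small example of what hand-written SIMD code looks like, the sketch below processes four floats per step. It uses the GCC/Clang vector extension as a portable stand-in for the SPU's 128-bit vector types, which is an assumption made so the example compiles off the Cell; real SPU code would use the SDK's intrinsics instead, and `saxpy_simd` is a hypothetical name.

```c
#include <stddef.h>

/* Four packed floats, using the GCC/Clang vector extension as a stand-in
   for a 128-bit SIMD register. */
typedef float v4f __attribute__((vector_size(16)));

/* y[i] = a * x[i] + y[i], processed four elements per step.
   Assumes n is a multiple of 4 to keep the sketch short. */
void saxpy_simd(float a, const float *x, float *y, size_t n) {
    v4f va = {a, a, a, a};
    for (size_t i = 0; i < n; i += 4) {
        v4f vx = {x[i], x[i + 1], x[i + 2], x[i + 3]};
        v4f vy = {y[i], y[i + 1], y[i + 2], y[i + 3]};
        vy = va * vx + vy;                        /* one 4-wide multiply-add */
        for (int k = 0; k < 4; k++) y[i + k] = vy[k];
    }
}
```

Writing the loop this way makes the four-wide data layout explicit instead of leaving the compiler to discover it.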

In certain situations, you might find that starting from scratch is a much quicker way to implement application code.

Conclusions

Reasons why general-purpose processors make up the majority of computational infrastructures:

(1) Huge numbers of systems based on these processors.

(2) A large supply of skilled professionals, which leads to lower skills costs.

(3) A lot of application development tooling.

(4) The relatively “easy” porting of code to these platforms.

Conclusions

“ESOTERIC” technologies:
• Offer high performance for their chip area.
• Consume much less power per computation.

Disadvantages:
(1) Skills to program them are rare and, hence, expensive.
(2) Lack of application development tooling.
(3) The “porting” process is generally both slow and costly.

Conclusions

Advantages of Cell/B.E. technology:

(1) Consumes less power, space and cooling.
(2) High computational power.
(3) Better data movement and manipulation abilities.
(4) A number of strong customer proof points.
(5) Support from key Independent Software Vendors.
(6) Results of experiments such as this one.

Questions….

Comments….

Caveats ….

