POWER OPTIMIZATION METHODS IN HETEROGENEOUS MULTICORE PROCESSORS PREPARED FOR: SHARON AHLERS ENGINEERING COMMUNICATIONS 350 COLLEGE OF ELECTRICAL AND COMPUTER ENGINEERING CORNELL UNIVERSITY PREPARED BY: ALEXANDER VITKALOV COLLEGE OF ELECTRICAL AND COMPUTER ENGINEERING CORNELL UNIVERSITY DECEMBER 12, 2005

VITKALOV | HETEROGENEOUS MULTICORE PROCESSORS 2

ABSTRACT

This report evaluates the benefits of using heterogeneous processor cores as a means of reducing microprocessor power consumption while increasing its performance. The project focuses on the hardware implementation of heterogeneous processors rather than software. Advantages of multicore architectures are evaluated across five main categories including performance, efficiency, compatibility, functionality and cost. Increases in speed and efficiency of multicore processors are derived through extrapolation of data from comparison between single core processors and their dual core counterparts. Compatibility and functionality advantages are discussed in terms of backwards compatibility, design flexibility and power consumption. The report concludes with a feasibility study outlining the technological and financial conditions required for profitable development of multicore processors.

TABLE OF CONTENTS

LIST OF FIGURES

1. INTRODUCTION

2. PERFORMANCE
   2.1 CHOICE OF PROCESSORS
   2.2 OVERALL PERFORMANCE
   2.3 PERFORMANCE EXTRAPOLATION

3. EFFICIENCY
   3.1 PERFORMANCE PER WATT
   3.2 EFFECTS OF CORE HETEROGENEITY
   3.3 CHALLENGES

4. COMPATIBILITY
   4.1 BACKWARDS COMPATIBILITY
   4.2 CORE COMPATIBILITY

5. FUNCTIONALITY
   5.1 PROGRAMMABLE PROCESSORS
   5.2 CHALLENGES

6. FEASIBILITY
   6.1 CURRENT TECHNOLOGIES
   6.2 FUTURE TECHNOLOGIES

7. CONCLUSION

8. RECOMMENDATIONS

REFERENCES

GLOSSARY

LIST OF FIGURES

FIGURE 1: SELECTED PROCESSORS
FIGURE 2: PROCESSOR POWER CONSUMPTION
FIGURE 3: OVERALL PROCESSOR PERFORMANCE [DHRYSTONE]
FIGURE 4: OVERALL PROCESSOR PERFORMANCE [WHETSTONE]
FIGURE 5: OVERALL PROCESSOR PERFORMANCE EXTRAPOLATION
FIGURE 6: PERFORMANCE PER WATT COMPARISON
FIGURE 7: PERFORMANCE PER WATT EXTRAPOLATION
FIGURE 8: BACKWARDS COMPATIBILITY
FIGURE 9: PROGRAMMABLE COMMUNICATIONS BUS AS CORE INTERCONNECT
FIGURE 10: VIDEO DECODER SCENARIO
FIGURE 11: RELATIVE CORE SIZING

1. INTRODUCTION

Over the past twenty years, processor frequency has been considered the fundamental measure of performance. Higher frequency generally meant faster performance. However, this notion has changed as processor power consumption has become increasingly important. Power consumption depends on operating frequency and on the number of transistors used in a processor. Today’s processors use as many as 250 million transistors, meaning that a small increase in the switching frequency of each can cause a dramatic increase in overall power consumption. The enormous heat generated as a result can cause thermal breakdown of the silicon crystals. Although improving the manufacturing process and decreasing transistor sizes lowers power consumption, this approach is becoming increasingly costly. We have therefore reached a point where true processor performance is no longer determined solely by frequency or transistor size but depends on the elegance and efficiency of the architecture.

Advances such as pipelining, branch prediction and hyperthreading have increased the performance and efficiency of processors. However, even the most efficient single core architectures cannot meet the demands of consumers. Unacceptable levels of power consumption and the increasing cost of developing complex single core chips have forced manufacturers to improve the efficiency and performance of their processors through dual core solutions. Although using two identical cores is a step in the right direction, the efficiency and performance of microprocessors can be further improved by using multiple heterogeneous cores. The advantages of this method enable the fusion of high-performance and mobile processor architectures and will provide effective solutions for years to come.

To understand why exploiting parallelism through multiple heterogeneous cores is the most effective method of improvement, it is helpful to start from a simple performance and efficiency comparison of otherwise identical single and dual core processors and then gradually advance to more complicated issues.

2. PERFORMANCE

The performance of a processor depends on a multitude of factors determined by its architecture and manufacturing process. Frequency has traditionally been the leading factor. However, architectural features such as cache size, branch predictor efficiency and pipeline depth are becoming increasingly important in determining the performance of a single core processor. Intuitively, the effects of a superior architecture are magnified when more than one core is used.

2.1 CHOICE OF PROCESSORS

To demonstrate the importance of processor architecture, the performance of several cores from different application segments needs to be compared (Figure 1). In the high performance segment, the Intel Pentium 4 670 [1] and AMD Athlon FX-55 [2] were chosen, since they represent the fastest single core processors available today. The Intel Pentium 4 840D [3] and AMD Athlon 64 X2 [4] are their dual core counterparts. In addition, the Intel Pentium M 780 [5] and Transmeta Efficeon 8800 [6] were selected from two opposite ends of the mobile spectrum. The disparity in their operating frequency and power consumption illustrates the flexibility that is required for a truly mobile solution. The Intel PXA270 [7] is the sole example of a true system on chip (SoC) processor of the kind typically used in personal digital assistants.

Figure 2, which illustrates processor power consumption, shows that modern desktop processors based on the Pentium 4 or AMD Athlon draw nearly four times the power of a typical laptop processor based on the Pentium M. One of the objectives of this report is to investigate how this figure can be reduced through an efficient combination of multiple heterogeneous cores.

FIGURE 1: SELECTED PROCESSORS

SEGMENT      PROCESSOR             FREQUENCY  TRANSISTORS  SIZE     PRICE
PERFORMANCE  Intel Pentium 4 670   3800 MHz   169M         112 mm²  $625
PERFORMANCE  AMD Athlon FX-55      2600 MHz   114M         115 mm²  $824
MOBILE       Intel Pentium M 780   2260 MHz   140M         87 mm²   $638
MOBILE       Transmeta Efficeon    1300 MHz   40M          29 mm²   n/a
MOBILE       Intel PXA 270         624 MHz    2.5M         50 mm²   n/a
DUAL CORE    Intel Pentium 4 840D  3200 MHz   230M         237 mm²  $667
DUAL CORE    AMD Athlon X2 4800+   2400 MHz   233M         199 mm²  $790

FIGURE 2: PROCESSOR POWER CONSUMPTION
[Bar chart. Axis: power consumption (watts), 0 to 140. Processors shown: Intel Pentium 4 670, AMD Athlon FX-55, Intel Pentium M 780, Transmeta Efficeon, Intel PXA 270, Intel Pentium 4 840D, AMD Athlon X2 4800+.]

Clearly, the highest power consumers are the performance-oriented Intel Pentium 4 670 and 840D along with the AMD Athlon FX-55 and X2. The overall power consumption of the dual core solutions is greater. However, the per-core consumption of a dual core solution is nearly half that of a single core. Therefore, if each core provides performance equal to its single core counterpart at half the power, it is twice as efficient. The subsequent sections, which focus on processor performance, examine how far this statement holds.
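The per-core argument above is simple arithmetic, and a short Python sketch may make it concrete. The numbers are hypothetical, normalized values chosen only to mirror the "equal performance at half the power" case; they are not measurements from Figure 2, and the function name perf_per_watt is an assumption of this sketch.

```python
def perf_per_watt(performance: float, power_watts: float) -> float:
    """Return performance delivered per watt of power drawn."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return performance / power_watts


# Hypothetical, normalized numbers: a single core chip drawing 100 W,
# and one core of a dual core chip giving equal performance at 50 W.
single_core = perf_per_watt(performance=1.0, power_watts=100.0)
one_of_two_cores = perf_per_watt(performance=1.0, power_watts=50.0)

# Ratio of the two efficiencies; halving the power at equal
# performance doubles the performance per watt.
efficiency_gain = one_of_two_cores / single_core
print(f"per-core efficiency gain: {efficiency_gain:.1f}x")
```

Whether real dual core parts reach this ideal depends on how closely each core matches the performance of its single core counterpart.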

2.2 OVERALL PERFORMANCE

The overall performance of the selected processors can be compared using various benchmarks, specifically the SiSoft Sandra 2004 Dhrystone and Whetstone tests [8,9]. The Dhrystone benchmark compares the speed of processors by counting largely integer operations, reported in MIPS (millions of instructions per second), associated with common application instructions such as the ones received from Windows (Figure 3). Whetstone evaluates the floating point performance of a processor in MFLOPS (millions of floating point operations per second), typically associated with scientific or multimedia applications. This second measure is becoming increasingly important as computers are being used to watch videos, listen to music and play 3D games (Figure 4).

FIGURE 3: OVERALL PROCESSOR PERFORMANCE: SISOFT SANDRA (DHRYSTONE)
[Bar chart. Axis: Dhrystone score (MIPS), 0 to 25000. Processors shown: Intel Pentium 4 670, AMD Athlon FX-55, Intel Pentium M 780, Transmeta Efficeon, Intel PXA 270, Intel Pentium 4 840D, AMD Athlon X2 4800+.]

In the overall performance comparison, dual core processors remain the clear favorites, especially in a raw performance benchmark such as SiSoft Sandra Dhrystone. For both Intel and AMD, the dual core nearly doubles the performance. The Whetstone benchmark shows a similar picture. Although the performance of the high-end desktop chips is superior to that of the mobile architectures, the difference is marginal. The Dhrystone performance of the Intel Pentium M, which is a mobile processor, is only 15% less than that of the single core Intel Pentium 4 670 and AMD Athlon FX-55. In the multimedia benchmark, SiSoft Sandra Whetstone, the Intel Pentium M also held up well against its desktop counterparts.

On the other hand, the Transmeta Efficeon and the handheld Intel PXA-270 show a dramatic disadvantage in the overall performance figures. The practically non-existent Dhrystone and Whetstone scores of the Intel PXA-270, one of the top handheld processors in use today, show how far ultra-portable solutions underperform. This significant difference in performance also explains why PDAs are not as widespread as desktops and laptops. Slow processors seriously limit the functionality of the units, giving consumers less incentive to buy them. The performance of ultra-portable processors such as the Intel PXA-270 is capped by the stringent power supply limits required to keep the units portable. Although battery capacities are slowly increasing, other methods, such as the use of multiple cores, are required to make ultra-portable electronics more practical.

FIGURE 4: OVERALL PROCESSOR PERFORMANCE: SISOFT SANDRA (WHETSTONE)
[Bar chart. Axis: million floating point operations per second (MFLOPS), 0 to 12000. Processors shown: Intel Pentium 4 670, AMD Athlon FX-55, Intel Pentium M 780, Transmeta Efficeon, Intel PXA 270, Intel Pentium 4 840D, AMD Athlon X2 4800+.]

2.3 PERFORMANCE EXTRAPOLATION

During the Intel Developer Forum in spring 2005, Intel Corporation predicted a ten-fold increase in processor performance due to the introduction of multicore designs. Considering that the performance of desktop processors has increased 68-fold since the introduction of the 8086 in 1978, this prediction appears plausible in the long run. As Intel readies dual and quad core processors to reach the market in 2006-07, a significant short-term performance increase is also likely. Considering that these processors will be manufactured on a smaller 0.065 μm process, the power and performance advantages will also be extended. By the most conservative estimates, quad core configurations should improve overall processor performance by a factor of at least 3 (Figure 5).

FIGURE 5: PERFORMANCE EXTRAPOLATION (BASED ON SANDRA DHRYSTONE BENCHMARK)
[Bar chart. Axis: performance (%), 0% to 450%. Configurations shown: Intel Pentium 4 670, Intel Pentium 4 840D, Intel Pentium (Quad), AMD Athlon FX-55, AMD Athlon X2 (Dual), AMD Athlon (Quad), Intel Pentium M 780, Intel Pentium M (Dual), Intel Pentium M (Quad).]

If the dual and quad core versions are based on a portable processor, such as the Intel Pentium M, laptops will experience an even more significant increase in performance. Laptop processor designs are more energy efficient and therefore produce less heat. Considering that power dissipation is shifting to the forefront of the technological limitations of processor designs, mobile architectures are likely to benefit more from the increased core count. In fact, Intel is already planning to introduce dual core mobile processors based on the Pentium M, currently codenamed Yonah. In addition, a quad core desktop processor, codenamed Kentsfield, is planned to arrive in mid-2007 [10].

3. EFFICIENCY

Although performance has always been the most important measure of processor superiority, issues with energy consumption are rapidly adding new meaning to this concept. Manufacturers like Intel and AMD are now beginning to evaluate their products on the basis of performance per watt, which reflects not only the speed of the processor but also how efficiently it uses energy.

FIGURE 6: PERFORMANCE PER WATT COMPARISON
[Bar chart. Axis: performance (MIPS per watt), 0 to 400. Axis labels: Intel Pentium 4 670, AMD Athlon FX-55, Intel Pentium M 780, Transmeta Efficeon, Intel PXA 270, Intel Pentium 4 840D, AMD Athlon X2 4800+.]

3.1 PERFORMANCE PER WATT

Since the introduction of the performance per watt design approach, the industry has been achieving higher performance figures at lower power consumption. Although mobile processors have always been better with respect to efficiency, the answer for desktop systems has often been dual core architectures. Figure 6, derived from Figures 2 and 3, compares the performance per watt ratings of the selected processors. For brevity, only the Dhrystone benchmark was used for the overall performance component of this comparison. The Intel PXA-270 was not included because of its significantly different CPU architecture. Because that architecture is optimized for extremely low power consumption, its performance per watt figure varies significantly from one application to the next. It is important to note that the Pentium M has a significantly higher performance per watt ratio than any other processor in the comparison. Another important trend is that the dual cores improve the performance per watt ratio by roughly 30%. Combining these results with Figure 5, we reach the clear conclusion that mobile cores benefit the most from having multiple cores (Figure 7).

FIGURE 7: PERFORMANCE PER WATT EXTRAPOLATION
[Bar chart. Axis: performance % (relative to Pentium M), 0% to 450%. Configurations shown: Intel Pentium 4 670, Intel Pentium 4 840D, Intel Pentium (Quad), AMD Athlon FX-55, AMD Athlon X2 (Dual), AMD Athlon (Quad), Intel Pentium M 780, Intel Pentium M (Dual), Intel Pentium M (Quad).]

To obtain these results, the performance percentages of Figure 5 were divided by the factor by which the Pentium M's performance per watt exceeds that of the given processor. For instance, for the AMD Athlon FX-55 this factor is 350/140 = 2.5. Its performance based on Sandra Dhrystone is 119% relative to the Pentium M. Therefore, its total performance per watt is only 119%/2.5 = 47.6% relative to the Pentium M. Since performance and power consumption data were unavailable for the quad cores, they were derived from the ratio between the performance per watt of the corresponding dual core and single core solutions. This factor was then multiplied by the performance per watt percentage of the dual core solution to obtain the quad core figure. Clearly, the data show that in terms of efficiency, mobile solutions are hard to beat.
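The calculation described above can be sketched in Python. The Athlon FX-55 inputs (119%, 350 and 140) are the values quoted in the text; the function names and the dual core value used in the quad core example are assumptions made for illustration, with the latter simply applying the roughly 30% dual core improvement mentioned in this section.

```python
def relative_perf_per_watt(perf_pct_vs_ref: float,
                           ref_mips_per_watt: float,
                           cpu_mips_per_watt: float) -> float:
    """Scale a performance percentage by the efficiency gap to the reference.

    perf_pct_vs_ref: performance as a percentage of the reference processor.
    ref_mips_per_watt / cpu_mips_per_watt: the factor by which the reference
    (here the Pentium M) is more efficient than the processor in question.
    """
    efficiency_factor = ref_mips_per_watt / cpu_mips_per_watt
    return perf_pct_vs_ref / efficiency_factor


def extrapolate_quad(single_pct: float, dual_pct: float) -> float:
    """Estimate a quad core figure by reapplying the single-to-dual factor."""
    scaling_factor = dual_pct / single_pct
    return dual_pct * scaling_factor


# Worked example from the text: AMD Athlon FX-55 relative to Pentium M.
fx55_single = relative_perf_per_watt(119.0, 350.0, 140.0)
print(f"FX-55 single core: {fx55_single:.1f}% of Pentium M")

# Hypothetical dual core value, assuming the ~30% improvement noted above.
fx55_dual = fx55_single * 1.3
fx55_quad = extrapolate_quad(fx55_single, fx55_dual)
print(f"Estimated quad core: {fx55_quad:.1f}% of Pentium M")
```

The quad core estimate inherits every assumption of the dual core figure, so it should be read as a rough projection rather than a measurement.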

3.2 EFFECTS OF CORE HETEROGENEITY

By using heterogeneous cores we can further increase both the performance and the efficiency of a processor [11,12]. In a likely scenario, several cores with different performance, efficiency and complexity levels are combined in one multicore design. Large variations in the core architectures make the entire design more flexible and adaptive to a specific application. Kumar et al. found in their study that heterogeneous cores provide, on average, a significant 39% reduction in power with only a negligible 3% reduction in performance. Since their experiment was based on the Alpha processor, which is rarely used in consumer products, it can only suggest the improvement that could be achieved if processor architectures were specifically designed to take advantage of multicore heterogeneity. In that case, powerful cores can be combined with more efficient ones to generate a significant improvement in performance per watt. Since modern processors remain underutilized most of the time, this approach would yield significant idle power reductions. The more powerful cores would simply be shut off and used only when their performance counts. For desktops, the reduced power load would decrease the demand for the custom water cooled solutions that are beginning to appear to resolve heat dissipation problems. In addition, the increased processing capabilities would greatly reduce bottlenecks in calculation intensive applications, such as file archiving and conversion.
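The idea of keeping powerful cores off until their performance counts can be illustrated with a minimal Python sketch. It is not the selection method used by Kumar et al.; the core names, performance units and wattages are hypothetical, and the policy (pick the lowest-power core that meets the demand) is only one plausible assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    performance: float   # relative throughput, arbitrary units (hypothetical)
    power_watts: float   # power drawn while active (hypothetical)


def pick_core(cores, demand: float) -> Core:
    """Choose the lowest-power core that satisfies the performance demand.

    If no core is fast enough, fall back to the fastest one available.
    """
    capable = [c for c in cores if c.performance >= demand]
    if capable:
        return min(capable, key=lambda c: c.power_watts)
    return max(cores, key=lambda c: c.performance)


# A hypothetical three-core heterogeneous design.
CORES = [
    Core("efficient", performance=1.0, power_watts=5.0),
    Core("balanced", performance=2.5, power_watts=20.0),
    Core("performance", performance=4.0, power_watts=90.0),
]

for demand in (0.5, 2.0, 3.5):
    chosen = pick_core(CORES, demand)
    print(f"demand {demand}: run on {chosen.name} core at {chosen.power_watts} W")
```

Under light load the sketch always lands on the efficient core, which is where the idle power savings discussed above would come from.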

In ultra-mobile applications, the speed of a desktop processor is rarely necessary. If the need does arise, however, the batteries may provide enough power for short bursts through the use of capacitors. In addition, less complex cores can be used for specific applications, further reducing power consumption, as will be discussed later.

3.3 CHALLENGES

One of the primary challenges of introducing heterogeneous cores is the increasing complexity of the communication bus that is required for such a complicated network. In modern dual core processors, communication between the cores is still in its infancy. Although these processors provide nearly a twofold performance increase in some applications, they may provide none in others. Creating an effective communications bus is a difficult challenge, which can be magnified by the introduction of heterogeneous cores that may work at different or even variable frequencies. In addition, performance bottlenecks and power distribution issues have to be evaluated largely at the hardware level. Although Kumar et al. [11] used software to determine the processor assignment for a specific instruction, a hardware implementation of an effective algorithm would have significant performance advantages. A separate co-processing unit may be necessary just to deal with power and performance optimization.

4. COMPATIBILITY

One of the most important factors in the success experienced by Intel Corporation over the years is backwards compatibility. Backwards compatibility means that newer processors produced by Intel are compatible with those from twenty years ago, so that software does not have to be rewritten for every new generation of processors. As a result, many of the newest processors carry a number of old artifacts that serve no purpose in any modern application. This section of the report describes a method of ensuring backwards compatibility in multicore processors and then addresses how to make sure that heterogeneous cores are compatible with each other.

4.1 BACKWARDS COMPATIBILITY

Multicore processors offer the best possible solution in terms of backwards compatibility. Compared to their modern counterparts, the processors from twenty years ago were much slower. Therefore, highly efficient processor cores working at low frequencies are more than sufficient to emulate the operation of their ancestors (Figure 8). Backwards compatibility in the faster high performance cores can therefore be dropped. As a result, unnecessary redundancy would be eliminated from the design as a whole. In addition, performance and efficiency should increase because the common instruction set could exclude practically useless operations.
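The division of labor described above can be sketched in a few lines of Python. This is only an illustration under assumed names: the opcode strings and the two core labels are hypothetical and do not correspond to any real instruction set, and sending every non-legacy instruction to the performance core is a simplifying assumption.

```python
# Hypothetical legacy opcodes that only the efficient core still supports.
LEGACY_OPS = frozenset({"legacy_segment_load", "legacy_bcd_adjust"})


def route_instruction(opcode: str) -> str:
    """Return the (hypothetical) core that should execute the opcode."""
    if opcode in LEGACY_OPS:
        return "efficient_core"
    return "performance_core"


def route_program(opcodes):
    """Group a stream of opcodes by target core, preserving their order."""
    plan = {"efficient_core": [], "performance_core": []}
    for opcode in opcodes:
        plan[route_instruction(opcode)].append(opcode)
    return plan


program = ["add", "legacy_bcd_adjust", "mul", "legacy_segment_load"]
print(route_program(program))
```

In this arrangement the performance core never needs to decode the legacy operations, which is the redundancy the section proposes to remove.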

Since heterogeneous multicore processors may use significantly different cores to diversify their performance index across a range of applications, efficient operation has to be ensured through core compatibility [13].

4.2 CORE COMPATIBILITY

FIGURE 8: BACKWARDS COMPATIBILITY
[Diagram. Labels: mobile core; performance core; instruction type A; instruction type A; instruction type B.]

Most modern designs use reduced instruction set computer (RISC) architectures. This approach provides advantages in both performance and power consumption over other instruction set designs. On the other hand, the instruction sets vary from processor to processor. For instance, the Pentium 4 includes an additional set of multimedia instructions to increase its performance across a wide range of multimedia applications, while earlier processors such as the 80486 do not. The situation becomes more complicated when processors with different RISC instruction sets are combined. For instance, the RISC in the Pentium 4 is optimized for high performance applications. On the other hand, the RISC in the ARM processor, commonly used in handheld applications, is optimized for power consumption [14]. As a result, the instruction sets are not compatible even though they are quite similar in purpose.

FIGURE 9: PROGRAMMABLE COMMUNICATION BUS AS CORE INTERCONNECT

The most efficient solution is to combine the heterogeneous cores using a translation layer [15]. Nava et al. proposed an approach resembling a network topology to resolve the communication issues between the heterogeneous cores. Having a network based communication bus act as a translation layer between the heterogeneous cores will eliminate most of the compatibility issues between them. In addition, since the proposed communication bus can be programmable, power consumption can be further decreased by using smart routing techniques optimized to increase the performance per watt rating of the processor.

Programmable components, such as a communication bus connecting heterogeneous cores, can significantly increase performance compared to the approach used by Kumar et al. [11]. Since the communication bus will be programmed by a local coprocessor, rather than by the indirect software methods used in [11], the efficiency and performance of the overall design are likely to increase significantly.
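A toy Python model may help picture a programmable bus that both routes work and translates it for the receiving core. It is a sketch under stated assumptions, not the design proposed by Nava et al.: the class name, the routing table, the instruction classes and the string-based "translation" functions are all hypothetical stand-ins.

```python
class ProgrammableBus:
    """Toy model of a reprogrammable interconnect with a routing table."""

    def __init__(self, routes, translators):
        # instruction class -> core name
        self._routes = dict(routes)
        # core name -> function translating a payload into that core's format
        self._translators = dict(translators)

    def reprogram(self, instruction_class: str, core: str) -> None:
        """Point an instruction class at a different core."""
        if core not in self._translators:
            raise KeyError(f"unknown core: {core}")
        self._routes[instruction_class] = core

    def dispatch(self, instruction_class: str, payload: str):
        """Return the target core and the payload translated for it."""
        core = self._routes[instruction_class]
        return core, self._translators[core](payload)


# Hypothetical cores whose "native formats" are upper and lower case text.
bus = ProgrammableBus(
    routes={"video": "performance", "audio": "efficient"},
    translators={"performance": str.upper, "efficient": str.lower},
)

print(bus.dispatch("video", "Decode"))
bus.reprogram("video", "efficient")   # e.g. a coprocessor reacting to low power
print(bus.dispatch("video", "Decode"))
```

In the scheme discussed above, the call to reprogram would be issued by the local coprocessor rather than by application software.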

5. FUNCTIONALITY

Since heterogeneous multicore processors are likely to include a number of programmable components besides a communications bus, the functionality of the design is likely to increase significantly. This section discusses the advantages in functionality associated with multicore processors and their programmable components.

5.1 PROGRAMMABLE PROCESSORS

The key element of tomorrow's heterogeneous multicore designs will be an increased number of programmable processors. The advantages of programmable processors include custom execution units and variable instruction sets as well as registers and register files [16]. Conventional fixed instruction set processors simply cannot compete with the flexibility and performance advantages offered by programmable processors when it comes to specific applications. For instance, the emergence of digital signal processing in internet and multimedia applications has allowed system architects to design processors for their specific algorithms and subsequently update them as those algorithms improve over time. A configurable processor used to decode older MPEG-2 video files, for example, can be reprogrammed to decode newer MPEG-4 videos.

Today’s system-on-chip designs contain hundreds of custom programmable processors [16]. This is achieved by keeping the complexity of the programmable processor cores relatively low. In the future, the programmable cores may become more complex and offer a wide range of functionality. For instance, the same programmable processor could act as a physics processor for scientific and entertainment applications or as a GPS processor for navigation applications.

Considering that modern processors are moving away from the traditional handcrafted design approach, the complexity of processors is likely to increase. In the long run, the goal is to enable computers to design themselves with as little human input as possible. Today, computers go only as far as aiding the designer in optimizing a given processor architecture. Special software packages can be used to reprogram a processor while optimizing its performance and power consumption for a given scenario. It is likely that one of the first applications of this approach will be configurable processors that reprogram themselves based on the amount of available power. For instance, a video stream decoder could provide HDTV-quality resolution while a laptop is plugged into a wall outlet, and a lower resolution that saves power when the user is traveling (Figure 10).

FIGURE 10: VIDEO DECODER SCENARIO USING HETEROGENEOUS MULTICORE PROCESSOR

[Figure: two panels, PERFORMANCE MODE and MOBILE MODE. Each panel shows Core A, Core B and two programmable cores, individually switched on or off, connected by a programmable communications bus beneath the operating system. In performance mode the video application runs at 1280x720 @ 30 fps; in mobile mode the operating system is in power save and the video application runs at 800x480 @ 25 fps.]
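The video decoder scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DecoderMode` type, the `select_mode` function and the pixel-rate workload proxy are hypothetical names introduced here, and only the two resolutions and frame rates come from Figure 10.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderMode:
    """One decoder configuration (hypothetical type, for illustration only)."""
    name: str
    width: int
    height: int
    fps: int


# Resolutions and frame rates are the two operating points shown in Figure 10.
PERFORMANCE_MODE = DecoderMode("performance", 1280, 720, 30)
MOBILE_MODE = DecoderMode("mobile", 800, 480, 25)


def select_mode(on_wall_power: bool) -> DecoderMode:
    """Choose the decoder configuration from the available power source."""
    return PERFORMANCE_MODE if on_wall_power else MOBILE_MODE


def pixel_rate(mode: DecoderMode) -> int:
    """Pixels decoded per second: an assumed, rough proxy for decoding workload."""
    return mode.width * mode.height * mode.fps


if __name__ == "__main__":
    for plugged_in in (True, False):
        mode = select_mode(plugged_in)
        print(mode.name, pixel_rate(mode))
```

Under this rough proxy, the mobile mode handles about a third of the pixels per second that the performance mode does, which is the kind of workload reduction that would let some cores be switched off.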

5.2 CHALLENGES

The primary challenge, as the number and complexity of programmable components increase, is to ensure that heterogeneous cores communicate efficiently [17]. Efficient operation means that power consumption has to be minimized for a given performance demand. Increasing variability in instruction sets makes this a challenging task: as instructions become significantly different from each other, it becomes harder to determine the best-suited component. 64-bit instructions, already present in some CPUs, may provide an answer to this problem since they accommodate twice the amount of information compared to traditional 32-bit instructions.
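The idea of minimizing power for a given performance demand can be sketched as a simple selection rule: among the cores fast enough for the task, pick the one that draws the least power. The `Core` type, the `pick_core` function and all numbers below are hypothetical and are not taken from the report or its references.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Core:
    """A candidate core; throughput and power figures are made-up examples."""
    name: str
    throughput: float   # work units per second (hypothetical unit)
    power_watts: float  # power drawn while active (hypothetical value)


def pick_core(cores: Sequence[Core], demand: float) -> Optional[Core]:
    """Return the lowest-power core whose throughput meets the demand.

    Returns None when no single core is fast enough.
    """
    candidates = [core for core in cores if core.throughput >= demand]
    if not candidates:
        return None
    return min(candidates, key=lambda core: core.power_watts)


if __name__ == "__main__":
    cores = [
        Core("small", 1.0, 2.0),
        Core("medium", 2.5, 6.0),
        Core("large", 4.0, 20.0),
    ]
    chosen = pick_core(cores, demand=2.0)
    print(chosen.name if chosen else "no core meets the demand")
```

A real scheduler would also have to account for the cost of moving a task between cores and for cores with different instruction sets, which is where the difficulty described above comes from.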

In addition, as the complexity of the heterogeneous components increases, so will the complexity of the tools required to design them. A very high level of abstraction is required to create a system with so many variable parameters. Moreover, a point may be reached at which the complexity of the instructions and design starts to burden overall system performance.

6. FEASIBILITY

The purpose of the feasibility study in this report is to evaluate the technological and market conditions required to make heterogeneous core processors a viable alternative to current technologies. The study evaluates current technological conditions and investigates near-future trends.

6.1 CURRENT TECHNOLOGIES

For the past four years the manufacturing standard has been 0.09 μm technology. Along with improved processor architectures, it allowed processor frequencies to increase from roughly 1.4 to 3.8 GHz. This seemingly disproportionate frequency increase had a serious negative impact on the power requirements of a typical processor. Since a larger number of transistors was needed to achieve high-frequency designs, and each transistor required a higher operating voltage for faster switching, overall power increased quadratically with the operating voltage. Modern designs have reached a point where performance is strictly limited by the ability of the processor to withstand the enormous heat it generates. In addition, the increased complexity of the designs is giving modern processors lower production yields, which drives unit prices up. In some cases, such as the Intel Pentium Extreme Edition, unit prices have reached a staggering 1,000 dollars.
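The quadratic dependence of power on voltage mentioned above follows from the standard first-order relation for CMOS dynamic power, P = a · C · V² · f (activity factor, switched capacitance, supply voltage, clock frequency). The report does not state this formula, so the sketch below is an assumption added for illustration; it ignores leakage power, and the function names are hypothetical.

```python
def dynamic_power(capacitance_farads: float, voltage_volts: float,
                  frequency_hz: float, activity: float = 1.0) -> float:
    """First-order CMOS dynamic power in watts: activity * C * V^2 * f.

    Leakage and short-circuit power are deliberately ignored.
    """
    return activity * capacitance_farads * voltage_volts ** 2 * frequency_hz


def relative_power(voltage_scale: float, frequency_scale: float) -> float:
    """Power relative to a baseline when voltage and frequency are scaled."""
    return voltage_scale ** 2 * frequency_scale


if __name__ == "__main__":
    # Raising the voltage by 20% at constant frequency costs about 44% more power.
    print(relative_power(1.2, 1.0))
    # Doubling the frequency at constant voltage doubles the power.
    print(relative_power(1.0, 2.0))
```

Because a higher clock frequency usually also requires a higher voltage, the two factors compound, which is why frequency-driven designs ran into the thermal limits described above.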

To resolve the problems of power dissipation and low yields, manufacturers resorted to dual cores and 64-bit processors. By increasing parallelism in the architectures, processor designers were able to decrease the frequency and power consumption of each core while still increasing performance. The less complicated cores used in dual-core solutions have higher yields, decreasing the cost of the overall design.

6.2 FUTURE TECHNOLOGIES

For the next several years the trend of exploiting parallelism in processor designs will gain momentum. Improved manufacturing processes will continue to drive down the costs of developing dual-core processors. However, in order to make a significant leap into the future, the core manufacturing technology needs to shrink from 0.09 μm down to 0.065 μm. This 28% decrease in feature length would cause nearly a 50% decrease in total area (Figure 11). Considering that manufacturers are also shifting towards larger wafers¹, the production cost per unit will at least halve.
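The relationship between the shrink in feature length and the shrink in area can be checked with a short calculation. The sketch assumes ideal scaling, in which every dimension of the core shrinks by the same factor, so area scales with the square of the length ratio; the function names are hypothetical.

```python
def linear_shrink(old_um: float, new_um: float) -> float:
    """Fractional decrease in feature length when moving between processes."""
    return 1.0 - new_um / old_um


def area_shrink(old_um: float, new_um: float) -> float:
    """Fractional decrease in area, assuming both dimensions scale equally."""
    return 1.0 - (new_um / old_um) ** 2


if __name__ == "__main__":
    old, new = 0.09, 0.065  # process sizes in micrometers, from the text
    print(f"length decrease: {linear_shrink(old, new):.1%}")
    print(f"area decrease:   {area_shrink(old, new):.1%}")
```

Moving from 0.09 μm to 0.065 μm gives a length ratio of about 0.72, so the same design would occupy roughly 52% of its original area, which is the "nearly 50%" reduction cited above.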

FIGURE 11. RELATIVE CORE SIZING

[Figure: a single core with cache at 0.09 μm shown next to a quad core (four cores with shared cache) at 0.065 μm, each annotated with power and performance indicators.]

The decreased core area will also allow more cores to be combined into one processor. The current 0.09 μm technology does not allow practical integration of more than two cores, since cooling large cores becomes problematic. The area saved can also potentially be allocated to the power optimization circuitry. This is one of the reasons why 0.065 μm technology would allow for the development of quad-core and multicore processors.

¹ Wafers are thin slices cut from a grown silicon crystal; each wafer is subsequently patterned and divided up into individual processor cores.

Unfortunately, many fundamental physical phenomena, such as leakage currents, impose serious constraints on sub-0.065 μm technologies. Although Intel is currently looking at a potential reduction down to 0.045 μm within the next two years [10], this projection may be only as realistic as the 5 GHz Pentiums that were rumored for 2005 and never delivered. Sub-0.045 μm technologies will eventually become a reality once extreme forms of lithography are implemented. At that point heterogeneous multicore processors will likely become the dominant trend in the market, due to the advantages discussed earlier in this report.

7 CONCLUSION

Although true high-performance heterogeneous multicore processors are still held back by unresolved technological limitations, the future of this technology seems promising. The advantages introduced by multicore designs overshadow the benefits of current single- and dual-core solutions. In terms of performance and efficiency, heterogeneous multicore processors offer unprecedented increases due to flexible power consumption and a high level of configurability. Although this report at times focused solely on the performance advantages of multicore architectures, one has to keep in mind that in the future the terms performance and efficiency will become interchangeable. Due to increased parallelism, the performance of future multicore processors will be limited mostly by the amount of power supplied and not by the frequency at which they operate. Therefore, the foremost issue in heterogeneous multicore processor design is optimizing power consumption by carefully selecting the cores and engaging programmable components based on the demands of applications.

8 RECOMMENDATIONS

Based on their performance and efficiency, multicore processors are clearly the future of computing. Advantages in functionality and power consumption associated with multicore designs will increase the number of possible applications while adding new ways in which we use computers in our lives. For instance, heterogeneous multicores, together with breakthroughs in memory technologies such as perpendicular recording and magnetic memory, will allow current desktops to be squeezed down to the size of a cellphone. Personalized programmable processors in these phones may enable the use of cellphones as credit cards, which could be orders of magnitude more secure than traditional methods. Ultimately, heterogeneous cores will bring an increasingly interactive experience to electronics across the board.

This is one of a multitude of examples reflecting the importance of technologies that accelerate the development of heterogeneous multicore processors. Although alternative research directions, such as quantum computing, promise lucrative opportunities, they do not have solid theoretical and practical foundations. Advances in multicore processor architectures are based on decades of research in the field of silicon-based semiconductors and not on a small number of theoretical speculations. This is why increasing research and financing in the field of heterogeneous multicore processors is a solid investment that will bring significant returns in the long run.


REFERENCES

[1] Intel Pentium 4 Processor 670: Datasheet. Intel Corporation. [Online]. Available from: www.intel.com

[2] AMD Athlon 64 4800+ Processor: Datasheet. AMD Corporation. [Online]. Available from: www.amd.com

[3] Intel Pentium D Processor 840, 830, and 820: Datasheet. Intel Corporation. [Online]. Available from: www.intel.com

[4] AMD Athlon 64 X2 Processor: Datasheet. AMD Corporation. [Online]. Available from: www.amd.com

[5] Intel Pentium M 770 Processor: Datasheet. Intel Corporation. [Online]. Available from: www.intel.com

[6] Transmeta Efficeon TM 8800 Processor. Transmeta Corporation. [Online]. Available from: www.transmeta.com

[7] Intel PXA270 Processor: Datasheet. Intel Corporation. [Online]. Available from: www.intel.com

[8] SiSoft – The Diagnostic Tool. SiSoft Corporation. [Online]. Available from: http://www.sisoftware.net/index.html?dir=&location=qa&langx=en&a=

[9] Tom’s Hardware Guide Processors. Tom’s Guide Publishing, 2005. [Online]. Available from: http://www23.tomshardware.com/index.html

[10] SCHMID P. Top Secret Intel Processor Plans Uncovered. Tom’s Guide Publishing. [Online]. Available from: http://www.tomshardware.com/2005/12/04/top_secret_intel_processor_plans_uncovered/index.html

[11] KUMAR R, FARKAS K, JOUPPI N, RANGANATHAN P, TULLSEN D. Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction. Proceedings of the 36th International Symposium on Microarchitecture (MICRO-36’03). IEEE. 2003.

[12] BALAKRISHNAN S, RAJWAR R, UPTON M, LAI K. The Impact of Performance Asymmetry in Emerging Multicore Architectures. Proceedings of the 32nd International Symposium on Computer Architecture (ISCA’05). IEEE. 2005.

[13] JERRAYA A, TENHUNEN H, WOLF W. Introduction to Microprocessor Systems On Chips. Computer, v 38, n 7, July 2005. pp. 36-40.

[14] GOODACRE J, SLOSS A. Parallelism and the ARM Instruction Set Architecture. Computer, v 38, n 7, July 2005. pp. 42-50.

[15] NAVA M.D, BLOUET P, TENINGE P, COPPOLA M, BEN-ISMAIL T, PICCHIOTTINO S, WILSON R. An Open Platform for Developing Multiprocessor SOCs. Computer, v 38, n 7, July 2005. pp. 60-67.

[16] LEIBSON S, KIM J. Configurable Processors: A New Era in Chip Design. Computer, v 38, n 7, July 2005. pp. 51-59.

[17] JERRAYA A, BAGHDADI A, CESARIO W, GAUTHIER L, LYONNARD D, NICOLESCU G, PAVIOT Y, YOO S. Application Specific Multiprocessor Systems-on-Chip. SASIMI.


GLOSSARY

CORE – Silicon device that contains the transistor logic for the processor.

CPU – Central Processing Unit, or processor.

DHRYSTONE – Benchmark used to measure integer (MIPS) performance of a processor.

GPS – Global Positioning System.

HETEROGENEOUS – Made of processor cores of different architectures.

KENTSFIELD – First quad-core desktop processor, due to appear in 2007.

MIPS – Million Instructions Per Second.

MFLOPS – Million Floating-Point Operations Per Second.

PDA – Personal Digital Assistant.

PROCESSOR – Component that is responsible for the evaluation of instructions.

RISC – Reduced Instruction Set Computer. This approach is used in most modern microprocessors. It allows for faster and more efficient hardware designs.

WATT – Unit of power, or work per second.

WHETSTONE – Benchmark used to measure floating-point (MFLOPS) performance of a processor.

YONAH – Dual-core processor based on the 0.065 μm process, due to replace the current Pentium M.
