
Multicore Processor Report


1. INTRODUCTION

A processor is a unit that reads, decodes and executes program instructions. Processors were originally developed with only one core. A core is the part of the processor that actually performs:

1. Fetching
2. Decoding
3. Executing

an instruction, as shown in Fig. 1.1.

Fig. 1.1 (Single Core Computer)

A single-core processor is a processing system with one such core. A multicore processor, by contrast, is an Integrated Circuit (IC) on which two or more individual, independent cores are attached to a single die. Placing two or more powerful computing cores on a single processor opens up a world of important new possibilities. Each core has its own complete set of resources and may share the on-die cache layers.

A single-core processor can process only one instruction at a time. To improve efficiency, processors commonly use internal pipelines, which allow several instructions to be in flight together. However, instructions are still fed into the pipeline one at a time.


    Fig.1.2 (Single Core)

This led to the evolution of the multicore processor: placing two or more powerful computing cores on a single processor opens up important new possibilities for increasing the performance of the system.

Need for multicore processors:

The difficulties in using a single-core CPU gave birth to the multicore processor.

1. It is difficult and costly to push the single-core clock frequency even higher.

    Fig 1.3.(Clock Frequency and Performance)

Raising the clock frequency further yields only modest performance improvement while decreasing reliability. Doubling the frequency, together with the voltage increase it requires, causes roughly a fourfold increase in power consumption. Dynamic power is calculated as Power = Capacitance × Voltage² × Frequency.
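As a rough numeric illustration, the sketch below uses the standard dynamic-power relation P = C · V² · f; the squared voltage term is what makes frequency increases so costly, since pushing frequency up usually forces voltage up as well. The capacitance and voltage numbers are invented for demonstration, not real chip figures:

```python
# Dynamic CMOS power model: P = C * V^2 * f.
# All numbers below are illustrative assumptions, not real chip figures.

def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """Approximate dynamic (switching) power in watts."""
    return capacitance_f * voltage_v ** 2 * frequency_hz

base = dynamic_power(1e-9, 1.0, 2e9)          # hypothetical baseline: 2 W
# Doubling frequency usually forces a voltage increase too; if V rises by
# sqrt(2), power grows by 2 (from f) times 2 (from V^2) = 4x.
doubled = dynamic_power(1e-9, 2 ** 0.5, 4e9)  # about 8 W

print(round(doubled / base, 6))  # 4.0
```

This is why two slower cores can beat one fast core on a power budget: halving each core's frequency (and voltage) cuts per-core power far more than it cuts per-core performance.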

2. Many new applications are multithreaded.


3. The general trend in computer architecture is now a shift towards parallelism. Deeply pipelined circuits lead to:

a. Heat problems
b. Speed-of-light (signal propagation) problems
c. Large design teams becoming necessary
d. Server farms needing expensive air conditioning

To overcome these drawbacks of the single-core processor, and to increase system performance without increasing power consumption and with less complexity, the need for the multicore processor arose.


2. PROCESSOR HISTORY

Intel manufactured the first microprocessor, the 4-bit 4004, in the early 1970s; it was basically just a number-crunching machine. Shortly afterwards Intel developed the 8008 and 8080, both 8-bit, and Motorola followed suit with its 6800, a competitor to Intel's 8080. The companies then fabricated 16-bit microprocessors: Motorola had its 68000, and Intel the 8086 and 8088. The x86 line would be the basis for Intel's 32-bit 80386 and later the popular Pentium lineup, and the 8088 appeared in the first consumer IBM PCs. [18, 19] Each generation of processors grew smaller and faster, but also dissipated more heat and consumed more power.

    Fig 2.1(generation of processors)


2.1 MOORE'S LAW

One of the guiding principles of computer architecture is known as Moore's Law. In 1965 Gordon Moore stated that the number of transistors on a chip would roughly double each year. What is often quoted as Moore's Law is actually David House's revision: that computer performance will double every 18 months. [20] The graph in Figure 2.2 plots many of the early microprocessors against the number of transistors per chip.

    Fig 2.2(microprocessors against the number of transistors per chip)

Throughout the 1990s and the early part of the 2000s, microprocessor frequency was synonymous with performance; higher frequency meant a faster, more capable computer. Since processor frequency has reached a plateau, we must now consider other aspects of overall system performance: power consumption, heat dissipation, frequency, and number of cores. Multicore processors are often run at lower frequencies, yet achieve much better performance than a single-core processor, because two heads are better than one.


3. ARCHITECTURE

The processor's cores are implemented on a single die or chip. Each core has its own complete set of resources, and may share the on-die cache layers.

    Fig 3.1(architecture of multicore processor)

    Core:

    The individual processors that are implemented on the integrated die or chip are

    called the cores. The core is the part of the processor that actually performs the reading

    and executing of instructions.

Register file:

A register file is an array of processor registers in a central processing unit (CPU). Modern integrated-circuit register files are usually implemented as fast static RAMs with multiple ports. Such RAMs are distinguished by having dedicated read and write ports, whereas ordinary multiported SRAMs usually read and write through the same ports.

    Bus:

The back-side bus connects the processor with the cache memory.

    Cache:

Closest to the processor is the Level 1 (L1) cache; this is very fast memory used to store data frequently used by the processor. The Level 2 (L2) cache sits farther from the core, slower than L1 cache but still much faster than main memory; L2 cache is larger than L1 cache and used for the same purpose. The cores do not necessarily share the cache.


Crossbar:

A crossbar switch connects multiple inputs to multiple outputs in a matrix arrangement. Here, the crossbar switch connects the system request queue (SRQ) and the integrated memory controller. It directly connects both CPU cores to the HyperTransport link, as well as to the integrated memory controller, for I/O to and from the outside world. Think of it like a train-track switch: signals can pass to and from either core and the outside world, but not at the same time.

HyperTransport link:

HyperTransport is a technology for interconnecting computer processors. It is a bidirectional serial/parallel, high-bandwidth, low-latency point-to-point link, and serves as a replacement for the front-side bus.

Fig 3.2 (multicore processor implemented with 4 independent cores)

    Integrated memory controller:

The memory controller is a digital circuit which manages the flow of data going to and from the main memory.

    System request queue:

    The System Request Queue provides an interface for the CPU cores to the crossbar, and it

    is what keeps things operating smoothly. The System Request Queue manages and


    prioritizes both CPU cores' access to the crossbar switch, minimizing contention for the

    system bus. The result is a very efficient use of system resources.

3.1 CORE COMPONENTS

Pipeline:

One widely accepted technique for improving the performance of serial software tasks is pipelining. Simply put, pipelining is the process of dividing a serial task into concrete stages that can be executed in assembly-line fashion. To gain the most performance from pipelining, the individual stages must be carefully balanced so that no single stage takes much longer to complete than the others. A deeper pipeline buys frequency at the expense of an increased cache-miss penalty and fewer instructions per clock; a shallower pipeline gives better instructions per clock at the expense of frequency scaling. Maximum frequency per core requires deeper pipelines.
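The stage-balancing point can be made concrete with a toy timing model (the stage latencies below are invented for illustration): total time is the pipeline fill time plus one slowest-stage interval per remaining instruction, so a single slow stage gates throughput even when the total stage work is identical:

```python
def pipeline_time(stage_times, n_instructions):
    """Time to push n instructions through a pipeline (toy model).

    The first instruction takes the sum of all stages (fill time);
    after that, one instruction completes per slowest-stage interval.
    """
    fill = sum(stage_times)
    cycle = max(stage_times)
    return fill + (n_instructions - 1) * cycle

balanced   = [1.0, 1.0, 1.0, 1.0]   # total work 4.0, well balanced
unbalanced = [0.5, 0.5, 2.5, 0.5]   # same total work, one slow stage

print(pipeline_time(balanced, 1000))    # 1003.0
print(pipeline_time(unbalanced, 1000))  # 2501.5
```

Both pipelines do 4.0 units of work per instruction, yet the unbalanced one takes about 2.5 times longer for a long run, which is exactly why stage balance matters more than stage count.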

Cache:

With the rising gap between processor and memory speed, maximizing on-chip cache capacity is crucial to attaining good performance. Memory system designers employ hierarchies of caches to manage latency. Many of today's multicore processors assume private L1 caches and a shared L2 cache. At some point, however, a single shared L2 cache will require additional levels in the hierarchy. One option designers can consider is a physical hierarchy consisting of multiple clusters, where each cluster is a group of processor cores that share an L2 cache. The effectiveness of such a physical hierarchy, however, may depend on how well applications map onto it. Cache size buys performance at the expense of die size, and the miss penalties of deep pipelines are reduced by larger caches.
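The latency argument can be quantified with the standard average memory access time (AMAT) model; the hit rates and cycle counts below are illustrative assumptions only:

```python
def amat(l1_hit, l1_lat, l2_hit, l2_lat, mem_lat):
    """Average memory access time (cycles) for a two-level cache hierarchy."""
    return l1_lat + (1 - l1_hit) * (l2_lat + (1 - l2_hit) * mem_lat)

# Hypothetical latencies: L1 = 4 cycles, L2 = 12 cycles, DRAM = 200 cycles.
small_l2 = amat(0.95, 4, 0.80, 12, 200)  # smaller L2 -> more misses reach DRAM
large_l2 = amat(0.95, 4, 0.95, 12, 200)  # larger L2 absorbs most L1 misses

print(round(small_l2, 2), round(large_l2, 2))  # 6.6 5.1
```

Raising the L2 hit rate from 80 to 95 percent cuts the average access time by well over a cycle, which is the quantitative sense in which a larger cache softens a deep pipeline's miss penalty.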


4. SIMULTANEOUS MULTITHREADING WITH MULTICORE ARCHITECTURE

Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures.

In simultaneous multithreading, instructions from more than one thread can be executing in any given pipeline stage at a time. This is done without great changes to the basic processor architecture: the main additions needed are the ability to fetch instructions from multiple threads in a cycle, and a larger register file to hold data from multiple threads. The number of concurrent threads is decided by the chip designers, but practical restrictions on chip complexity have limited the number to two for most SMT implementations.

Without simultaneous multithreading, only a single thread can run at a time, regardless of which functional unit (integer or floating-point) it uses. With simultaneous multithreading, the two functional units can execute two different threads concurrently. More important, two programs can run simultaneously on one processor without having to be swapped in and out. To induce the operating system to recognize one processor as two possible execution pipelines, the new chips were made to appear as two logical processors to the operating system.
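This OS-visible doubling can be observed from ordinary user code: Python's `os.cpu_count()` reports logical processors, so on an SMT-enabled machine it typically returns twice the number of physical cores (a quick check, not a guarantee, since the result depends on the machine and OS):

```python
import os

# os.cpu_count() counts logical CPUs: SMT siblings appear as separate entries,
# so an SMT-enabled 4-core chip is usually reported as 8.
logical = os.cpu_count() or 1
print(f"The OS schedules threads across {logical} logical processor(s)")
```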

Fig 4.1 (early processor: one chip, one core, one executing thread)
Fig 4.2 (processor with simultaneous multithreading: one chip, one core, two executing threads)


The performance of simultaneous multithreading is limited by the availability of shared resources to the two executing threads. As a result, SMT cannot approach the processing throughput of two distinct processors, because of contention for these shared resources. To achieve greater performance gains on a single chip, a processor requires two or more separate cores, such that each thread has its own complete set of execution resources.

Fig 4.3 (multicore: one chip, many cores, several threads of execution)

In this design, each core has its own execution pipeline, and each core has the resources required to run without blocking resources needed by other software threads. While the example in Figure 4.3 shows a four-core design, there is no inherent limit to the number of cores that can be placed on a single chip. The multicore design enables two or more cores to run at somewhat slower speeds and at much lower temperatures, while the combined throughput of the cores delivers processing power greater than the maximum available today on single-core processors, at a much lower level of power consumption.


5. PERFORMANCE ANALYSIS

A multicore arrangement that provides two or more low-clock-speed cores can be designed to provide excellent performance while minimizing power consumption and delivering lower heat output than configurations that rely on a single high-clock-speed core.

    The following example shows how multicore technology could manifest in a

    standard server configuration and how multiple low-clock-speed cores could deliver

    greater performance than a single high-clock-speed core for networked applications. This

    example uses some simple math and basic assumptions about the scaling of multiple

    processors and is included for demonstration purposes only. Until multicore processors are

    available, scaling and performance can only be estimated based on technical models. The

example described in this report shows one possible method of addressing relative

    performance levels as the industry begins to move from platforms based on single-core

    processors to platforms based on multicore processors. Other methods are possible, and

    actual processor performance and processor scalability are tied to a variety of platform

    variables, including the specific configuration and application environment. Several

    factors can potentially affect the internal scalability of multiple cores, such as the system

    compiler as well as architectural considerations including memory, I/O, front side bus

    (FSB), chip set, and so on.

    For instance, enterprises can buy a dual-processor server today to provide e-mail,

    calendaring, and messaging functions. Dual-processor servers are designed to deliver

    excellent price/performance for messaging applications. In our example we use dual 3.6

GHz processors supporting simultaneous multithreading technology.

    The following simple example can help explain the relative performance of a low-

    clock-speed, dual-core processor versus a high-clock-speed, dual-processor counterpart.

Dual-processor systems available today offer a scalability of roughly 80 percent for the second processor, depending on the OS, application, compiler, and other factors. That means the first processor may deliver 100 percent of its processing power, but the second processor typically suffers some overhead from multiprocessing activities. As a result, the two processors do not scale linearly; that is, a dual-processor system does not achieve a 200 percent performance increase over a single-processor system, but instead provides approximately 180 percent of the performance of a single-processor system. In this report, the single-core scalability factor is referred to as external, or socket-to-socket, scalability. When comparing two single-core processors in two individual sockets, the dual 3.6 GHz processors would achieve an effective performance level of approximately 6.48 GHz (see Fig 5.1).


    Fig 5.1(Sample core speed and anticipated total relative power in a system using two

    single-core processors)

For multicore processors, administrators must take into account not only socket-to-socket scalability but also internal, or core-to-core, scalability: the scalability between multiple cores that reside within the same processor module. In this example, core-to-core scalability is estimated at 70 percent, meaning that the second core delivers 70 percent of its processing power.

Thus, in the example system using 2.8 GHz dual-core processors, each dual-core processor would behave more like a 4.76 GHz processor when the performance of the two cores (2.8 GHz plus 1.96 GHz) is combined. For demonstration purposes, this example assumes that, in a server combining two such dual-core processors within the same system architecture, the socket-to-socket scalability of the two dual-core processors would be similar to that of a server containing two single-core processors: 80 percent scalability. This would lead to an effective performance level of approximately 8.57 GHz (see Fig 5.2).


    Fig 5.2 (Sample core speed and anticipated total relative power in a system using two

    dual-core processors)


6. MAPPING OF AN APPLICATION TO A MULTICORE PROCESSOR

Task parallelism is the concurrent execution of independent tasks in software. On a single-core processor, separate tasks must share the same processor; on a multicore processor, tasks can run essentially independently of one another, resulting in more efficient execution.

Mapping begins with identifying a parallel task implementation. Identifying the task parallelism in an application is a challenge that, for now, must be tackled manually. After the parallel tasks have been identified, mapping and scheduling them across a multicore system requires careful planning. A four-step process, derived from Software Decomposition for Multicore Architectures [1], is proposed to guide the design of the application:

1. Partitioning: Partitioning of a design is intended to expose opportunities for parallel execution. The focus is on defining a large number of small tasks in order to yield a fine-grained decomposition of the problem.

2. Communication: The tasks generated by a partition are intended to execute concurrently but cannot, in general, execute independently. The computation to be performed in one task will typically require data associated with another task. Data must then be transferred between tasks to allow computation to proceed. This information flow is specified in the communication phase of the design.

3. Combining: Decisions made in the partitioning and communication phases are reviewed to identify a grouping that will execute efficiently on the multicore architecture.

4. Mapping: This stage consists of determining where each task is to execute.


    7. APPLICATIONS

- Database servers
- Web servers (Web commerce)
- Compilers
- Video editing and encoding
- 3D gaming and powerful graphics solutions
- Multimedia applications
- Scientific applications, CAD/CAM
- In general, applications with thread-level parallelism

The full effect and advantage of a multicore processor are obtained when it is used together with a multithreading operating system.

    Fig 7.1 (two different processes running concurrently in two different cores)


    8. ADVANTAGES AND DISADVANTAGES

    8.1 ADVANTAGES

(i) Cache coherency: The proximity of multiple CPU cores on the same die allows the cache-coherency circuitry to operate at a much higher clock rate than is possible if the signals have to travel off-chip. Combining equivalent CPUs on a single die significantly improves the performance of cache-snoop (bus-snooping) operations. Put simply, signals between different CPUs travel shorter distances and therefore degrade less. These higher-quality signals allow more data to be sent in a given time period, since individual signals can be shorter and do not need to be repeated as often.

(ii) Improved response time: The largest boost in performance will likely be noticed as improved response time while running CPU-intensive processes, such as antivirus scans, ripping/burning media (requiring file conversion), or file searching. For example, if an automatic virus scan runs while a movie is being watched, the application running the movie is far less likely to be starved of processor power, as the antivirus program will be assigned to a different processor core than the one running the movie playback.

(iii) Reduced printed circuit board size and power: Assuming that the die can physically fit into the package, multicore CPU designs require much less printed circuit board (PCB) space than multi-chip SMP designs. Also, a dual-core processor uses slightly less power than two coupled single-core processors, principally because of the decreased power required to drive signals external to the chip. Furthermore, the cores share some circuitry, like the L2 cache and the interface to the front-side bus (FSB). In terms of competing technologies for the available silicon die area, a multicore design can reuse proven CPU core library designs and produce a product with a lower risk of design error than devising a new, wider core design. Also, adding more cache suffers from diminishing returns.

(iv) Multitasking productivity: Users of multicore processor PCs will experience exceptional performance while executing multiple tasks simultaneously. The ability to run complex, multitasked workloads, such as creating professional digital content while checking and writing e-mails in the foreground, and also running firewall software or downloading audio files from the Web in the background, will allow consumers and workers to do more work in less time.

(v) Enhanced security: PC security can be enhanced because multicore processors can run more sophisticated virus, spam, and hacker protection in the background without performance penalties.

(vi) Cool and quiet: The enhanced performance offered by multicore processors comes without the additional heat and fan noise that would likely accompany equivalent performance increases in single-core processor machines.


    8.2. DISADVANTAGES

    Maximizing the utilization of the computing resources provided by multi-core

    processors requires adjustments both to the operating system (OS) support and to existing

    application software.

    Also, the ability of multi-core processors to increase application performance

    depends on the use of multiple threads within applications.

Integration of a multicore chip drives chip production yields down, and such chips are more difficult to manage thermally than lower-density single-chip designs.

    From an architectural point of view, ultimately, single CPU designs may make

    better use of the silicon surface area than multiprocessing cores, so a development

    commitment to this architecture may carry the risk of obsolescence.

Using a multicore processor to its full potential is another issue. If programmers don't write applications that take advantage of multiple cores, there is no gain, and in some cases there is a loss of performance. Applications need to be written so that different parts can run concurrently. Multicore systems also do not deliver n times the performance of a single-core processor, because of socket-to-socket and core-to-core scalability losses.


    9. CONCLUSION

In the coming years the trend will move more and more toward multicore processors. The main reason is that they are faster than single-core processors and can still be improved, though they introduce interesting new problems. In the future there will still be some applications for single-core processors, because not every system needs a fast processor. Several new multicore chips are in their design phases, and parallel programming techniques are likely to gain importance.


    REFERENCES

[1] L. Hammond, B. A. Nayfeh, K. Olukotun, "A Single-Chip Multiprocessor," IEEE, Sept. 1997
[2] P. Frost Gorder, "Multicore Processors for Science and Engineering," IEEE CS, March/April 2007
[3] D. Geer, "Chip Makers Turn to Multicore Processors," Computer, IEEE Computer Society, May 2005
[4] R. Merritt, "CPU Designers Debate Multi-core Future," EETimes Online, February 2008, http://www.eetimes.com/showArticle.jhtml?articleID=206105179
[5] R. Merritt, "X86 Cuts to the Cores," EETimes Online, September 2007, http://www.eetimes.com/showArticle.jtml?articleID=202100022

