
The Micro Architecture of Intel Pentium 4

Date post: 07-Apr-2018
Upload: rekha-govindaraj
  • 8/6/2019 The Micro Architecture of Intel Pentium 4


    The Microarchitecture of Intel Pentium 4

    Sudipta Mahapatra


    Introduction

    The Intel Pentium 4 was introduced in November 2000, targeting a high clock rate of 1.5 GHz.

    The NetBurst microarchitecture formed the basis for a new family of Intel processors, starting with the Pentium 4.

    It was developed with the intention of delivering a high level of performance for important applications such as multimedia.


    Targeted application areas

    Internet audio and streaming video.

    Image processing

    Video content creation

    Speech recognition

    3D applications and games.

    Video editing and video conferencing.


    Overview of the NetBurst Microarchitecture

    Uses a deeply pipelined architecture to enable a high clock rate.

    Uses a high-performance, quad-pumped bus interface to the 100 MHz system bus, transferring data four times per bus clock for an effective rate of 400 MHz.

    Uses a high-speed execution engine to reduce the latency of basic integer instructions.
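
The quad-pumped bus figure is simple arithmetic, sketched here in Python. The 64-bit (8-byte) data-bus width used for the bandwidth estimate is an assumption that does not come from the slides, and the function names are illustrative only.

```python
def effective_transfer_rate_mhz(bus_clock_mhz, transfers_per_clock=4):
    """Effective transfer rate (million transfers per second) of a multi-pumped bus."""
    return bus_clock_mhz * transfers_per_clock


def peak_bandwidth_mb_per_s(transfer_rate_mhz, bus_width_bytes=8):
    """Peak bandwidth in MB/s. The 8-byte default bus width is an assumption."""
    return transfer_rate_mhz * bus_width_bytes


rate = effective_transfer_rate_mhz(100)     # 100 MHz bus clock, quad-pumped
bandwidth = peak_bandwidth_mb_per_s(rate)   # assumes a 64-bit data bus
print(rate, bandwidth)
```

Under these assumptions, the bus clock itself stays at 100 MHz; only the number of data transfers per clock changes.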


    Overview (Contd.)

    Out-of-order speculative execution to enable parallelism.

    Superscalar issue to exploit maximal parallelism.


    Main Features

    Hardware register renaming to avoid register name space limitations (WAW and WAR hazards).

    Cache line sizes of 64 bytes.

    Optimization for the common case of frequently executed instructions.

    Improved branch handling techniques.
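
The register renaming feature listed above can be illustrated with a minimal Python sketch. It assumes an unlimited supply of physical registers (a real design recycles them through a free list), and the instruction format, register names, and the `rename` helper are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, List[str]]  # (destination register, source registers)


def rename(program: List[Instr], arch_regs: List[str]) -> List[Instr]:
    """Map architectural registers to fresh physical registers.

    Every write gets a new physical register, so repeated writes to the same
    architectural register no longer conflict (no WAW/WAR hazards), while
    true read-after-write dependences are preserved through the map.
    """
    # Initial mapping: one physical register per architectural register.
    mapping: Dict[str, str] = {r: f"p{i}" for i, r in enumerate(arch_regs)}
    next_free = len(arch_regs)
    renamed: List[Instr] = []
    for dest, srcs in program:
        # Sources read the most recent mapping (keeps RAW dependences).
        new_srcs = [mapping[s] for s in srcs]
        # The destination is given a fresh physical register.
        mapping[dest] = f"p{next_free}"
        next_free += 1
        renamed.append((mapping[dest], new_srcs))
    return renamed


program = [
    ("eax", ["ebx", "ecx"]),  # writes eax
    ("edx", ["eax"]),         # reads eax (true dependence on the first write)
    ("eax", ["ebx"]),         # writes eax again (WAW with 1st, WAR with 2nd)
]
renamed = rename(program, ["eax", "ebx", "ecx", "edx"])
for dest, srcs in renamed:
    print(dest, srcs)
```

After renaming, the first and third instructions write different physical registers, so the third can execute without waiting for the second to read the old value.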


    Basic Block Diagram

    [Figure: basic block diagram of the Pentium 4, including the branch-history update path. Source: Glenn Hinton et al., Intel Technology Journal, Q1 2001]


    Main sections

    1. In-order front end (FE)

    2. Out-of-order execution logic (OOE)

    3. Integer and floating-point execution units (EX)

    4. Memory subsystem (M)


    In-order front end

    Fetches the instructions to be executed next.

    Supplies a set of decoded instructions to the execution pipeline.

    Uses accurate branch prediction logic to predict the branch target.

    The instructions from the branch target are decoded into a set of micro-operations (uops) that can be executed in the execution core.

    Uses the trace cache to store the uops corresponding to the most recently executed instructions.


    Front end

    [Figure: block diagram of the front end, with input from the L2 cache and output to the allocator/register renamer. Source: Glenn Hinton et al., Intel Technology Journal, Q1 2001]


    Front end components

    Trace cache (TC): Serves as the L1 instruction cache. However, it holds the uops corresponding to the most recently decoded instructions rather than raw instruction bytes.

    Delivers up to three uops per clock cycle to the OOE. Capacity: 12K uops.

    The L2 cache is accessed only on a TC miss.

    The trace cache has its own branch predictor that indicates where to go next in the trace cache.

    This predictor is smaller than the front-end BTB, as it is concerned only with the subset of instructions that are currently in the trace cache.

    Also includes a 16-entry return address stack.
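
A return address stack of the kind mentioned above can be sketched in Python as a small circular buffer. The slides state only the size (16 entries); the overflow policy used here (overwrite the oldest entry) and all class and method names are assumptions made for illustration.

```python
class ReturnAddressStack:
    """Fixed-size stack that predicts the target of return instructions.

    Overflow policy (overwrite the oldest entry) is an assumption; the
    source only gives the number of entries.
    """

    def __init__(self, entries=16):
        self.entries = entries
        self.slots = [None] * entries
        self.top = 0      # index of the next slot to write
        self.count = 0    # number of valid entries

    def on_call(self, return_address):
        # Push the address of the instruction that follows the call.
        self.slots[self.top] = return_address
        self.top = (self.top + 1) % self.entries
        self.count = min(self.count + 1, self.entries)

    def on_return(self):
        # Pop the predicted return target; None means "no prediction".
        if self.count == 0:
            return None
        self.top = (self.top - 1) % self.entries
        self.count -= 1
        return self.slots[self.top]


ras = ReturnAddressStack()
ras.on_call(0x1004)       # outer call
ras.on_call(0x2008)       # nested call
inner = ras.on_return()   # predicted target of the inner return
outer = ras.on_return()   # predicted target of the outer return
empty = ras.on_return()   # stack is empty, so no prediction is available
```

Because calls and returns nest, the most recent call's return address is the best prediction for the next return, which is why a stack works well here.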


    Front end components (Contd.)

    Instruction decoder: Receives IA-32 instruction bytes from the L2 cache, 64 bits at a time, and decodes them into uops.

    Can decode at a maximum rate of one IA-32 instruction per clock cycle.

    Most instructions are converted into a single uop. If an instruction needs more than four uops, control is transferred to the microcode ROM.
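
The interplay between the trace cache, the decoder, and the microcode ROM can be sketched roughly as follows. This is a heavily simplified model: the decode table is hypothetical, capacity limits and trace building are ignored, and microcoded sequences are simply cached like any other uops, which is an assumption of this sketch rather than a description of the hardware.

```python
from typing import Dict, List

MICROCODE_THRESHOLD = 4  # instructions needing more uops go to the microcode ROM

# Hypothetical decode table: instruction text -> uops (illustrative only).
DECODE_TABLE: Dict[str, List[str]] = {
    "add eax, ebx": ["add"],
    "add [mem], eax": ["load", "add", "store_addr", "store_data"],
    "rep movsb": ["ucode"] * 8,  # stands in for a long microcoded sequence
}


class FrontEnd:
    def __init__(self) -> None:
        self.trace_cache: Dict[int, List[str]] = {}  # address -> decoded uops
        self.log: List[str] = []

    def fetch(self, address: int, instruction: str) -> List[str]:
        if address in self.trace_cache:
            self.log.append("tc_hit")         # uops delivered without decoding
            return self.trace_cache[address]
        self.log.append("tc_miss")            # fetch from L2 and decode
        uops = DECODE_TABLE[instruction]
        if len(uops) > MICROCODE_THRESHOLD:
            self.log.append("microcode_rom")  # sequenced by the microcode ROM
        self.trace_cache[address] = uops
        return uops


fe = FrontEnd()
fe.fetch(0x100, "add eax, ebx")    # first visit: trace cache miss, decode
fe.fetch(0x100, "add eax, ebx")    # second visit: trace cache hit
fe.fetch(0x200, "add [mem], eax")  # four uops: still handled by the decoder
fe.fetch(0x300, "rep movsb")       # more than four uops: microcode ROM
print(fe.log)
```

The point of the sketch is that decoding happens only on a trace cache miss, so the decoder's limit of one instruction per clock matters less once a loop's uops are cached.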


    Out-of-order Execution logic

    Prepares the instructions for out-of-order execution.

    Uses aggressive reordering to execute instructions as soon as their operands and execution resources are ready.

    Maximizes utilization of the execution resources.

    Has retirement logic that reorders the instructions so that they commit in program order.
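
The combination of out-of-order completion and in-order commit described above is commonly implemented with a reorder buffer. The Python sketch below shows the idea under simplifying assumptions: uops are plain integer ids, there is no limit on how many uops retire at once, and the class and method names are illustrative rather than taken from the source.

```python
from collections import deque
from typing import Deque, Dict, List


class ReorderBuffer:
    """Tracks uops in program order and retires only from the head."""

    def __init__(self) -> None:
        self.order: Deque[int] = deque()  # uop ids in program order
        self.done: Dict[int, bool] = {}   # uop id -> finished executing?

    def allocate(self, uop_id: int) -> None:
        self.order.append(uop_id)
        self.done[uop_id] = False

    def complete(self, uop_id: int) -> None:
        # Execution may finish in any order.
        self.done[uop_id] = True

    def retire(self) -> List[int]:
        # Commit the oldest uops, stopping at the first unfinished one.
        retired: List[int] = []
        while self.order and self.done[self.order[0]]:
            uop_id = self.order.popleft()
            del self.done[uop_id]
            retired.append(uop_id)
        return retired


rob = ReorderBuffer()
for uop in (0, 1, 2):
    rob.allocate(uop)
rob.complete(2)        # the youngest uop finishes first
first = rob.retire()   # nothing retires: uop 0 is still pending
rob.complete(0)
second = rob.retire()  # uop 0 retires; uop 1 still blocks uop 2
rob.complete(1)
third = rob.retire()   # uops 1 and 2 retire together, in program order
```

Retiring only from the head is what keeps the architectural state consistent even though execution order differs from program order.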


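    Out-of-order Execution logic

    [Figure: block diagram of the out-of-order execution logic, with input from the uop queue and output to the execution units. Source: Glenn Hinton et al., Intel Technology Journal, Q1 2001]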


    Execution Units

    The execution units include several integer and floating-point units for result computation.

    The execution section also includes the L1 data cache, used for most of the load/store operations.


    Execution Units

    [Figure: block diagram of the execution units, with input from the out-of-order execution logic and a connection from/to the memory subsystem. Source: Glenn Hinton et al., Intel Technology Journal, Q1 2001]


    Memory Subsystem

    The memory section contains the L2 cache and the system bus.

    The system bus is used to access main memory when the L2 cache has a miss.

    It is also used to access the I/O resources.


    Memory Subsystem

    [Figure: block diagram of the memory subsystem, with input from the execution units and output to the ITLB/prefetcher. Source: Glenn Hinton et al., Intel Technology Journal, Q1 2001]


    Pentium 4 pipeline

    The P6 microarchitecture (Pentium II, Pentium III, Celeron) has twice the pipeline depth of the Pentium processor.

    The NetBurst microarchitecture has almost doubled the pipeline depth of P6.

    - This allows for a higher frequency of operation.

    - Different parts of the Pentium 4 operate at different clock frequencies.
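
Why a deeper pipeline permits a higher clock can be shown with a simple textbook-style model, sketched here in Python: the total logic delay is split evenly across the stages, and each stage pays a fixed latch overhead. The delay values and the stage counts (5, 10, 20, chosen to follow the doubling pattern described above) are illustrative assumptions, not measured Pentium 4 data.

```python
def clock_frequency_ghz(stages, logic_delay_ns=10.0, latch_overhead_ns=0.1):
    """Clock frequency when logic delay is divided evenly among the stages.

    All delay values are illustrative assumptions, not Pentium 4 data.
    """
    cycle_time_ns = logic_delay_ns / stages + latch_overhead_ns
    return 1.0 / cycle_time_ns


for stages in (5, 10, 20):
    print(stages, round(clock_frequency_ghz(stages), 2))
```

In this model, doubling the depth gives somewhat less than double the frequency, because the per-stage latch overhead does not shrink with the logic.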

