
Chapter 5 Overview

Date post: 24-Jan-2016
Upload: devaki
Page 1: Chapter 5 Overview

Computer Systems Design and Architecture Second Edition © 2004 Prentice Hall

Chapter 5 Overview

The principles of pipelining
A pipelined design of SRC
Pipeline hazards
Instruction-level parallelism (ILP)
- Superscalar processors
- Very Long Instruction Word (VLIW) machines
Microprogramming
- Control store and micro-branching
- Horizontal and vertical microprogramming

Page 2: Chapter 5 Overview

Chapter 5 Overview

The principles of pipelining
A pipelined design of SRC
Pipeline hazards
Instruction-level parallelism (ILP)
- Superscalar processors
- Very Long Instruction Word (VLIW) machines
Microprogramming
- Control store and micro-branching
- Horizontal and vertical microprogramming

Page 3: Chapter 5 Overview

Fig 5.1 Executing Machine Instructions vs. Manufacturing Small Parts

Page 4: Chapter 5 Overview

The Pipeline Stages

5 pipeline stages are shown:
1. Fetch instruction
2. Fetch operands
3. ALU operation
4. Memory access
5. Register write

5 instructions are executing:
shr r3, r3, 2 ; storing result in r3
sub r2, r5, r1 ; idle, no mem. access needed
add r4, r3, r2 ; adding in ALU
st r4, addr1 ; accessing r4 and addr1
ld r2, addr2 ; instruction being fetched
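The snapshot above can be reproduced with a small calculation. This is only a sketch, assuming an ideal pipeline with one clock per stage and no stalls; the names `STAGES` and `occupancy` are hypothetical and do not come from the text.

```python
# Stage names follow the five stages listed on the slide.
STAGES = ["fetch instruction", "fetch operands", "ALU operation",
          "memory access", "register write"]


def occupancy(program, cycle):
    """Map stage number (1-based) to the instruction in it at a 1-based cycle."""
    result = {}
    for index, instruction in enumerate(program):
        # Instruction number `index` enters stage 1 in cycle index + 1.
        stage = cycle - index
        if 1 <= stage <= len(STAGES):
            result[stage] = instruction
    return result


program = ["shr r3, r3, 2", "sub r2, r5, r1", "add r4, r3, r2",
           "st r4, addr1", "ld r2, addr2"]

for stage, instruction in sorted(occupancy(program, 5).items()):
    print(stage, STAGES[stage - 1], "<-", instruction)
```

In cycle 5 the first instruction fetched (shr) is in the last stage and the last one fetched (ld) is in the first stage, which is why the pipeline order is the reverse of the listing order.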

Page 5: Chapter 5 Overview

The Pipeline Stages

5 pipeline stages:
1. Fetch instruction
2. Fetch operands
3. ALU operation
4. Memory access
5. Register write

5 instructions are being processed:
shr r3, r3, 2 ; result is stored in r3
sub r2, r5, r1 ; idle, no memory access needed
add r4, r3, r2 ; adding in the ALU
st r4, addr1 ; accessing r4 and addr1
ld r2, addr2 ; instruction being fetched

Page 6: Chapter 5 Overview

Notes on Pipelining Instruction Processing

Pipeline stages are shown top to bottom in order traversed by one instruction

Instructions listed in order they are fetched

Order of insts. in pipeline is reverse of listed

If each stage takes one clock:
- every instruction takes 5 clocks to complete
- some instruction completes every clock tick

Two performance issues: instruction latency and instruction bandwidth
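The difference between latency and bandwidth can be checked with a little arithmetic. This is a minimal sketch, assuming an ideal pipeline with one clock per stage and no stalls; the function names are hypothetical.

```python
def pipelined_clocks(num_instructions, num_stages=5):
    """Clocks needed to finish all instructions in an ideal pipeline."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages clocks (its latency);
    # each later instruction completes one clock after the previous one.
    return num_stages + (num_instructions - 1)


def unpipelined_clocks(num_instructions, num_stages=5):
    """Clocks needed if each instruction finishes before the next starts."""
    return num_instructions * num_stages


print("latency of one instruction:", pipelined_clocks(1))
print("100 instructions, pipelined:", pipelined_clocks(100))
print("100 instructions, unpipelined:", unpipelined_clocks(100))
```

Latency stays at 5 clocks per instruction, while the completion rate approaches one instruction per clock as the instruction count grows.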

Page 7: Chapter 5 Overview

Pipelined Instruction Processing

Pipeline stages are shown top to bottom in order traversed by one instruction

Instructions are listed in the order they are fetched

The order of instructions in the pipeline is the reverse of the listing

If each stage takes one clock:
- every instruction completes in 5 clocks
- an instruction completes on every clock

Two performance issues: instruction latency and instruction bandwidth

Page 8: Chapter 5 Overview

Dependence Among Instructions

Execution of some instructions can depend on the completion of others in the pipeline

One solution is to “stall” the pipeline: early stages stop while later ones complete processing

Dependences involving registers can be detected and data “forwarded” to the instruction needing it, without waiting for register write

Dependence involving memory is harder and is sometimes addressed by restricting the way the instruction set is used

“Branch delay slot” is an example of such a restriction

“Load delay” is another example

Page 9: Chapter 5 Overview

Dependence Among Instructions

Execution of some instructions in the pipeline depends on the completion of others.

One solution is to “stall”: early stages wait while later ones finish their processing.

Dependences involving registers can be detected, and the data forwarded to the instruction that needs it, without waiting for the register write.

Dependences involving memory are harder and can lead to restrictions on the way the instruction set is used.

“Branch delay slot” is an example of such a restriction. “Load delay” is another example.

Page 10: Chapter 5 Overview

Branch and Load Delay Examples

Branch Delay

brz r2, r3
add r6, r7, r8 ; This inst. always executed
st r6, addr1 ; Only done if r3 ≠ 0

Load Delay

ld r2, addr
add r5, r1, r2 ; This inst. gets “old” value of r2
shr r1, r1, 4
sub r6, r8, r2 ; This inst. gets r2 value loaded from addr

Working of instructions not changed, but way they work together is

Page 11: Chapter 5 Overview

Branch and Load Delay Examples

Branch Delay

brz r2, r3
add r6, r7, r8 ; This instruction is always executed
st r6, addr1 ; Only done if r3 is not 0 (r3 ≠ 0)

Load Delay

ld r2, addr
add r5, r1, r2 ; This instruction gets the old value of r2
shr r1, r1, 4
sub r6, r8, r2 ; This instruction gets the r2 value loaded from addr

The working of the instructions does not change, but the way they work together can.

Page 12: Chapter 5 Overview

Characteristics of Pipelined Processor Design

Main memory must operate in one cycle
- This can be accomplished by expensive memory, but
- It is usually done with cache, to be discussed in Chap. 7

Instruction and data memory must appear separate
- Harvard architecture has separate instruction & data memories
- Again, this is usually done with separate caches

Few buses are used
- Most connections are point to point
- Some few-way multiplexers are used

Data is latched (stored in temporary registers) at each pipeline stage; these registers are called “pipeline registers.”

ALU operations take only 1 clock (esp. shift)

Page 13: Chapter 5 Overview

Characteristics of Pipelined Processor Design

Main memory must operate in one cycle
- This would require expensive memory, but
- It is usually done with cache, to be discussed in Chap. 7

Instruction and data memory must appear separate
- Harvard architecture has separate instruction and data memories
- This is usually done with separate caches

Few buses are used
- Most connections are point to point
- Some few-way multiplexers are used

Data is latched (stored in temporary registers) at each pipeline stage; these registers are called “pipeline registers.”

ALU operations take only 1 clock (esp. shift)

Page 14: Chapter 5 Overview

Adapting Instructions to Pipelined Execution

All instructions must fit into a common pipeline stage structure

We use a 5 stage pipeline for the SRC

1) Instruction fetch

2) Decode and operand access

3) ALU operations

4) Data memory access

5) Register write

We must fit load/store, ALU, and branch instructions into this pattern

Page 15: Chapter 5 Overview

Adapting Instructions to Pipelined Execution

All instructions must fit into a common pipeline stage structure.

We will use a 5 stage pipeline for the SRC

1) Instruction fetch

2) Decode and operand access

3) ALU operations

4) Data memory access

5) Register write

We will fit load/store, ALU, and branch instructions into this pattern.

Page 16: Chapter 5 Overview

Fig 5.2 ALU Instructions fit into 5 Stages

• Second ALU operand comes either from a register or instruction register c2 field

• Op code must be available in stage 3 to tell ALU what to do

• Result register, ra, is written in stage 5

• No memory operation

Page 17: Chapter 5 Overview

Fig 5.2 ALU Instructions in 5 Stages

• The second ALU operand can come from a register or from c2.

• Op code must be available in stage 3 to tell the ALU what to do.

• The result register, ra, is written in stage 5.

• No memory operation.

Page 18: Chapter 5 Overview

Fig 5.4 Load and Store Instructions

ALU computes effective addresses

Stage 4 does read or write

Result reg. written only on load

Page 19: Chapter 5 Overview

Fig 5.4 Load and Store Instructions

ALU computes effective addresses

Stage 4 does read or write

Result reg. is written only on load.

Page 20: Chapter 5 Overview

Fig 5.6 SRC Pipeline Registers and RTN Specification

The pipeline registers pass info. from stage to stage

RTN specifies output reg. values in terms of input reg. values for stage

Discuss RTN at each stage on blackboard

Page 21: Chapter 5 Overview

Fig 5.6 SRC Pipeline Registers and RTN Specification

The pipeline registers pass information from stage to stage.

RTN specifies the output reg. values of a stage in terms of its input reg. values.

Discuss RTN at each stage on blackboard

Page 22: Chapter 5 Overview

Global State of the Pipelined SRC

PC, the general registers, instruction memory, and data memory make up the global machine state

PC is accessed in stage 1 (& stage 2 on branch)

Instruction memory is accessed in stage 1

General registers are read in stage 2 and written in stage 5

Data memory is only accessed in stage 4

Page 23: Chapter 5 Overview

Global State of the Pipelined SRC

PC, the general registers, instruction memory, and data memory make up the global machine state.

PC is accessed in stage 1 (& stage 2 on branch)

Instruction memory is accessed in stage 1

General registers are read in stage 2 and written in stage 5

Data memory is only accessed in stage 4

Page 24: Chapter 5 Overview

Restrictions on Access to Global State by Pipeline

We see why separate instruction and data memories (or caches) are needed
- When a load or store accesses data memory in stage 4, stage 1 is accessing an instruction
- Thus two memory accesses occur simultaneously

Two operands may be needed from registers in stage 2 while another instruction is writing a result register in stage 5
- Thus as far as the registers are concerned, 2 reads and a write happen simultaneously

Increment of PC in stage 1 must be overridden by a successful branch in stage 2

Page 25: Chapter 5 Overview

Restrictions on Access to Global State by Pipeline

We see why separate instruction and data memories are used
- While a load or store accesses data memory in stage 4, stage 1 is accessing an instruction
- Thus two memory accesses occur simultaneously

Two operands may be needed from registers in stage 2 while another instruction writes a result register in stage 5
- Thus 2 reads and 1 write happen simultaneously

A successful branch in stage 2 must override the increment of PC in stage 1

Page 26: Chapter 5 Overview

Fig 5.7 Pipeline Data Path & Control Signals

Most control signals shown and given values

Multiplexer control is stressed in this figure

Page 27: Chapter 5 Overview

Fig 5.7 Pipeline Data Path & Control Signals

Most control signals are shown with their values

Multiplexer control is stressed in this figure.

Page 28: Chapter 5 Overview

Example of Propagation of Instructions Through Pipe

It is assumed that R[11] contains 512 when the brl instruction is executed

R[6] = 4 and R[8] = 5 are the add operands

R[5] = 16 for the ld and R[12] = 23 for the str

100: add r4, r6, r8 ; R[4] ← R[6] + R[8]
104: ld r7, 128(r5) ; R[7] ← M[R[5] + 128]
108: brl r9, r11, 001 ; PC ← R[11]: R[9] ← PC
112: str r12, 32 ; M[PC + 32] ← R[12]
. . .
512: sub ... ; next instruction
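The values this example produces can be worked out by hand. This is a sketch under one stated assumption: PC has already been incremented by 4 past the current instruction when it is used as the link value or in relative addressing. The variable names are illustrative only.

```python
# Register contents given on the slide.
R = {5: 16, 6: 4, 8: 5, 11: 512, 12: 23}
WORD = 4  # assumption: instructions are 4 bytes apart, as in the listing


def incremented_pc(address):
    """PC value after fetching the instruction at `address` (assumed +4)."""
    return address + WORD


add_result = R[6] + R[8]                # value written to R[4]
ld_address = R[5] + 128                 # effective address of the ld
brl_target = R[11]                      # new PC after the branch
brl_link = incremented_pc(108)          # value saved in R[9] (assumption)
str_address = incremented_pc(112) + 32  # M[PC + 32] (assumption)

print(add_result, ld_address, brl_target, brl_link, str_address)
```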

Page 29: Chapter 5 Overview

Example of Propagation of Instructions Through Pipe

It is assumed that R[11] contains 512 when the brl instruction is executed

R[6] = 4 and R[8] = 5 are the add operands

R[5] = 16 for the ld and R[12] = 23 for the str

100: add r4, r6, r8 ; R[4] ← R[6] + R[8]
104: ld r7, 128(r5) ; R[7] ← M[R[5] + 128]
108: brl r9, r11, 001 ; PC ← R[11]: R[9] ← PC
112: str r12, 32 ; M[PC + 32] ← R[12]
. . .
512: sub ... ; next instruction

Page 30: Chapter 5 Overview

Fig 5.8 Cycle 1 add Enters Pipe

Program counter is incremented to 104

512: sub ...
. . .
112: str r12, #32
108: brl r9, r11, 001
104: ld r7, r5, #128
100: add r4, r6, r8

Page 31: Chapter 5 Overview

Fig 5.8 Cycle 1 add Enters Pipe

PC is incremented to 104.

512: sub ...
. . .
112: str r12, #32
108: brl r9, r11, 001
104: ld r7, r5, #128
100: add r4, r6, r8

Page 32: Chapter 5 Overview

Fig 5.9 Cycle 2 ld Enters Pipe

add operands are fetched in stage 2

512: sub ...
. . .
112: str r12, #32
108: brl r9, r11, 001
104: ld r7, r5, #128
100: add r4, r6, r8

Page 33: Chapter 5 Overview

Fig 5.9 Cycle 2 ld Enters Pipe

add operands are fetched in stage 2.

512: sub ...
. . .
112: str r12, #32
108: brl r9, r11, 001
104: ld r7, r5, #128
100: add r4, r6, r8

Page 34: Chapter 5 Overview

Fig 5.10 Cycle 3 brl Enters Pipe

add performs its arithmetic in stage 3

512: sub ...
. . .
112: str r12, #32
108: brl r9, r11, 001
104: ld r7, r5, #128
100: add r4, r6, r8

Page 35: Chapter 5 Overview

Fig 5.10 Cycle 3 brl Enters Pipe

add performs its arithmetic in stage 3

512: sub ...
. . .
112: str r12, #32
108: brl r9, r11, 001
104: ld r7, r5, #128
100: add r4, r6, r8

Page 36: Chapter 5 Overview

Fig 5.11 Cycle 4 str Enters Pipe

add is idle in stage 4

Success of brl changes program counter to 512

512: sub ...
. . .
112: str r12, #32
108: brl r9, r11, 001
104: ld r7, r5, #128
100: add r4, r6, r8

Page 37: Chapter 5 Overview

Fig 5.11 Cycle 4 str Enters Pipe

add is idle in stage 4

brl changes the PC to 512.

512: sub ...
. . .
112: str r12, #32
108: brl r9, r11, 001
104: ld r7, r5, #128
100: add r4, r6, r8

Page 38: Chapter 5 Overview

add completes in stage 5

sub is fetched from loc. 512 after successful brl

512: sub ...
. . .
112: str r12, #32
108: brl r9, r11, 001
104: ld r7, r5, #128
100: add r4, r6, r8

Fig 5.12 Cycle 5 sub Enters Pipe

Page 39: Chapter 5 Overview

add completes in stage 5

sub is fetched from location 512 after the brl.

512: sub ...
. . .
112: str r12, #32
108: brl r9, r11, 001
104: ld r7, r5, #128
100: add r4, r6, r8

Fig 5.12 Cycle 5 sub Enters Pipe

Page 40: Chapter 5 Overview

Functions of the SRC Pipeline Stages

Stage 1: fetches instruction
- PC incremented or replaced by successful branch in stage 2

Stage 2: decodes inst. and gets operands
- Load or store gets operands for address computation
- Store gets register value to be stored as 3rd operand
- ALU operation gets 2 registers or register and constant

Stage 3: performs ALU operation
- Calculates effective address or does arithmetic/logic
- May pass through link PC or value to be stored in mem.

Page 41: Chapter 5 Overview

Functions of the SRC Pipeline Stages

Stage 1: fetches instruction
- PC is incremented, or replaced by a successful branch in stage 2

Stage 2: decodes the instruction and gets operands
- Load or store gets operands for address computation
- Store gets the register value to be stored as 3rd operand
- ALU operation gets 2 registers, or 1 register and 1 constant

Stage 3: performs ALU operation
- Calculates effective address or does arithmetic/logic
- May pass through the link PC or the value to be stored in memory

Page 42: Chapter 5 Overview

Functions of the SRC Pipeline Stages (continued)

Stage 4: accesses data memory
- Passes Z4 to Z5 unchanged for non-memory instructions
- Load fills Z5 from memory
- Store uses address from Z4 and data from MD4 (no longer needed)

Stage 5: writes result register
- Z5 contains value to be written, which can be ALU result, effective address, PC link value, or fetched data
- ra field always specifies result register in SRC

Page 43: Chapter 5 Overview

Functions of the SRC Pipeline Stages (continued)

Stage 4: accesses data memory
- Passes Z4 to Z5 unchanged for non-memory instructions
- Store uses the address from Z4 and the data from MD4

Stage 5: writes result register
- Z5 holds the value to be written, which can be ALU result, effective address, PC link value, or fetched data
- In SRC, the ra field always specifies the result register

Page 44: Chapter 5 Overview

Dependence Between Instructions in Pipe: Hazards

Instructions that occupy the pipeline together are being executed in parallel

This leads to the problem of instruction dependence, well known in parallel processing

The basic problem is that an instruction depends on the result of a previously issued instruction that is not yet complete

Two categories of hazards

Data hazards: incorrect use of old and new data

Branch hazards: fetch of wrong instruction on a change in PC

Page 45: Chapter 5 Overview

Dependence Between Instructions in Pipe: Hazards

Instructions in the pipeline are executed in parallel.

As in parallel processing, this leads to the problem of instructions depending on one another.

The basic problem is that an instruction depends on the result of another instruction that has not yet completed.

Hazards fall into two categories

Data hazards: incorrect use of old and new data

Branch hazards: fetch of the wrong instruction on a change in PC

Page 46: Chapter 5 Overview

General Classification of Data Hazards (Not Specific to SRC)

A read after write hazard (RAW) arises from a flow dependence, where an instruction uses data produced by a previous one

A write after read hazard (WAR) comes from an anti-dependence, where an instruction writes a new value over one that is still needed by a previous instruction

A write after write hazard (WAW) comes from an output dependence, where two parallel instructions write the same register and must do it in the order in which they were issued
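The three classes can be told apart by comparing the registers each instruction reads and writes. This is a rough sketch, not SRC's actual detection logic; the `Instr` type and the `hazards` function are hypothetical names introduced here.

```python
from typing import FrozenSet, NamedTuple, Set


class Instr(NamedTuple):
    text: str
    writes: FrozenSet[str]  # registers written by the instruction
    reads: FrozenSet[str]   # registers read by the instruction


def hazards(earlier: Instr, later: Instr) -> Set[str]:
    """Classify the register dependences of `later` on `earlier`."""
    found = set()
    if earlier.writes & later.reads:
        found.add("RAW")  # flow dependence
    if earlier.reads & later.writes:
        found.add("WAR")  # anti-dependence
    if earlier.writes & later.writes:
        found.add("WAW")  # output dependence
    return found


shr = Instr("shr r3, r3, 2", frozenset({"r3"}), frozenset({"r3"}))
add = Instr("add r4, r3, r2", frozenset({"r4"}), frozenset({"r3", "r2"}))
sub = Instr("sub r2, r5, r1", frozenset({"r2"}), frozenset({"r5", "r1"}))

print(hazards(shr, add))  # add reads r3, which shr writes
print(hazards(add, sub))  # sub writes r2, which add reads
```

The function reports dependences only. Whether a dependence becomes an actual hazard depends on the pipeline, for example on the stages in which registers are read and written.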

Page 47: Chapter 5 Overview

General Classification of Data Hazards

A read after write hazard (RAW) arises from a flow dependence, where an instruction must use data produced by a previous one.

A write after read hazard (WAR) comes from an anti-dependence, where an instruction writes a new value to a location from which a previous instruction still needs to read.

A write after write hazard (WAW) comes from an output dependence, where two parallel instructions write the same register and must do so in the order in which they were issued.

Page 48: Chapter 5 Overview

Detecting Hazards and Dependence Distance

To detect hazards, pairs of instructions must be considered

Data is normally available after being written to reg.
- Can be made available for forwarding as early as the stage where it is produced
- Stage 3 output for ALU results, stage 4 for mem. fetch

Operands normally needed in stage 2
- Can be received from forwarding as late as the stage in which they are used
- Stage 3 for ALU operands and address modifiers, stage 4 for stored register, stage 2 for branch target

Page 49: Chapter 5 Overview

Detecting Hazards and Dependence Distance

To detect hazards, pairs of instructions must be considered

Data is normally available after being written to reg.
- Can be made available for forwarding as early as the stage where it is produced
- Stage 3 output for ALU results, stage 4 for memory fetch

Operands are normally needed in stage 2
- Can be received from forwarding as late as the stage in which they are used
- Stage 3 for ALU operands and address modifiers, stage 4 for stored register, stage 2 for branch target

Page 50: Chapter 5 Overview

Data Hazards in SRC

Since all data memory access occurs in stage 4, memory writes and reads are sequential and give rise to no hazards

Since all registers are written in the last stage, WAW and WAR hazards do not occur

Two writes always occur in the order issued, and a write always follows a previously issued read

SRC hazards on register data are limited to RAW hazards coming from flow dependence

Values are written into registers at the end of stage 5 but may be needed by a following instruction at the beginning of stage 2

Page 51: Chapter 5 Overview

Data Hazards in SRC

Since all data memory access occurs in stage 4, memory writes and reads are sequential and give rise to no hazards.

Since all registers are written in the last stage, WAW and WAR hazards do not occur

Two writes always occur in the order issued, and a write always follows a previously issued read.

SRC hazards on register data are limited to RAW hazards.

Values are written into registers at the end of stage 5 but may be needed by a following instruction at the beginning of stage 2.

Page 52: Chapter 5 Overview

Possible Solutions to the Register Data Hazard Problem

Detection:
- The machine manual could list rules specifying that a dependent instruction cannot be issued less than a given number of steps after the one on which it depends
- This is usually too restrictive
- Since the operation and operands are known at each stage, dependence on a following stage can be detected

Correction:
- The dependent instruction can be “stalled” and those ahead of it in the pipeline allowed to complete
- Result can be “forwarded” to a following inst. in a previous stage without waiting to be written into its register

Preferred SRC design will use detection and forwarding, and will stall only when unavoidable

Page 53: Chapter 5 Overview

Possible Solutions to the Register Data Hazard Problem

Detection:
- The machine manual could list rules specifying that a dependent instruction cannot be issued less than a given number of steps after the one on which it depends
- This is usually too restrictive
- Since the operations and operands are known at each stage, dependence on a following stage can be detected

Correction:
- The dependent instruction can be stalled and those ahead of it in the pipeline allowed to complete
- The result can be forwarded to a following instruction in a previous stage without waiting to be written into its register

Preferred SRC design will use detection and forwarding, and will stall only when unavoidable.

Page 54: Chapter 5 Overview

RAW, WAW, and WAR Hazards

RAW hazards are due to causality: one cannot use a value before it has been produced.

WAW and WAR hazards can only occur when instructions are executed in parallel or out of order.

- Not possible in SRC.
- Are only due to the fact that registers have the same name.
- Can be fixed by renaming one of the registers or by delaying the updating of a register until the appropriate value has been produced.

Page 55: Chapter 5 Overview

RAW, WAW, and WAR Hazards

RAW hazards are due to causality: one cannot use a value before it has been produced.

WAW and WAR hazards can only occur when instructions are executed in parallel or out of order.
- Not possible in SRC.
- Are only due to the fact that registers have the same name.
- Can be fixed by renaming one of the registers or by delaying the updating of a register until the appropriate value has been produced.

Page 56: Chapter 5 Overview

Tbl 5.1 Instruction Pair Hazard Interaction

Columns: class of the instruction producing the result, with Result Normally/Earliest available (N/E)
Rows: class of the instruction needing the value, with Value Normally/Latest needed (N/L)
Entries: Instruction separation to eliminate hazard, Normal/Forwarded

Class    N/L    alu    load   ladr   brl
N/E             6/4    6/5    6/4    6/2
alu      2/3    4/1    4/2    4/1    4/1
load     2/3    4/1    4/2    4/1    4/1
ladr     2/3    4/1    4/2    4/1    4/1
store    2/3    4/1    4/2    4/1    4/1
branch   2/2    4/2    4/3    4/2    4/1

Labels on the table: Read from Reg. File; Write to Reg. File

Latest needed stage 3 for store is based on address modifier register. The stored value is not needed until stage 4

Store also needs an operand from ra. See Text Tbl 5.

Instruction separation is used rather than bubbles because of the applicability to multi-issue, multi-pipelined machines.
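The entries of Tbl 5.1 appear to follow from the stage numbers in its margins: the separation is the stage at which the result becomes available minus the stage at which the value is needed, with a minimum of 1. That rule is an inference from the numbers on this slide, not a formula stated in the text, and the names below are hypothetical.

```python
# (normally available, earliest available) stage for each producing class.
AVAILABLE = {"alu": (6, 4), "load": (6, 5), "ladr": (6, 4), "brl": (6, 2)}
# (normally needed, latest needed) stage for each consuming class.
NEEDED = {"alu": (2, 3), "load": (2, 3), "ladr": (2, 3),
          "store": (2, 3), "branch": (2, 2)}


def separation(producer, consumer):
    """Return (normal, forwarded) instruction separation for a pair."""
    avail_normal, avail_earliest = AVAILABLE[producer]
    need_normal, need_latest = NEEDED[consumer]
    # A separation of 1 means the consumer directly follows the producer.
    normal = max(1, avail_normal - need_normal)
    forwarded = max(1, avail_earliest - need_latest)
    return normal, forwarded


for consumer in NEEDED:
    row = ["%d/%d" % separation(producer, consumer) for producer in AVAILABLE]
    print(consumer.ljust(8), " ".join(row))
```

Under this reading, only the load column and the branch row keep a forwarded separation above 1, which matches the delays that forwarding cannot remove.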

Page 57: Chapter 5 Overview

Tbl 5.1 Instruction Pair Hazard Interaction

Columns: class of the instruction producing the result, with Result Normally/Earliest available (N/E)
Rows: class of the instruction needing the value, with Value Normally/Latest needed (N/L)
Entries: Instruction separation to eliminate hazard, Normal/Forwarded

Class    N/L    alu    load   ladr   brl
N/E             6/4    6/5    6/4    6/2
alu      2/3    4/1    4/2    4/1    4/1
load     2/3    4/1    4/2    4/1    4/1
ladr     2/3    4/1    4/2    4/1    4/1
store    2/3    4/1    4/2    4/1    4/1
branch   2/2    4/2    4/3    4/2    4/1

Labels on the table: Read from Reg. File; Write to Reg. File

Latest needed stage 3 for store is based on the address modifier register. The stored value is not needed until stage 4.

Store also needs an operand from ra. See Text Tbl 5.

Instruction separation is used rather than bubbles because of its applicability to multi-issue, multi-pipelined machines.

Page 58: Chapter 5 Overview

Delays Unavoidable by Forwarding

In the column headed by load, we see the value loaded cannot be available to the next instruction, even with forwarding

Can restrict compiler not to put a dependent instruction in the next position after a load (next 2 positions if the dependent instruction is a branch)

Target register cannot be forwarded to branch from the immediately preceding instruction

Code is restricted so that branch target must not be changed by instruction preceding branch (previous 2 instructions if loaded from mem.)

Do not confuse this with the branch delay slot, which is a dependence of instruction fetch on branch, not a dependence of branch on something else

Page 59: Chapter 5 Overview

Delays Unavoidable by Forwarding

In the load column, we see that the loaded value cannot be available to the next instruction, even with forwarding.

The compiler can be restricted not to put a dependent instruction right after a load (next 2 positions if the dependent instruction is a branch)

Target register cannot be forwarded to a branch from the immediately preceding instruction

Code is restricted so that the branch target must not be changed by the instruction preceding the branch (previous 2 instructions if loaded from memory)

Do not confuse this with the branch delay slot, which is a dependence of instruction fetch on branch, not a dependence of branch on something else

Page 60: Chapter 5 Overview

Stalling the Pipeline on Hazard Detection

Assuming hazard detection, the pipeline can be stalled by inhibiting earlier stage operation and allowing later stages to proceed

A simple way to inhibit a stage is a pause signal that turns off the clock to that stage so none of its output registers are changed

If stages 1 & 2, say, are paused, then something must be delivered to stage 3 so the rest of the pipeline can be cleared

Insertion of nop into the pipeline is an obvious choice
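The effect of stalling can be imitated in software by padding the instruction stream with nop until every dependent pair is far enough apart. This is only a sketch of the idea, not the hardware pause signal described above. The function name, the tuple layout, and the two example instructions are hypothetical, and the default separation of 4 is the no-forwarding value from Tbl 5.1.

```python
NOP = ("nop", None, ())


def insert_nops(program, separation=4):
    """Pad `program` so each reader is at least `separation` instructions
    after the instruction that writes a register it reads.

    Each instruction is a tuple (text, destination register, source registers).
    """
    scheduled = []
    for text, dest, sources in program:
        bubbles = 0
        # Walk back over already scheduled instructions; distance 1 is the
        # instruction immediately before the current one.
        for distance, (_, earlier_dest, _) in enumerate(reversed(scheduled), start=1):
            if distance >= separation:
                break
            if earlier_dest is not None and earlier_dest in sources:
                bubbles = max(bubbles, separation - distance)
        scheduled.extend([NOP] * bubbles)
        scheduled.append((text, dest, sources))
    return scheduled


program = [
    ("add r1, r2, r3", "r1", ("r2", "r3")),
    ("sub r4, r1, r5", "r4", ("r1", "r5")),  # reads r1 written by add
]

for text, _, _ in insert_nops(program):
    print(text)
```

With a separation of 1, the forwarded value for two alu instructions, the same call inserts no nop at all.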

Page 61: Chapter 5 Overview

Stalling the Pipeline on Hazard Detection

Assuming hazard detection, the pipeline can be stalled by inhibiting the operation of earlier stages and letting later stages proceed.

A simple way to inhibit a stage is a pause signal. This signal stops the clock to that stage and prevents its output registers from changing.

If stages 1 and 2 are paused, something must be delivered to stage 3 so the rest of the pipeline can be cleared.

Insertion of nop into the pipeline is an obvious choice.

Page 62: Chapter 5 Overview

Fig 5.14 Stall Due to a Dependence Between Two alu Instructions

Page 63: Chapter 5 Overview


Restrictions Left If Forwarding Done Wherever Possible

1) Branch delay slot

• The instruction after a branch is always executed, whether the branch succeeds or not.

2) Load delay slot

• A register loaded from memory cannot be used as an operand in the next instruction.

• A register loaded from memory cannot be used as a branch target for the next two instructions.

3) Branch target

• Result register of alu or ladr instruction cannot be used as branch target by the next instruction.

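Example for 1): the add after the branch is always executed
    br r4
    add . . .

Example for 2), operand use: one nop separates the load from the use
    ld r4, 4(r5)
    nop
    neg r6, r4

Example for 2), branch target: two nops separate the load from the branch
    ld r0, 1000
    nop
    nop
    br r0

Example for 3): one nop separates the alu instruction from the branch
    not r0, r1
    nop
    br r0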
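
The load delay and branch target restrictions above can be checked mechanically. The following is only a sketch under stated assumptions: instructions are written as `(opcode, dest, sources)` tuples (a layout invented here, not an SRC encoding), every instruction other than `ld` that writes a register is treated like an alu or ladr instruction, and a register being overwritten in between is not modelled. The function name `violations` is hypothetical.

```python
def violations(program):
    """Return (index, message) pairs for instructions that break the rules.

    program -- list of (opcode, dest, sources) tuples; this layout is an
               assumption made for the sketch. For br, sources holds the
               branch target register.
    """
    found = []
    for i, (op, _dest, sources) in enumerate(program):
        for dist in (1, 2):                      # look one and two instructions back
            if i - dist < 0:
                continue
            prev_op, prev_dest, _ = program[i - dist]
            if prev_dest is None or prev_dest not in sources:
                continue
            if prev_op == "ld" and op == "br":
                # Loaded register used as branch target within two instructions.
                found.append((i, f"branch target {prev_dest} loaded {dist} back"))
            elif prev_op == "ld" and dist == 1:
                # Loaded register used as operand by the very next instruction.
                found.append((i, f"operand {prev_dest} loaded 1 back"))
            elif prev_op != "ld" and op == "br" and dist == 1:
                # alu/ladr result used as branch target by the next instruction.
                found.append((i, f"branch target {prev_dest} computed 1 back"))
    return found

legal = [("ld", "r0", []), ("nop", None, []), ("nop", None, []), ("br", None, ["r0"])]
illegal = [("ld", "r4", ["r5"]), ("neg", "r6", ["r4"])]
print(violations(legal))
print(violations(illegal))
```

The three nop-padded sequences from the slide pass this check, while removing a nop from any of them produces a reported violation.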

Page 64: Chapter 5 Overview


Restrictions Left If Forwarding Done Wherever Possible

1) Branch delay slot

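• The instruction after a branch is always executed, whether the branch succeeds or not.

2) Load delay slot

• A register loaded from memory cannot be used as an operand in the next instruction.

• A register loaded from memory cannot be used as a branch target for the next two instructions.

3) Branch target

• The result register of an alu or ladr instruction cannot be used as a branch target by the next instruction.

Example for 1): the add after the branch is always executed
    br r4
    add . . .

Example for 2), operand use: one nop separates the load from the use
    ld r4, 4(r5)
    nop
    neg r6, r4

Example for 2), branch target: two nops separate the load from the branch
    ld r0, 1000
    nop
    nop
    br r0

Example for 3): one nop separates the alu instruction from the branch
    not r0, r1
    nop
    br r0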

Page 65: Chapter 5 Overview


Instruction Level Parallelism

A pipeline that is full of useful instructions completes at most one every clock cycle
• Sometimes called the Flynn limit

If there are multiple function units and multiple instructions have been fetched, then it is possible to start several at once

Two approaches are: superscalar
• Dynamically issue as many prefetched instructions to idle function units as possible

and Very Long Instruction Word (VLIW)
• Statically compile long instruction words with many operations in a word, each for a different function unit
• Word size may be 128 or 256 or more bits.
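
To make the idea of starting several instructions at once concrete, here is a hedged sketch of in-order multiple issue. It assumes single-cycle operations and a made-up `(unit_type, dest, sources)` tuple layout. The function `issue_schedule` is hypothetical and is not how any particular superscalar machine works. A cycle ends at the first instruction that needs a busy function unit or depends on a register written earlier in the same cycle.

```python
def issue_schedule(program, units):
    """Group instruction indices into issue cycles.

    program -- list of (unit_type, dest, sources); layout is hypothetical
    units   -- dict mapping unit type to the number of such function units
    """
    cycles = []
    i = 0
    while i < len(program):
        free = dict(units)        # every unit is idle at the start of a cycle
        written = set()           # registers written by this cycle's group
        group = []
        while i < len(program):
            unit, dest, sources = program[i]
            blocked = (free.get(unit, 0) == 0          # no idle unit of this type
                       or written & set(sources)       # reads a same-cycle result
                       or dest in written)             # rewrites a same-cycle result
            if blocked:
                break
            free[unit] -= 1
            if dest is not None:
                written.add(dest)
            group.append(i)
            i += 1
        if not group:
            raise ValueError(f"no function unit of type {program[i][0]!r}")
        cycles.append(group)
    return cycles

prog = [("int", "r1", ["r2", "r3"]), ("fp", "f1", ["f2"]),
        ("int", "r4", ["r1"]), ("int", "r5", ["r6"])]
print(issue_schedule(prog, {"int": 2, "fp": 1}))
print(issue_schedule(prog, {"int": 1, "fp": 1}))
```

A VLIW compiler would do a comparable grouping ahead of time and emit each group as one long instruction word, whereas here the grouping is made at run time from the fetched instructions.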

Page 66: Chapter 5 Overview


Instruction-Level Parallelism

A pipeline that is full of instructions completes at most one instruction every clock cycle.
• Sometimes called the Flynn limit

If there are multiple function units and multiple instructions have been fetched, it becomes possible to start more than one at once.

There are two approaches: superscalar
• Dynamically issue as many prefetched instructions to idle function units as possible

and Very Long Instruction Word (VLIW)
• Statically compile long instruction words with many operations in a word, each for a different function unit
• Word size may be 128 or 256 or more bits.

Page 67: Chapter 5 Overview


Character of the Function Units in Multiple Issue Machines

There may be different types of function units
• Floating point
• Integer
• Branch

There can be more than one of the same type

Each function unit is itself pipelined

Branches become more of a problem
• There are fewer clock cycles between branches
• Branch units try to predict branch direction
• Instructions at branch target may be prefetched, and even executed speculatively, in hopes the branch goes that way

Page 68: Chapter 5 Overview


Character of the Function Units in Multiple Issue Machines

There are different types of function units
• Floating point
• Integer
• Branch

There can be more than one of the same type

Each function unit is itself pipelined

Branches become more of a problem
• There are fewer clock cycles between branches
• Branch units try to predict the branch direction
• Instructions at the branch target can be prefetched and executed speculatively

Page 69: Chapter 5 Overview


Figure 5.16: Structure of the Dual-Pipeline SRC

Page 70: Chapter 5 Overview


Figure 5.19: Dual-Issue SRC Pipelines and Forwarding Paths

Page 71: Chapter 5 Overview


Microprogramming: Basic Idea

Control unit job is to generate the sequence of control signals

How about building a computer to do this?

Step   Concrete RTN              Control Sequence
T0.    MA ← PC: C ← PC+4;        PCout, MAin, Inc4, Cin, Read
T1.    MD ← M[MA]: PC ← C;       Cout, PCin, Wait
T2.    IR ← MD;                  MDout, IRin
T3.    A ← R[rb];                Grb, Rout, Ain
T4.    C ← A + R[rc];            Grc, Rout, ADD, Cin
T5.    R[ra] ← C;                Cout, Gra, Rin, End

• Recall control sequence for 1-bus SRC

Page 72: Chapter 5 Overview


Microprogramming: Basic Idea

The control unit's job is to generate the sequence of control signals

How would one build a computer to do this?

Step   Concrete RTN              Control Sequence
T0.    MA ← PC: C ← PC+4;        PCout, MAin, Inc4, Cin, Read
T1.    MD ← M[MA]: PC ← C;       Cout, PCin, Wait
T2.    IR ← MD;                  MDout, IRin
T3.    A ← R[rb];                Grb, Rout, Ain
T4.    C ← A + R[rc];            Grc, Rout, ADD, Cin
T5.    R[ra] ← C;                Cout, Gra, Rin, End

• Recall the control sequence for the 1-bus SRC

Page 73: Chapter 5 Overview


The Microcode Engine

A computer to generate control signals is much simpler than an ordinary computer

At the simplest, it just reads the control signals in order from a read only memory

The memory is called the control store

A control store word, or microinstruction, contains a bit pattern telling which control signals are true in a specific step

The major issue is determining the order in which microinstructions are read
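
As a hedged illustration of the simplest engine described above, the control store can be modelled as a Python list holding one set of true control signals per word, filled here with the six steps of the 1-bus SRC control sequence recalled earlier in this chapter. The names `CONTROL_STORE` and `run` are made up for this sketch, and stopping on `End` stands in for the real sequencing logic.

```python
# One control store word per step; each word lists the signals that are true.
CONTROL_STORE = [
    {"PCout", "MAin", "Inc4", "Cin", "Read"},   # T0
    {"Cout", "PCin", "Wait"},                   # T1
    {"MDout", "IRin"},                          # T2
    {"Grb", "Rout", "Ain"},                     # T3
    {"Grc", "Rout", "ADD", "Cin"},              # T4
    {"Cout", "Gra", "Rin", "End"},              # T5
]

def run(control_store, start=0):
    """Yield the set of true control signals for each step, in order."""
    upc = start                      # micro-program counter
    while True:
        word = control_store[upc]    # read the next microinstruction
        yield word
        if "End" in word:            # End closes the sequence for this instruction
            break
        upc += 1                     # simplest sequencing: just read in order

for t, word in enumerate(run(CONTROL_STORE)):
    print(f"T{t}:", ", ".join(sorted(word)))
```

Reading words strictly in order is enough for one fixed sequence; choosing which word to read next is where the real design effort goes.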

Page 74: Chapter 5 Overview


The Microcode Engine

A computer that generates control signals is much simpler than an ordinary computer.

At its simplest, it just reads the control signals in order from a read-only memory.

The memory is called the control store.

A control store word, or microinstruction, contains a bit pattern telling which control signals are true in a specific step.

The major issue is deciding the order in which microinstructions are read.

Page 75: Chapter 5 Overview


Fig 5.22 Block Diagram of a Microcoded Control Unit

Microinstruction has branch control, branch address, and control signal fields

Micro-program counter can be set from several sources to do the required sequencing

Page 76: Chapter 5 Overview


Fig 5.22 Block Diagram of a Microcoded Control Unit

A microinstruction has branch control, branch address, and control signal fields.

The micro-program counter can be set from several sources to do the required sequencing.

Page 77: Chapter 5 Overview


Parts of the Microprogrammed Control Unit

Since the control signals are just read from memory, the main function is sequencing

This is reflected in the several ways the micro-program counter (µPC) can be loaded
• Output of incrementer: µPC+1
• PLA output: start address for a macroinstruction
• Branch address from the microinstruction
• External source: say for exception or reset

Micro conditional branches can depend on condition codes, data path state, external signals, etc.
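
The choice among these sources can be sketched as a small selection function. Everything specific here is an assumption made for illustration: the function name `next_upc`, the `mode` field values (`"next"`, `"dispatch"`, `"jump"`, `"cond"`), and the priority given to the external source are not taken from the slides.

```python
def next_upc(upc, word, pla_start, condition, external=None):
    """Choose the next micro-program counter value.

    upc       -- current micro-program counter
    word      -- dict with hypothetical fields "mode" and "branch_addr"
    pla_start -- start address the PLA gives for the current macroinstruction
    condition -- truth of the tested condition (condition codes, data path
                 state, external signals, ...)
    external  -- address forced from outside, e.g. on reset; None if absent
    """
    if external is not None:
        return external               # external source, assumed to win
    mode = word["mode"]
    if mode == "dispatch":
        return pla_start              # PLA output: start of the macroinstruction
    if mode == "jump" or (mode == "cond" and condition):
        return word["branch_addr"]    # branch address from the microinstruction
    return upc + 1                    # output of the incrementer

print(next_upc(5, {"mode": "next", "branch_addr": 0}, pla_start=40, condition=False))
print(next_upc(5, {"mode": "cond", "branch_addr": 12}, pla_start=40, condition=True))
```

A conditional micro-branch whose condition is false simply falls through to the incremented address.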

Page 78: Chapter 5 Overview


Parts of the Microprogrammed Control Unit

Since the control signals are just read from memory, the main function is sequencing

This is reflected in the several ways the micro-program counter (µPC) can be loaded
• Output of incrementer: µPC+1
• PLA output: start address for a macroinstruction
• Branch address from the microinstruction
• External source: for exception or reset

Micro conditional branches can depend on condition codes, data path state, external signals, etc.

Page 79: Chapter 5 Overview


Contents of a Microinstruction

Main component is list of 1/0 control signal values

There is a branch address in the control store

There are branch control bits to determine when to use the branch address and when to use µPC+1

Microinstruction format (figure): fields for branch control, branch address, and control signals, with one control signal bit each for PCout, MAin, PCin, Cout, Ain, End
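
A microinstruction with these three fields can be packed into a single word with shifts and masks. This is a sketch only: the field order (branch control, then branch address, then control signals), the widths `n` and `c`, and the helper names `signal_mask`, `pack`, and `unpack` are assumptions for illustration.

```python
SIGNALS = ["PCout", "MAin", "PCin", "Cout", "Ain", "End"]  # bit i <-> SIGNALS[i]

def signal_mask(active):
    """Build the 1/0 control signal field: one bit per named signal."""
    return sum(1 << SIGNALS.index(name) for name in active)

def pack(branch_ctl, branch_addr, signals, n, c):
    """Pack fields as [branch_ctl | branch_addr (n bits) | signals (c bits)]."""
    if not (0 <= branch_addr < (1 << n) and 0 <= signals < (1 << c)):
        raise ValueError("field does not fit its width")
    return (branch_ctl << (n + c)) | (branch_addr << c) | signals

def unpack(word, n, c):
    """Split a word back into (branch_ctl, branch_addr, signals)."""
    return word >> (n + c), (word >> c) & ((1 << n) - 1), word & ((1 << c) - 1)

word = pack(2, 0x15, signal_mask({"PCout", "MAin"}), n=8, c=len(SIGNALS))
print(bin(word), unpack(word, n=8, c=len(SIGNALS)))
```

Unpacking returns the three fields that were packed, which is all the sequencing hardware needs from the word.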

Page 80: Chapter 5 Overview


Contents of a Microinstruction

The main component is the list of 1/0 control signal values.

There is a branch address in the control store.

There are branch control bits that determine when to use the branch address and when to use µPC+1.

Microinstruction format (figure): fields for branch control, branch address, and control signals, with one control signal bit each for PCout, MAin, PCin, Cout, Ain, End

Page 81: Chapter 5 Overview


Figure 5.23: Layout of the Control Store

Figure content: microaddresses run from 0 to 2^n - 1. A common instruction fetch sequence comes first, followed by separate sequences for each (macro) instruction (code for add, br, shr, with start addresses labeled a1, a2, a3).

Wide words: m bits wide, made up of k branch control bits, n branch address bits, and c control signals.

Page 82: Chapter 5 Overview


Figure 5.23: Layout of the Control Store

Figure content: microaddresses run from 0 to 2^n - 1. A common instruction fetch sequence comes first, followed by separate sequences for each instruction (code for add, br, shr, with start addresses labeled a1, a2, a3).

Wide words: m bits wide, made up of k branch control bits, n branch address bits, and c control signals.

Page 83: Chapter 5 Overview


Horizontal Versus Vertical Microcode Schemes

In horizontal microcode, each control signal is represented by a bit in the microinstruction

In vertical microcode, a set of true control signals is represented by a shorter code

The name horizontal implies fewer control store words of more bits per word

Vertical code only allows RTs in a step for which there is a vertical instruction code

Thus vertical code may take more control store words of fewer bits
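
The contrast between the two schemes can be sketched in a few lines. The signal list, the table of predefined signal sets, and the function names below are all hypothetical. The point is only that a horizontal word can express any combination of signals, while a vertical code can express only the combinations that were given a code.

```python
SIGNALS = ["PCout", "MAin", "Inc4", "Cin", "Read", "Cout", "PCin", "Wait"]

def horizontal_word(active):
    """One bit per control signal: bit i is 1 when SIGNALS[i] is true."""
    return sum(1 << i for i, name in enumerate(SIGNALS) if name in active)

# Vertical code: each short code names one predefined set of true signals.
VERTICAL_TABLE = [
    frozenset(),                                            # code 0: nothing true
    frozenset({"PCout", "MAin", "Inc4", "Cin", "Read"}),    # code 1
    frozenset({"Cout", "PCin", "Wait"}),                    # code 2
]

def vertical_word(active):
    """Return the short code for a signal set; ValueError if none exists."""
    return VERTICAL_TABLE.index(frozenset(active))

step = {"PCout", "MAin", "Inc4", "Cin", "Read"}
print(len(SIGNALS), "bits horizontal:", bin(horizontal_word(step)))
print((len(VERTICAL_TABLE) - 1).bit_length(), "bits vertical:", vertical_word(step))
```

Asking `vertical_word` for a set that is not in the table, such as `{"PCout"}` alone, raises `ValueError`; a step like that would have to be split across the codes that do exist, which is why vertical code tends to need more words.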

Page 84: Chapter 5 Overview


Horizontal Versus Vertical Microcode Schemes

In horizontal microcode, each control signal is represented by a bit in the microinstruction.

In vertical microcode, a set of true control signals is represented by a shorter code.

The name horizontal implies fewer control store words with more bits per word.

Vertical code only allows the RTs in a step for which there is a vertical instruction code.

Thus vertical code may take more control store words of fewer bits.

Page 85: Chapter 5 Overview


Fig 5.25 A Somewhat Vertical Encoding

Scheme would save (16+7) - (4+3) = 16 bits/word in the case illustrated

Page 86: Chapter 5 Overview


Saving Control Store Bits With Horizontal Microcode

Some control signals cannot possibly be true at the same time
• One and only one ALU function can be selected
• Only one register out gate can be true with a single bus
• Memory read and write cannot be true at the same step

A set of m such signals can be encoded using $\lceil \log_2 m \rceil$ bits ($\lceil \log_2 (m+1) \rceil$ to allow for no signal true)

The raw control signals can then be generated by a k-to-$2^k$ decoder, where $2^k \ge m$ (or $2^k \ge m+1$)

This is a compromise between horizontal and vertical encoding
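
The bit counts above are easy to compute. The sketch below uses hypothetical helpers `field_bits` and `decode`. Reading the (16+7) - (4+3) figure quoted for Fig 5.25 as one field of 16 mutually exclusive signals and one of 7 is an assumption made for the example.

```python
def field_bits(m, allow_none=False):
    """Bits needed to encode m mutually exclusive signals."""
    states = m + 1 if allow_none else m     # one extra state for "no signal true"
    return (states - 1).bit_length()        # ceil(log2(states)) for states >= 1

def decode(code, m, allow_none=False):
    """Model a k-to-2^k decoder: return the m raw one-hot control signals."""
    if allow_none:
        return [code == i + 1 for i in range(m)]   # code 0 means no signal true
    return [code == i for i in range(m)]

saved = (16 + 7) - (field_bits(16) + field_bits(7))
print(field_bits(16), field_bits(7), saved)
print(decode(2, 4))
```

Allowing for "no signal true" costs an extra bit only when m is already a power of two, for example 8 signals need 3 bits without the idle state and 4 bits with it.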

Page 87: Chapter 5 Overview


Saving Control Store Bits With Horizontal Microcode

Some control signals cannot possibly be true at the same time.
• One and only one ALU function can be selected
• Only one register out gate can be true with a single bus
• Memory read and write cannot be true in the same step

A set of m such signals can be encoded using $\lceil \log_2 m \rceil$ bits ($\lceil \log_2 (m+1) \rceil$ to allow for no signal true)

The raw control signals can then be generated by a k-to-$2^k$ decoder, where $2^k \ge m$ (or $2^k \ge m+1$)

This is a compromise between vertical and horizontal encoding.

