CO&ISA, NLT 2013

Computer Organization and Instruction Set Architecture

Ngo Lam Trung, Department of Computer Engineering

School of Information and Communication Technology (SoICT)

Hanoi University of Science and Technology

E-mail: [email protected]


Course administration

Instructor: Ngo Lam Trung, 505 B1, SoICT, HUST

TA: Pham Ngoc Hung

Text: [Required] Computer Organization and Design, 4th edition, revised printing, Patterson & Hennessy, 2012.

[Optional] Computer Organization and Architecture, 8th Edition, William Stallings

Slides: hard copy (how to print?); PDF on class website (URL TBA)

Schedule: as in timetable


Self introduction

Ngo Lam Trung

Current position

2004~: Lecturer, Department of Computer Engineering, School of Information and Communication Technology, Hanoi University of Science and Technology.

2012~: Visiting researcher, Shibaura Institute of Technology, Tokyo, Japan.

2013~: Head of Computer Systems Laboratory, School of Information and Communication Technology, Hanoi University of Science and Technology.

Educational background

MSc in Information Processing and Communication, Hanoi University of Technology, 2006.

PhD in Functional Control Systems, Shibaura Institute of Technology, Japan, 2012.


CA & ISA 2013-2014 in USTH

CA & ISA 2013-2014 characteristics

Class of approx. 80 students (big class)

Students are all freshmen (little background in ICT)

Students are from all 6 undergrad. programs (different majors)

The lack of traditional prerequisites: digital logic, HDL


Grading information

Grading criteria

Attendance/Attitude 10%

Practical exercise(s) 10%

Assignment(s) 10%

Mid-term test (Jan 14th) 20%

Final exam (TBA) 50%

Assignments (no late submission)

Dec 27th due Dec 29th

Jan 9th due Jan 12th

Jan 21st due Jan 24th

Other requirements

No music/video/surfing in class

No copy-and-paste from unreferenced sources


Why do you need this course?

Of course, it's in the USTH curriculum!

More than that, this course is helpful for you

Software developer: to write better programs

Hardware designer: to make better computers

User: to know which computer is best suited to your work

Expected outcomes: understanding of

Organization and architecture of modern computers.

Relationship between hardware and software.

Factors for computer performance evaluation.


Course content

Chapter 1: Introduction

1.1 Background

1.2 Computer Abstraction and Technology

1.3 Performance Evaluation

Chapter 2: Instruction Set Architecture

2.1 Overview

2.2 MIPS instruction set

2.3 MIPS organization

Practice and assignment 1


Course content

Chapter 3: Computer Arithmetic

3.1 Integer arithmetic

3.2 Floating point arithmetic

Chapter 4: CPU

4.1 Introduction

4.2 Simple CPU implementation

4.3 Enhancing performance with pipelining



Course content

Chapter 5: Memory

5.1 Memory hierarchy

5.2 Cache

5.3 Virtual memory

Chapter 6: I/O System


Textbook and references

1. [Textbook] Computer Organization and Design, 4th Edition, Patterson & Hennessy, MK Pub, 2008.

2. Computer Architecture: From Microprocessors to Supercomputers, Behrooz Parhami, Oxford Univ. Press, New York, 2005.

3. Computer Architecture and Organization, 7th Edition, William Stallings, Prentice Hall International, 2006.

Acknowledgement: These slides use material from the three books above


Prerequisites - What You Should Know

Basic PC organization

What do computer parts look like?

Basic logic design & machine organization

logical minimization, FSMs, component design

Create, compile, and run C/C++ programs

Create, run, debug programs in an assembly language

What if all you know is MS Word, VLC, NFS, FB?

Try harder!


Chapter 1: Introduction

1. Background: Basic logic design

2. Computer Abstraction and Technology

3. Performance Evaluation


1. Background: Basic logic design

What's inside a computer?

What is the most basic building block of a computer?

The first basic thing: computer systems are built from digital logic circuits


Background: Basic logic design

Reviewing topics

Signal, logic operators, Boolean functions

Combinational circuits (memory-less)

Sequential circuits (with memory)


Signal, logic operators, Boolean function

Signal:

Physical: electric signal (usually a voltage) traveling in electronic circuits

Logic: states of the corresponding physical signal

Computers use binary logic: 2 logic levels

Logical high: binary 1 (bit 1)

Logical low: binary 0 (bit 0)

Optional: high impedance state (output device is not controlled)

Example:

TTL: logical low = 0-0.8 V; logical high = 2-5 V


Signal, logic operators, Boolean function

Logic gates: accept one or several logic inputs and produce one logic output

Some basic logic gates


Variation of logic gates

Gates with more than one input and with inverted inputs or outputs

OR NOR NAND AND XNOR

Simplify logic circuit design


Array of logic gates and equivalent symbol

Equivalent single gate symbol


Example

What do these gates do?

Enable = 0: Output = 0

Enable = 1: Output = Input

Controlled data transfer

Compl = 0: Output = Input

Compl = 1: Output = 11..11 XOR Input (every bit inverted)

1's complement circuit
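
The two gate arrays above can be sketched in software as bitwise operations on a fixed-width word. This is only an illustration: the function names and the default 8-bit width are assumptions, and the complement case assumes the array is built from XOR gates.

```python
def gated_transfer(data: int, enable: int, width: int = 8) -> int:
    """Array of AND gates: each data bit is ANDed with the shared enable signal."""
    mask = (1 << width) - 1
    enable_word = mask if enable else 0  # enable replicated onto every bit position
    return data & enable_word


def ones_complement(data: int, compl: int, width: int = 8) -> int:
    """Array of XOR gates: each data bit is XORed with the shared compl signal."""
    mask = (1 << width) - 1
    compl_word = mask if compl else 0  # 11..11 when compl = 1, 00..00 otherwise
    return (data ^ compl_word) & mask


word = 0b10110010
blocked = gated_transfer(word, enable=0)
passed = gated_transfer(word, enable=1)
unchanged = ones_complement(word, compl=0)
inverted = ones_complement(word, compl=1)
```

With enable = 0 the word is blocked (all zeros), and with compl = 1 every bit of the word is flipped.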


Combinational logic circuit

Represents logic functions; the output depends only on the current inputs

Ways to represent logic functions

Truth table: 2^n rows, where n is the number of input signals

Logic expression

Logic circuit diagram
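
As a small illustration of the truth-table representation, the sketch below enumerates all 2^n input combinations of a Boolean function. The `majority` function is a hypothetical example chosen for this sketch; it is not one of the functions on the slide.

```python
from itertools import product


def truth_table(func, n):
    """Return a list of (inputs, output) rows, one per input combination."""
    return [(bits, func(*bits)) for bits in product((0, 1), repeat=n)]


def majority(a, b, c):
    """Hypothetical 3-input function: 1 when at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)


table = truth_table(majority, 3)
for inputs, output in table:
    print(inputs, output)
```

A function of n = 3 inputs yields 2^3 = 8 rows, which matches the row count stated above.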

[Figure: truth table and the corresponding logic expressions]

Exercise:

Draw the corresponding logic circuit diagram


Draw a logic circuit to produce Y1, Y2, Y3

[Logic expressions for Y1, Y2, Y3]


Basic rules


Function minimization

Minimize the function below


Function minimization: Karnaugh map

Construct the Karnaugh map

Adjacent 1s are grouped into rectangular or square boxes of 2^n tiles

          AB
CD        00   01   11   10
00         0    0    1    1
01         0    0    1    1
11         0    0    0    1
10         0    1    1    1

F = BCD' + AB' + AC'   (X' denotes the complement of X)
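
A minimized expression can be checked against the map by brute force over all 16 input combinations. The sketch below assumes the rows of the map are CD and the columns are AB (both in Gray-code order 00, 01, 11, 10), and that the overbars lost in extraction give F = B·C·D' + A·B' + A·C'. The names `KMAP` and `f_minimized` are made up for this sketch.

```python
from itertools import product

GRAY = ("00", "01", "11", "10")

# Rows are indexed by CD, columns by AB, both in Gray-code order.
KMAP = {
    "00": (0, 0, 1, 1),
    "01": (0, 0, 1, 1),
    "11": (0, 0, 0, 1),
    "10": (0, 1, 1, 1),
}


def f_from_map(a, b, c, d):
    """Look up the function value in the Karnaugh map."""
    column = GRAY.index(f"{a}{b}")
    return KMAP[f"{c}{d}"][column]


def f_minimized(a, b, c, d):
    """Assumed minimized form: B.C.D' + A.B' + A.C'."""
    return (b & c & (1 - d)) | (a & (1 - b)) | (a & (1 - c))


mismatches = [
    bits for bits in product((0, 1), repeat=4)
    if f_from_map(*bits) != f_minimized(*bits)
]
print("mismatching inputs:", mismatches)
```

An empty mismatch list means the three product terms cover exactly the 1s of the map.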


Useful combinational circuits

Arithmetic components: adder, multiplier, ALU (next chapters).

Multiplexer, demultiplexer, encoder, decoder

What does this circuit do?


Multiplexer


Decoder/demultiplexer

[Figure: (a) 2-to-4 decoder; (b) decoder symbol; (c) demultiplexer, or decoder with enable. Select inputs y1, y0; enable input e; outputs x0 to x3.]
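
The 2-to-4 decoder and the decoder with enable named in the figure captions can be modeled behaviorally as follows. This is a sketch, not the gate-level circuit from the figure; the function name and argument order are assumptions.

```python
def decoder_2to4(y1: int, y0: int, enable: int = 1) -> list:
    """Return outputs [x0, x1, x2, x3]; exactly one is 1 when enabled."""
    selected = (y1 << 1) | y0  # binary value of the select inputs
    return [enable & int(i == selected) for i in range(4)]


plain = decoder_2to4(1, 0)                # select output x2
disabled = decoder_2to4(1, 0, enable=0)   # all outputs forced to 0
routed = decoder_2to4(1, 1, enable=1)     # enable used as the data input
```

Used as a demultiplexer, the enable input carries the data bit, which is routed to the output chosen by y1 y0.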


Sequential circuits

Output depends on current inputs, AND previous output

Basic circuit: flip-flop, latch

Output Q=D


Flip-flop operation

Synchronized with external clock signal

[Figure: D flip-flop symbols with inputs D and CLK and output Q; one samples the input on the rising clock edge, the other on the falling edge]

Sample timing diagram of positive edge triggered D flip-flop
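
The behavior shown in such a timing diagram can be approximated with a small simulation of a positive-edge-triggered D flip-flop. The class name, the sample waveforms, and the initial state Q = 0 are assumptions made for this sketch.

```python
class DFlipFlop:
    """Behavioral model: Q takes the value of D only on a rising clock edge."""

    def __init__(self):
        self.q = 0          # assumed initial state
        self._prev_clk = 0

    def step(self, d: int, clk: int) -> int:
        if self._prev_clk == 0 and clk == 1:  # rising edge detected
            self.q = d
        self._prev_clk = clk
        return self.q


clk_wave = [0, 1, 1, 0, 0, 1, 1, 0]
d_wave = [1, 1, 0, 0, 0, 0, 1, 1]

ff = DFlipFlop()
q_wave = [ff.step(d, clk) for d, clk in zip(d_wave, clk_wave)]
print(q_wave)
```

Changes on D between rising edges (for example at the third and seventh samples) do not reach Q until the next rising edge.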


Flip-flop operation

What if we have this circuit? (homework)


Sequential circuits

Useful circuits: Register file, RAM, Flash

[Figure: (a) register file with random access: 2^h k-bit registers built from D flip-flops, a write-address decoder, and read muxes, with signals Write data (k bits), Write address (h bits), Write enable, Read address 0 and Read address 1 (h bits each), Read enable, Read data 0 and Read data 1 (k bits each); (b) graphic symbol for register file; (c) FIFO symbol with k-bit Input and Output and signals Push, Pop, Full, Empty]


Exercise

Design a 1-bit 4-to-1 multiplexer without enable signal, using basic logic gates


2. Computer Abstraction and Technology

Computer architecture: attributes of a system that are

visible to a programmer, or

have a direct impact on the logical execution of a program.

Computer organization

The operational units and their interconnections that realize the architectural specifications.

Includes hardware details that are transparent to the programmer:

- control signals;

- interfaces between the computer and peripherals;

- the memory technology used.

Which is more hardware/software related?


Who are we?

What do you want to do in the future?

[Figure: subfields or views in computer system engineering. From the high-level view (software, application domains) to the low-level view (hardware, electronic components): application designer, system designer, computer designer, logic designer, circuit designer. Computer architecture and computer organization are marked as spans of these views.]


Classes of Computers

Supercomputers

Super fast + expensive for high-end applications

Server computers

Network based

High capacity, performance, reliability

Range from small servers to building sized

Desktop computers

General purpose, variety of software

Subject to cost/performance tradeoff

Embedded computers

Hidden as components of systems

Stringent power/performance/cost constraints


Dominant look and feel of computer classes

Embedded

PC

Server

Super computer


Price/performance of computer classes

Class          Typical price
Super          $Millions
Mainframe      $100s Ks
Server         $10s Ks
Workstation    $1000s
Personal       $100s
Embedded       $10s

Differences in scale, not in substance


The Processor Market

embedded growth >> desktop growth


Generations of Progress

Generation   Processor            Memory          I/O devices                     Dominant
(begun)      technology           innovations     introduced                      look & feel
0 (1600s)    (Electro-)mechanical Wheel, card     Lever, dial, punched card       Factory equipment
1 (1950s)    Vacuum tube          Magnetic drum   Paper tape, magnetic tape       Hall-size cabinet
2 (1960s)    Transistor           Magnetic core   Drum, printer, text terminal    Room-size mainframe
3 (1970s)    SSI/MSI              RAM/ROM chip    Disk, keyboard, video monitor   Desk-size mini
4 (1980s)    LSI/VLSI             SRAM/DRAM       Network, CD, mouse, sound       Desktop/laptop micro
5 (1990s)    ULSI/GSI/WSI, SOC    SDRAM, flash    Sensor/actuator, point/click    Invisible, embedded

[Ref. 2]


Technology Trends

Electronics technology continues to evolve

Increased capacity and performance

Reduced cost

Year   Technology                    Relative performance/cost
1951   Vacuum tube                   1
1965   Transistor                    35
1975   Integrated circuit (IC)       900
1995   Very large scale IC (VLSI)    2,400,000
2005   Ultra large scale IC          6,200,000,000

DRAM capacity

[Textbook]


IC making

[Figure: the manufacturing process for an IC part. Silicon crystal ingot (labeled 15-30 cm and 30-60 cm) -> slicer -> blank wafer with defects (0.2 cm) -> processing: 20-30 steps -> patterned wafer (100s of simple or scores of complex processors) -> dicer -> die (~1 cm) -> die tester -> good die (~1 cm) -> mounting -> microchip or other part -> part tester -> usable part to ship]


Video: How an IC is made


Moore's Law

[Figure: processor performance (kIPS, MIPS, GIPS, TIPS) and memory chip capacity (kb, Mb, Gb, Tb) versus calendar year, 1980 to 2010. Processor curve labeled 1.6 / yr, 2 / 18 mos, 10 / 5 yrs, with points for the 68000, 80286, 80386, 80486, 68040, Pentium, Pentium II, and R10000. Memory curve labeled 4 / 3 yrs, with points for 64kb, 256kb, 1Mb, 4Mb, 16Mb, 64Mb, 256Mb, and 1Gb chips.]

How do we benefit from this?
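
The growth labels in the Moore's Law figure (1.6 / yr, 2 / 18 mos, 10 / 5 yrs for processors; 4 / 3 yrs for memory) can be compared by converting each to an annual factor. The sketch below assumes these labels are multiplicative growth factors over the stated period; the function name is made up for illustration.

```python
def annual_factor(total_factor: float, years: float) -> float:
    """Convert 'total_factor every `years` years' into a per-year growth factor."""
    return total_factor ** (1.0 / years)


processor_per_18_months = annual_factor(2, 1.5)   # doubling every 18 months
processor_per_5_years = annual_factor(10, 5)      # tenfold every 5 years
memory_per_3_years = annual_factor(4, 3)          # fourfold every 3 years

for label, value in [
    ("2x / 18 months", processor_per_18_months),
    ("10x / 5 years", processor_per_5_years),
    ("4x / 3 years", memory_per_3_years),
]:
    print(f"{label}: about {value:.2f}x per year")
```

Under that assumption, all three labels work out to roughly the same rate, close to 1.6x per year.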


Dual Core Itanium with 1.7B transistors

Feature size & die size

Courtesy, Intel


Cost/performance

[Figure: computer cost ($1, $1 K, $1 M, $1 G) versus calendar year (1960, 1980, 2000, 2020)]


Homework

Why does CPU clock speed not exceed 3.5-4 GHz?


Computer Organization

Five classic components of a computer: input, output, memory, datapath, and control

datapath + control = processor (CPU)


A similar view

Usually, the Link unit is hidden

[Figure: Memory; Processor (Control + Datapath), together the CPU; Input and Output, together the I/O; and a Link unit connecting Input/Output to/from the network]


Opening the box: anatomy of computer


Opening the box: anatomy of computer

The story of each component is worth a separate course!


Inside the Processor (CPU)

Datapath: performs operations on data

Control: sequences datapath, memory, ...

Cache memory

Small fast SRAM memory for immediate access to data


Inside the Processor

AMD Barcelona: 4 processor cores

http://www.techwarelabs.com/reviews/processors/barcelona/

Four out-of-order cores on one chip

1.9 GHz clock rate

65nm technology

Three levels of caches (L1, L2, L3) on chip

Integrated Northbridge

AMD's Barcelona Multicore Chip

[Figure: die layout with Core 1, Core 2, Core 3, Core 4, a 512KB L2 cache per core, a 2MB shared L3 cache, and the Northbridge]


Hardware/software interface: below your program

Application software

Written in high-level language (HLL)

System software

Compiler: translates HLL code to machine code

Operating System: service code

- Handling input/output

- Managing memory and storage

- Scheduling tasks & sharing resources

Hardware

Processor, memory, I/O controllers


Below your program

High-level language program (in C)

    void swap(int v[], int k)
    {
        int temp;          /* holds v[k] while the two elements are exchanged */
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

C compiler (one-to-many)

Assembly language program (for MIPS)

    swap: sll  $2, $5, 2
          add  $2, $4, $2
          lw   $15, 0($2)
          lw   $16, 4($2)
          sw   $16, 0($2)
          sw   $15, 4($2)
          jr   $31

Assembler (one-to-one)

Machine (object, binary) code (for MIPS)

    000000 00000 00101 0001000010000000
    000000 00100 00010 0001000000100000
    . . .
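
The one-to-one step from assembly to machine code can be illustrated by packing the fields of a MIPS R-type instruction (opcode, rs, rt, rd, shamt, funct with widths 6, 5, 5, 5, 5, 6 bits) into a 32-bit word. The helper name `encode_r_type` is an assumption for this sketch; it covers only the first two instructions of the example, not a full assembler.

```python
def encode_r_type(rs: int, rt: int, rd: int, shamt: int, funct: int, opcode: int = 0) -> str:
    """Pack the six R-type fields into a 32-bit word, returned as a bit string."""
    word = (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct
    return format(word, "032b")


SLL_FUNCT = 0b000000  # function code for sll
ADD_FUNCT = 0b100000  # function code for add

# sll $2, $5, 2  -> rd = 2, rt = 5, shamt = 2 (rs is unused and set to 0)
sll_bits = encode_r_type(rs=0, rt=5, rd=2, shamt=2, funct=SLL_FUNCT)

# add $2, $4, $2 -> rd = 2, rs = 4, rt = 2
add_bits = encode_r_type(rs=4, rt=2, rd=2, shamt=0, funct=ADD_FUNCT)

print(sll_bits)
print(add_bits)
```

Each assembly instruction maps to exactly one 32-bit word, whereas one C statement may expand into several instructions.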


Levels of Program Code

High-level language

Level of abstraction closer to problem domain

Provides for productivity and portability

Assembly language

Textual representation of instructions

Hardware representation

Binary digits (bits)

Encoded instructions and data


Advantages of Higher-Level Languages?

Higher-level languages

Allow the programmer to think in a more natural language, suited to the intended use (Fortran for scientific computation, Cobol for business programming, Lisp for symbol manipulation, Java for web programming, ...)

Improve programmer productivity: more understandable code that is easier to debug and validate

Improve program maintainability

Allow programs to be independent of the computer on which they are developed (compilers and assemblers can translate high-level language programs to the binary instructions of any machine)

Emergence of optimizing compilers that produce very efficient assembly code optimized for the target machine

As a result, very little programming is done today at the assembler level


Problem

Computer organization evolves fast

New processor

New hardware

every year/day/month

How can a program run on different computers?

Instruction Set Architecture

Abstraction


Computer performance

To maximize performance, we need to minimize execution time

$$\text{performance}_X = \frac{1}{\text{execution\_time}_X}$$

If computer X is n times faster than Y, then

$$\frac{\text{performance}_X}{\text{performance}_Y} = \frac{\text{execution\_time}_Y}{\text{execution\_time}_X} = n$$


Relative Performance Example

If computer A runs a program in 10 seconds and computer B runs the same program in 15 seconds, how much faster is A than B?

We know that A is n times faster than B if

$$\frac{\text{performance}_A}{\text{performance}_B} = \frac{\text{execution\_time}_B}{\text{execution\_time}_A} = n$$

The performance ratio is

$$\frac{15}{10} = 1.5$$

So A is 1.5 times faster than B
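
The same calculation as a minimal sketch; the function name and argument names are chosen here for illustration.

```python
def relative_performance(execution_time_y: float, execution_time_x: float) -> float:
    """Return n such that X is n times faster than Y (performance = 1 / time)."""
    return execution_time_y / execution_time_x


# Computer A runs the program in 10 s, computer B in 15 s.
n = relative_performance(execution_time_y=15, execution_time_x=10)
print(f"A is {n} times faster than B")
```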


Performance Factors

CPU execution time (CPU time): time the CPU spends working on a task

Does not include time waiting for I/O or running other programs

$$\text{CPU execution time for a program} = \text{\# CPU clock cycles for a program} \times \text{clock cycle time} = \frac{\text{\# CPU clock cycles for a program}}{\text{clock rate}}$$

Can improve performance by reducing either the length of the clock cycle or the number of clock cycles required for a program


Review: Machine Clock Rate

Clock rate (clock cycles per second, in MHz or GHz) is the inverse of clock cycle time (clock period)

CC = 1 / CR

[Figure: clock waveform with one clock period marked]

10 nsec clock cycle => 100 MHz clock rate

1 nsec (10^-9 sec) clock cycle => 1 GHz (10^9 Hz) clock rate

500 psec clock cycle => 2 GHz clock rate
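
The conversions above follow directly from CC = 1 / CR. A small sketch, working in picoseconds to keep the arithmetic exact (the function name and the choice of unit are assumptions):

```python
PS_PER_SECOND = 10**12


def clock_rate_hz(clock_period_ps: int) -> float:
    """Clock rate in Hz for a clock period given in picoseconds."""
    return PS_PER_SECOND / clock_period_ps


rate_10ns = clock_rate_hz(10_000)   # 10 nsec period
rate_1ns = clock_rate_hz(1_000)     # 1 nsec period
rate_500ps = clock_rate_hz(500)     # 500 psec period

print(rate_10ns / 1e6, "MHz")
print(rate_1ns / 1e9, "GHz")
print(rate_500ps / 1e9, "GHz")
```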




Improving Performance Example

A program runs on computer A with a 2 GHz clock in 10 seconds. What clock rate must computer B have to run this program in 6 seconds? Assume that computer B will require 1.2 times as many clock cycles as computer A to run the program.

$$\text{CPU time}_A = \frac{\text{CPU clock cycles}_A}{\text{clock rate}_A}$$

$$\text{CPU clock cycles}_A = 10\ \text{sec} \times 2 \times 10^9\ \text{cycles/sec} = 20 \times 10^9\ \text{cycles}$$

$$\text{CPU time}_B = \frac{1.2 \times 20 \times 10^9\ \text{cycles}}{\text{clock rate}_B}$$

$$\text{clock rate}_B = \frac{1.2 \times 20 \times 10^9\ \text{cycles}}{6\ \text{seconds}} = 4\ \text{GHz}$$
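
The example can be reproduced numerically as a quick check. Variable names are chosen for this sketch only.

```python
clock_rate_a = 2e9        # Hz
cpu_time_a = 10.0         # seconds
cpu_time_b_target = 6.0   # seconds
cycle_ratio_b = 1.2       # B needs 1.2 times as many cycles as A

# CPU clock cycles = CPU time x clock rate
cycles_a = cpu_time_a * clock_rate_a
cycles_b = cycle_ratio_b * cycles_a

# clock rate = CPU clock cycles / CPU time
clock_rate_b = cycles_b / cpu_time_b_target

print(f"cycles on A: {cycles_a:.3g}")
print(f"required clock rate for B: {clock_rate_b / 1e9:.2f} GHz")
```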


Clock Cycles per Instruction

Not all instructions take the same amount of time to execute

Average execution time ~ average clock cycles per instruction

Clock cycles per instruction (CPI): the average number of clock cycles each instruction takes to execute

A way to compare two different implementations of the same ISA

$$\text{\# CPU clock cycles for a program} = \text{\# Instructions for a program} \times \text{Average clock cycles per instruction}$$

       CPI for this instruction class
       A    B    C
CPI    1    2    3


Using the Performance Equation

Computers A and B implement the same ISA. Computer A has a clock cycle time of 250 ps and an effective CPI of 2.0 for some program and computer B has a clock cycle time of 500 ps and an effective CPI of 1.2 for the same program. Which computer is faster and by how much?

Each computer executes the same number of instructions, I, so

CPU timeA = I x 2.0 x 250 ps = 500 x I ps

CPU timeB = I x 1.2 x 500 ps = 600 x I ps

Clearly, A is faster by the ratio of execution times

$$\frac{\text{performance}_A}{\text{performance}_B} = \frac{\text{execution\_time}_B}{\text{execution\_time}_A} = \frac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2$$
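
Because both computers execute the same instruction count I, it cancels in the ratio, so the comparison can be done per instruction. A minimal sketch with names chosen for illustration:

```python
def time_per_instruction_ps(cpi: float, clock_cycle_ps: float) -> float:
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * clock_cycle_ps


time_a = time_per_instruction_ps(cpi=2.0, clock_cycle_ps=250)  # computer A
time_b = time_per_instruction_ps(cpi=1.2, clock_cycle_ps=500)  # computer B

ratio = time_b / time_a  # how many times faster A is than B
print(f"A: {time_a:.0f} ps, B: {time_b:.0f} ps per instruction, ratio {ratio:.2f}")
```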


The Performance Equation

Our basic performance equation is then

$$\text{CPU time} = \text{Instruction\_count} \times \text{CPI} \times \text{clock\_cycle} = \frac{\text{Instruction\_count} \times \text{CPI}}{\text{clock\_rate}}$$

Key factors that affect performance (CPU execution time)

The clock rate: CPU specification

CPI: varies by instruction type and ISA implementation

Instruction count: measured using profilers/simulators


Improving performance by CPI

Q1: How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?

Q2: What if a branch instruction took only one cycle?

Q3: What if two ALU instructions could be executed at once?

Op        Freq   CPIi   Freq x CPIi   Q1     Q2     Q3
ALU       50%    1      .5            .5     .5     .25
Load      20%    5      1.0           .4     1.0    1.0
Store     10%    3      .3            .3     .3     .3
Branch    20%    2      .4            .4     .2     .4
Sum                     2.2           1.6    2.0    1.95

Q1: CPU time new = 1.6 x IC x CC, so 2.2/1.6 means 37.5% faster

Q2: CPU time new = 2.0 x IC x CC, so 2.2/2.0 means 10% faster

Q3: CPU time new = 1.95 x IC x CC, so 2.2/1.95 means 12.8% faster
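
The three what-if cases can be recomputed from the instruction mix with a weighted sum. The dictionary layout and helper names below are assumptions for this sketch; the frequencies and CPI values are those in the table.

```python
BASE_MIX = {
    # op: (frequency, CPI)
    "ALU": (0.5, 1),
    "Load": (0.2, 5),
    "Store": (0.1, 3),
    "Branch": (0.2, 2),
}


def average_cpi(mix):
    """Effective CPI = sum over instruction classes of frequency x CPI."""
    return sum(freq * cpi for freq, cpi in mix.values())


def with_cpi(mix, op, new_cpi):
    """Return a copy of the mix in which one class has a different CPI."""
    changed = dict(mix)
    changed[op] = (mix[op][0], new_cpi)
    return changed


base = average_cpi(BASE_MIX)
faster_load = average_cpi(with_cpi(BASE_MIX, "Load", 2))
faster_branch = average_cpi(with_cpi(BASE_MIX, "Branch", 1))
dual_alu = average_cpi(with_cpi(BASE_MIX, "ALU", 0.5))  # two ALU ops per cycle

for name, cpi in [("load = 2", faster_load), ("branch = 1", faster_branch), ("2 ALU at once", dual_alu)]:
    print(f"{name}: CPI {cpi:.2f}, {100 * (base / cpi - 1):.1f}% faster")
```

Since IC and CC are unchanged in all three cases, the speedup is simply the ratio of the old CPI to the new CPI.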


Dynamic Instruction Count

How many instructions are executed in this program fragment?

    250 instructions
    for i = 1, 100 do
        20 instructions
        for j = 1, 100 do
            40 instructions
            for k = 1, 100 do
                10 instructions
            endfor
        endfor
    endfor

Each "for" consists of two instructions: increment index, check exit condition

k loop: 2 + 10 instructions, 100 iterations: 1200 instructions in all

j loop: 2 + 40 + 1200 instructions, 100 iterations: 124,200 instructions in all

i loop: 2 + 20 + 124,200 instructions, 100 iterations: 12,422,200 instructions in all

Dynamic count = 250 + 12,422,200 = 12,422,450 instructions

Static count = 326

For loops such as "for i = 1, n" or "while x > 0", the dynamic count depends on run-time values
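
The count can be reproduced by working from the innermost loop outward. This sketch assumes, as the slide states, that each "for" costs two instructions per iteration; the helper name is made up for illustration.

```python
LOOP_OVERHEAD = 2  # increment index + check exit condition, per iteration


def loop_instructions(body_instructions: int, iterations: int = 100) -> int:
    """Instructions executed by one loop: (overhead + body) per iteration."""
    return (LOOP_OVERHEAD + body_instructions) * iterations


k_loop = loop_instructions(10)
j_loop = loop_instructions(40 + k_loop)
i_loop = loop_instructions(20 + j_loop)

dynamic_count = 250 + i_loop
static_count = 250 + 20 + 40 + 10 + 3 * LOOP_OVERHEAD

print(k_loop, j_loop, i_loop)
print("dynamic:", dynamic_count, "static:", static_count)
```

The static count is the number of instructions written in the program, while the dynamic count is the number actually executed.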


How to improve performance?

Shorter clock cycle = faster clock rate

latest CPU technology

Smaller CPI

optimizing Instruction Set Architecture

Smaller instruction count

optimizing algorithm and compiler

To get best performance, multiple criteria are combined and considered at design time

a specific CPU for a specific class of computational problems


Faster Clock ≠ Shorter Running Time

Faster steps do not necessarily mean shorter travel time.

[Figure: two routes to the same solution, labeled 1 GHz / 2 GHz and 4 steps / 20 steps]

Suppose addition takes 1 ns

1 GHz clock: clock period = 1 ns; 1 cycle

2 GHz clock: clock period = 0.5 ns; 2 cycles

In this example, addition time does not improve in going from a 1 GHz to a 2 GHz clock
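
The addition example can be expressed as a short calculation: an operation with a fixed latency occupies a whole number of clock cycles. The sketch assumes the missing clock period for the 2 GHz case is 0.5 ns and works in picoseconds; the function name is an illustration only.

```python
def operation_time_ps(latency_ps: int, clock_period_ps: int):
    """Return (cycles needed, total time in ps) for an operation of fixed latency."""
    cycles = -(-latency_ps // clock_period_ps)  # ceiling division
    return cycles, cycles * clock_period_ps


ADD_LATENCY_PS = 1000  # addition takes 1 ns

at_1ghz = operation_time_ps(ADD_LATENCY_PS, clock_period_ps=1000)  # 1 GHz clock
at_2ghz = operation_time_ps(ADD_LATENCY_PS, clock_period_ps=500)   # 2 GHz clock

print("1 GHz:", at_1ghz)
print("2 GHz:", at_2ghz)
```

The 2 GHz clock doubles the number of cycles, so the addition still takes 1 ns in total.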


Review

Digital logic design

Computer technology

Computer performance evaluation

Next week: the MIPS instruction set
