CS 325: CS Hardware and Software Organization and Architecture

Date post: 03-Jan-2016
Upload: kenyon-anthony
Page 1: CS 325:  CS Hardware and Software Organization and Architecture

+CS 325: CS Hardware and Software Organization and Architecture

Computer Evolution and Performance 2

+Outline

Von Neumann Architecture

Processor Hierarchy

Registers

ALU

Processor Categories

Processor Performance

Amdahl’s Law

Computer Benchmarks

+Von Neumann Architecture

Characteristic of most modern processors.

Central idea is Stored Program.

Three basic components:
  Processor
  Memory
  I/O Facilities
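
A minimal sketch of the stored-program idea, assuming a made-up four-instruction machine. The opcodes LOAD, ADD, STORE, HALT and the single accumulator are hypothetical and do not come from the slides. Program and data share one memory, and the processor repeatedly fetches, decodes, and executes.

```python
# Toy stored-program machine: instructions and data live in the same memory.
# The instruction set below is hypothetical and exists only for illustration.

def run(memory):
    """Execute the program stored in `memory`, starting at address 0."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # single accumulator register
    while True:
        opcode, operand = memory[pc]   # fetch
        pc += 1
        if opcode == "LOAD":           # decode and execute
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode!r}")

# Addresses 0-3 hold instructions; addresses 4-6 hold data.
memory = [
    ("LOAD", 4),
    ("ADD", 5),
    ("STORE", 6),
    ("HALT", 0),
    2,   # address 4: first operand
    3,   # address 5: second operand
    0,   # address 6: the result is stored here
]

result = run(memory)
print(result[6])  # 5
```

Because the program is just memory contents, loading a different list runs a different program on the same processor loop.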

+Illustration of Von Neumann Architecture

+Processor

Digital Device.

Performs computation involving multiple steps.

Building blocks used to form computer system.

+Hierarchical Structure and Computational Engines

Most computer architecture follows a hierarchical approach.

Subparts of a large, central processor are sophisticated enough to meet our definition of a processor.

Some engineers use the term computational engine for a sub-piece that is less powerful than the main processor.

+Illustration of Processor Hierarchy

+Major Components of a Conventional Processor

Controller

Computational Engine (ALU)

Local Data Storage

Internal Interconnections

External Interface

+Illustration of a Conventional Processor

+Parts of a Conventional Processor

Controller
  Overall responsibility for execution
  Moves through sequence of steps
  Coordinates other units

Computational Engine (ALU)
  Operates as directed by controller
  Typically provides arithmetic and Boolean operations
  Performs one operation at a time

+Parts of a Conventional Processor

Local Data Storage
  Holds data values for operations
  Must be loaded before operation can be performed
  Typically implemented with registers

Internal Interconnections
  Allow transfer of values among units of the processor
  Sometimes called data path

+Parts of a Conventional Processor

External Interface
  Handles communication between processor and rest of computer system
  Provides connections to external memory as well as external I/O devices

+Another Illustration of Processor

+Parts of a Conventional Processor

ALU
  Status Flags: Neg, Zero, Carry, Overflow
  Shifter:
    Left: multiplication by 2
    Right: division by 2
  Complementer: Logical NOT
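
The flag, shifter, and complementer behavior listed above can be sketched in a few lines. This is an illustration only: the 8-bit width and the function names are assumptions, and a real ALU computes these in hardware rather than with Python integers.

```python
# 8-bit ALU sketch: add two values and derive the four status flags.
# The width and all names here are illustrative assumptions.

WIDTH = 8
MASK = (1 << WIDTH) - 1          # 0xFF: keeps results within 8 bits
SIGN = 1 << (WIDTH - 1)          # 0x80: position of the sign bit

def alu_add(a, b):
    """Return (result, flags) for an 8-bit addition."""
    full = (a & MASK) + (b & MASK)
    result = full & MASK
    flags = {
        "N": bool(result & SIGN),    # Neg: sign bit of the result is set
        "Z": result == 0,            # Zero: result is zero
        "C": full > MASK,            # Carry: carry out of the top bit
        # Overflow: operands share a sign that the result does not have
        "V": bool(~(a ^ b) & (a ^ result) & SIGN),
    }
    return result, flags

def shift_left(a):
    return (a << 1) & MASK       # multiply by 2 (top bit is discarded)

def shift_right(a):
    return (a & MASK) >> 1       # unsigned divide by 2

def complement(a):
    return ~a & MASK             # logical NOT of every bit

print(alu_add(0x70, 0x70))       # signed overflow: 112 + 112 does not fit in 8 bits
print(alu_add(0xFF, 0x01))       # carry out, zero result
print(shift_left(21), shift_right(42), hex(complement(0x0F)))
```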

+Example Register Organizations

+Processor Registers

Motorola CPU - MC68000
  8 32-bit general purpose registers (D0 – D7)
  8 32-bit address registers (A0 – A7)
  1 32-bit program counter
  1 16-bit status register

+Processor Registers

Intel 8086 – 16-bit

General Purpose:
  AX – Accumulator: Multiply, Divide, I/O
  BX – Base: Pointer to base address (data)
  CX – Count: Counter for loops, shifts
  DX – Data: Multiply, Divide, I/O

Pointer and Index:
  SP – Stack Pointer: pointer to top of stack
  BP – Base Pointer: pointer to base address (stack)
  SI – Source Index: source string/index pointer
  DI – Destination Index: destination string/index pointer

Segment Registers:
  CS – Code Segment
  DS – Data Segment
  SS – Stack Segment
  ES – Extra Segment

Program Status:
  PC – Program Counter (named IP, Instruction Pointer, on the 8086)
  SR – Status Register (named FLAGS on the 8086)

+Processor Registers

Intel 80386 – Pentium II
  Similar to 8086, but register width doubled to 32-bit (segment registers remain 16-bit)

+Arithmetic Logic Unit (ALU)

Main computational engine in conventional processor.

Complex unit that can perform a variety of tasks
  Integer arithmetic (add, subtract, multiply, divide)
  Shift (left, right, circular)
  Boolean (AND, OR, NOT, XOR)

Typically CPU “bit size” refers to ALU and register size
  32-bit CPU: 32-bit ALU and registers
  64-bit CPU: 64-bit ALU and registers
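
To make the "bit size" point concrete, here is a rough sketch of how a fixed word size shapes arithmetic and circular shifts. The 32-bit width and the helper names are assumptions chosen for illustration.

```python
# Sketch of fixed-width ALU behavior, assuming a 32-bit word size.
BITS = 32
MASK = (1 << BITS) - 1

def add32(a, b):
    """Integer add that wraps around the way a 32-bit ALU would."""
    return (a + b) & MASK

def rotate_left(value, count):
    """Circular shift: bits leaving the top re-enter at the bottom."""
    count %= BITS
    value &= MASK
    return ((value << count) | (value >> (BITS - count))) & MASK

def rotate_right(value, count):
    """Circular shift in the other direction, built on rotate_left."""
    count %= BITS
    return rotate_left(value, BITS - count)

print(hex(add32(0xFFFFFFFF, 1)))         # largest 32-bit value plus 1 wraps to 0
print(hex(rotate_left(0x80000001, 1)))   # top bit moves around to bit 0
print(bin(0b1100 ^ 0b1010))              # Boolean XOR, bit by bit
```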

+Processor Categories and Roles

Many possible roles for individual processors:
  Coprocessors
  Microcontrollers
  Microsequencers
  Embedded system processors
  General purpose processors

+Coprocessor

Operates in conjunction with and under the control of another processor.
  Special purpose processor
  Performs a single task
  Operates at high speed

Example: Math Coprocessor
  Used for floating point mathematical operations

+Microcontroller

Programmable device

Dedicated to control of a physical system

Example:
  ECU for automobile engine
  Roadway intersection traffic lights

+Microsequencer

Similar to microcontroller

Controls coprocessors and other engines within a large processor

Example:
  Move operands to floating point unit
  Invoke an operation (divide)
  Move result back to memory

+Embedded System Processor

Operates sophisticated electronic device

Usually more powerful than microcontroller

Example:
  Controlling a DVD player, including commands from a remote control

+General Purpose Processor

Most powerful type of processor

Completely programmable

Full functionality

Example:
  CPU in personal computer/laptop (CISC x86 architecture)
  CPU in smartphone/tablet (RISC ARM architecture)

+Processor Performance

+Clock and Instruction Rate

Clock Cycle
  Time interval in which all basic circuits (steps) inside a processor must complete
  Time at which gates are clocked (gate-signal propagation)

Clock Rate
  1 / clock cycle (GHz – billion cycles per second)

Instruction Rate
  Measure of how many instructions are executed per unit of time
  MIPS – million instructions per second
  Varies since some instructions take more time (more clock cycles) than others
    Shift left instruction vs. fetch from memory instruction
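
As a rough illustration of how these quantities relate, the sketch below converts a clock cycle time into a clock rate and then into MIPS. The cycle time and the cycles-per-instruction values are made-up numbers, and the function names are hypothetical.

```python
def clock_rate_hz(cycle_time_s):
    """Clock rate is the reciprocal of the clock cycle time."""
    return 1.0 / cycle_time_s

def mips(clock_hz, avg_cycles_per_instruction):
    """Million instructions per second for a given average cycle count."""
    return clock_hz / avg_cycles_per_instruction / 1e6

rate = clock_rate_hz(0.5e-9)   # assumed cycle time of 0.5 ns, i.e. about 2 GHz
print(rate / 1e9, "GHz")
print(mips(rate, 1), "MIPS if every instruction takes 1 cycle")
print(mips(rate, 4), "MIPS if the average instruction takes 4 cycles")
```

The same clock rate yields very different instruction rates depending on how many cycles the instruction mix needs, which is why MIPS varies from program to program.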

+Basic Performance Equation

Define:
  N = Number of instructions executed in the program
  S = Average number of cycles for instructions in the program
  R = Clock rate
  T = Program execution time

T = (N * S) / R
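
A small worked instance of the equation, using invented values for N, S, and R (the function name is also just for illustration):

```python
def execution_time(n_instructions, avg_cycles, clock_rate_hz):
    """T = (N * S) / R, in seconds."""
    return n_instructions * avg_cycles / clock_rate_hz

# Hypothetical program: 1 billion instructions, 2 cycles each on average,
# running on a 2 GHz clock.
t = execution_time(1_000_000_000, 2, 2_000_000_000)
print(t)  # 1.0
```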

+Improve Performance

To improve performance:
  Decrease N and/or S
  Increase R

Parameters are not independent:
  Increasing R may increase S as well

N is primarily controlled by compiler

Processors with large R may not have the best performance
  Due to larger S

Making logic circuits faster/smaller is a definite win
  Increases R while S and N remain unchanged
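
The trade-off between R and S can be seen with two hypothetical processors running the same program. The clock rates and cycle counts below are invented for the comparison and do not describe real parts.

```python
from dataclasses import dataclass

@dataclass
class Processor:
    name: str
    clock_rate_hz: float   # R
    avg_cycles: float      # S

    def run_time(self, n_instructions):
        """T = (N * S) / R for a program of N instructions."""
        return n_instructions * self.avg_cycles / self.clock_rate_hz

N = 1_000_000_000                            # same program on both processors
fast_clock = Processor("A", 3e9, 3.0)        # higher R, but larger S
slow_clock = Processor("B", 2e9, 1.5)        # lower R, smaller S

for cpu in (fast_clock, slow_clock):
    print(cpu.name, cpu.run_time(N), "s")
```

With these assumed numbers, processor B finishes first despite its lower clock rate, because its smaller S more than offsets the difference in R.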

+Amdahl’s Law

Potential speed up of program using multiple processors.

Concluded that:
  Code needs to be parallelizable
  Speed up is bounded, giving diminishing returns for more processors

Task dependent
  Servers gain by maintaining multiple connections on multiple processors
  Databases can be split into parallel tasks
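
The diminishing returns can be sketched numerically. This assumes the usual form of Amdahl's Law, speedup = 1 / ((1 - F) + F / p), where F is the parallelizable fraction of run time and p is the processor count. The 90% figure is an arbitrary example value.

```python
def speedup(parallel_fraction, processors):
    """Amdahl's Law with the parallel part sped up by the processor count."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)

F = 0.9   # hypothetical: 90% of the run time can be parallelized
for p in (1, 2, 4, 8, 16, 1024):
    print(p, round(speedup(F, p), 2))

# No processor count can beat 1 / (1 - F), because the serial part remains.
print("upper bound:", 1.0 / (1.0 - F))
```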

+Amdahl’s Law

Most important principle in computer design:
  Make the common case fast
  Optimize for the normal case

Enhancement: any change/modification in the design of a component

Speedup: how much faster a task will execute using an enhanced component versus using the original component.

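Speedup = Component_enhanced / Component_original

(ratio of performance: enhanced component over original component, which is the same as original execution time over enhanced execution time)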

+Amdahl’s Law

The enhanced feature may not be used all the time.

Let the fraction of the computation time when the enhanced feature is used be F.

Let the speedup when the enhanced feature is used be Se.

Now the execution time with the enhancement is:

Exnew = Exold * (1 – F) + Exold * (F/Se)

This gives the overall speedup (So) as:

So = Exold/Exnew = 1 / ((1 - F) + (F/Se))
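
The step from the execution-time expression to the overall speedup can be written out as a short derivation. The limiting case in the last line is a standard consequence added here for illustration; it is not stated on the slide.

```latex
\begin{align*}
Ex_{new} &= Ex_{old}\,(1 - F) + Ex_{old}\,\frac{F}{S_e}
          = Ex_{old}\left[(1 - F) + \frac{F}{S_e}\right] \\
S_o &= \frac{Ex_{old}}{Ex_{new}}
     = \frac{1}{(1 - F) + \dfrac{F}{S_e}} \\
\lim_{S_e \to \infty} S_o &= \frac{1}{1 - F}
\end{align*}
```

However large Se becomes, the fraction (1 - F) of the time that cannot use the enhancement caps the overall speedup.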

+Amdahl’s Law – Example 1

Suppose that we are considering an enhancement that runs 10 times faster than the original component but is usable only 40% of the time. What is the overall speedup gained by incorporating the enhancement?

Se = 10

F = 40 / 100 = 0.4

So = 1 / ((1 – F) + (F / Se))

= 1 / (0.6 + (0.4 / 10))

= 1 / 0.64

= 1.56

+Amdahl’s Law – Example 2

Suppose that we hired a guru programmer who made 70% of our program run 15x faster than the original program. What is the speedup of the enhanced program?

Se = 15

F = 70 / 100 = 0.7

So = 1 / ((1 – F) + (F / Se))

= 1 / (0.3 + (0.7 / 15))

= 1 / 0.347

= 2.88

+Amdahl’s Law – Example 3

Suppose that we hired two students to enhance our WKU web server performance. The first student increased the performance of the server by 12% for 85% of the time. The second student increased the performance of the server by 2x for 25% of the time. Which student produced the highest overall speedup?

Student 1
  Se = 1.12
  F = 85 / 100 = 0.85
  So = 1 / ((1 – F) + (F / Se))
     = 1 / (0.15 + (0.85 / 1.12))
     = 1 / 0.909
     = 1.10

Student 2
  Se = 2
  F = 25 / 100 = 0.25
  So = 1 / ((1 – F) + (F / Se))
     = 1 / (0.75 + (0.25 / 2))
     = 1 / 0.875
     = 1.14

Student 2 produced the higher overall speedup (1.14 vs. 1.10).
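
The three worked examples can be checked with a few lines of code. This is a sketch: the function name and the dictionary layout are illustrative choices, while the F and Se values are taken from the examples.

```python
def overall_speedup(fraction, enhanced_speedup):
    """So = 1 / ((1 - F) + (F / Se))."""
    return 1.0 / ((1.0 - fraction) + fraction / enhanced_speedup)

# (F, Se) pairs from the worked examples
examples = {
    "Example 1": (0.40, 10),
    "Example 2": (0.70, 15),
    "Example 3, Student 1": (0.85, 1.12),
    "Example 3, Student 2": (0.25, 2),
}

results = {name: overall_speedup(f, se) for name, (f, se) in examples.items()}
for name, so in results.items():
    print(f"{name}: {so:.2f}")
```

Example 3 shows that a modest gain applied most of the time (Student 1) and a large gain applied rarely (Student 2) can land close together, so both F and Se have to be weighed.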

+Benchmarks

LINPACK (Scientific Computing)
  Speed in solving a linear system of equations (matrix multiplications)
  http://www.top500.org/list/2013/11/
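
To give a feel for what such a benchmark measures, here is a toy version in pure Python. It is not the LINPACK code: it solves a random dense system with Gaussian elimination and partial pivoting, times the solve, and reports a rate using the operation count 2/3 n^3 + 2 n^2 that is conventionally used for LINPACK. The problem size and all names are assumptions.

```python
import random
import time

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] for row in a]   # work on copies so the inputs are unchanged
    b = b[:]
    for k in range(n):
        # choose the row with the largest pivot to limit rounding error
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # back substitution on the upper triangular system
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

n = 100
random.seed(0)
a = [[random.random() for _ in range(n)] for _ in range(n)]
b = [random.random() for _ in range(n)]

start = time.perf_counter()
x = solve(a, b)
elapsed = time.perf_counter() - start

flops = (2 / 3) * n**3 + 2 * n**2   # nominal floating point operation count
print(f"{flops / elapsed / 1e6:.1f} Mflops (pure Python, n={n})")
```

An interpreted loop like this runs far below hardware peak, so the number it prints says more about Python than about the processor. Real benchmark runs use tuned numerical libraries.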

+Top 10 Supercomputers

+Top 500 Performance Development

+Benchmarks - LINPACK

Current fastest supercomputer (TOP500 list, November 2013): Tianhe-2 (MilkyWay-2)
  3.12 million cores @ 2.2 GHz
  33.86 Pflops = 33,860,000,000,000,000 floating point operations/sec

Current High End Desktop: Intel Core i7 “Haswell” 4770K
  4 cores @ 3.5 GHz
  177 Gflops = 177,000,000,000 floating point operations/sec

Current Google Android Smartphone: Google Nexus 5
  4 cores @ 2.3 GHz, ARM RISC architecture
  393 Mflops = 393,000,000 floating point operations/sec
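
The three figures are easier to compare as ratios. The sketch below uses only the numbers quoted on this slide; the dictionary keys and the per-core breakdown are illustrative, and dividing by core count ignores differences in how each system was measured.

```python
# Figures copied from the slide, in floating point operations per second.
systems = {
    "Tianhe-2": {"flops": 33.86e15, "cores": 3_120_000},
    "Core i7-4770K": {"flops": 177e9, "cores": 4},
    "Nexus 5": {"flops": 393e6, "cores": 4},
}

baseline = systems["Nexus 5"]["flops"]
for name, s in systems.items():
    per_core = s["flops"] / s["cores"]          # naive per-core average
    ratio = s["flops"] / baseline               # relative to the smartphone
    print(f"{name}: {per_core / 1e9:.2f} Gflops per core, {ratio:,.0f}x the Nexus 5")
```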

