Date post: | 03-Jan-2016 |
Upload: | kenyon-anthony |
+ CS 325: CS Hardware and Software Organization and Architecture
Computer Evolution and Performance 2
+Outline
Von Neumann Architecture
Processor Hierarchy
Registers
ALU
Processor Categories
Processor Performance
Amdahl’s Law
Computer Benchmarks
+Von Neumann Architecture
Characteristic of most modern processors.
Central idea is Stored Program.
Three basic components: Processor, Memory, I/O Facilities
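The stored-program idea can be sketched with a toy machine in which instructions and data sit in the same memory. This is only an illustration: the four-instruction set (LOAD, ADD, STORE, HALT), the single accumulator, and the function name `run` are assumptions made for the sketch, not part of any real processor.

```python
# Toy stored-program machine: instructions and data share one memory list.
# The instruction set (LOAD/ADD/STORE/HALT) is invented for this sketch.

def run(memory):
    """Fetch, decode, and execute until HALT; return the (modified) memory."""
    acc = 0   # single accumulator register
    pc = 0    # program counter: address of the next instruction
    while True:
        op, arg = memory[pc]       # fetch the instruction at address pc
        pc += 1
        if op == "LOAD":           # decode and execute
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {op!r}")

# Addresses 0-3 hold the program; addresses 4-6 hold data.
program = [
    ("LOAD", 4),
    ("ADD", 5),
    ("STORE", 6),
    ("HALT", 0),
    2, 3, 0,
]
result = run(program)
print(result[6])  # 5
```

Because the program is just memory contents, loading a different list runs a different program on the same machine.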
+Illustration of Von Neumann Architecture
+Processor
Digital Device.
Performs computation involving multiple steps.
Building blocks used to form computer system.
+Hierarchical Structure and Computational Engines
Most computer architecture follows a hierarchical approach.
Subparts of a large, central processor are sophisticated enough to meet our definition of a processor.
Some engineers use the term computational engine for a sub-piece that is less powerful than the main processor.
+Illustration of Processor Hierarchy
+Major Components of a Conventional Processor
Controller
Computational Engine (ALU)
Local Data Storage
Internal Interconnections
External Interface
+Illustration of a Conventional Processor
+Parts of a Conventional Processor
Controller: overall responsibility for execution; moves through a sequence of steps; coordinates other units
Computational Engine (ALU): operates as directed by the controller; typically provides arithmetic and Boolean operations; performs one operation at a time
+Parts of a Conventional Processor
Local Data Storage: holds data values for operations; must be loaded before an operation can be performed; typically implemented with registers
Internal Interconnections: allow transfer of values among units of the processor; sometimes called the data path
+Parts of a Conventional Processor
External Interface: handles communication between the processor and the rest of the computer system; provides connections to external memory as well as external I/O devices
+Another Illustration of Processor
+Parts of a Conventional Processor
ALU
Status Flags: Neg, Zero, Carry, Overflow
Shifter: left shift = multiplication by 2; right shift = division by 2
Complementer: logical NOT
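A minimal sketch of how the four status flags, the shifter, and the complementer might behave, assuming an 8-bit word. The width, the function names, and the dictionary of flags are choices made for this illustration; real ALUs differ in detail.

```python
WIDTH = 8                    # assumed word size for this sketch
MASK = (1 << WIDTH) - 1      # 0xFF: keeps results within 8 bits
SIGN = 1 << (WIDTH - 1)      # 0x80: the sign bit

def alu_add(a, b):
    """Add two 8-bit values; return (result, flags)."""
    total = a + b
    result = total & MASK
    flags = {
        "neg": bool(result & SIGN),      # sign bit of the result is set
        "zero": result == 0,
        "carry": total > MASK,           # unsigned result did not fit
        # Signed overflow: both operands differ in sign from the result.
        "overflow": bool((a ^ result) & (b ^ result) & SIGN),
    }
    return result, flags

def shift_left(a):
    """Shift left one bit: multiply by 2, dropping any bit shifted out."""
    return (a << 1) & MASK

def shift_right(a):
    """Shift right one bit: unsigned divide by 2, rounding down."""
    return a >> 1

def complement(a):
    """Logical NOT of every bit."""
    return ~a & MASK

print(alu_add(0x7F, 0x01))   # 127 + 1 overflows the signed range
print(shift_left(3), shift_right(6), hex(complement(0x0F)))
```

Adding 0x7F and 0x01 sets Neg and Overflow but not Carry, while adding 0xFF and 0x01 sets Zero and Carry but not Overflow.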
+Example Register Organizations
+Processor Registers
Motorola CPU - MC68000
8 32-bit general purpose registers (D0 – D7)
8 32-bit address registers (A0 – A7)
1 32-bit program counter
1 16-bit status register
+Processor Registers
Intel 8086 (16-bit)
General Purpose:
AX – Accumulator: multiply, divide, I/O
BX – Base: pointer to base address (data)
CX – Count: counter for loops, shifts
DX – Data: multiply, divide, I/O
Pointer and Index:
SP – Stack Pointer: pointer to top of stack
BP – Base Pointer: pointer to base address (stack)
SI – Source Index: source string/index pointer
DI – Destination Index: destination string/index pointer
Segment Registers:
CS – Code Segment
DS – Data Segment
SS – Stack Segment
ES – Extra Segment
Program Status:
PC – Program Counter (Intel's name: IP, Instruction Pointer)
SR – Status Register (Intel's name: FLAGS)
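As background not stated on the slide: in 8086 real mode, a 16-bit segment register and a 16-bit offset combine into a 20-bit physical address by shifting the segment left four bits and adding the offset. A minimal sketch, with `physical_address` as a name made up for this example:

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and 16-bit offset into a 20-bit address."""
    # Shifting left by 4 multiplies the segment by 16; the mask models the
    # 8086's 20 address lines, so sums past 1 MB wrap around to zero.
    return ((segment << 4) + offset) & 0xFFFFF

print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
```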
+Processor Registers
Intel 80386 through Pentium II
Similar to the 8086, but general purpose, pointer, and index registers widened to 32 bits
+Arithmetic Logic Unit (ALU)
Main computational engine in conventional processor.
Complex unit that can perform a variety of tasks:
Integer arithmetic (add, subtract, multiply, divide)
Shift (left, right, circular)
Boolean (AND, OR, NOT, XOR)
Typically CPU “bit size” refers to ALU and register size:
32-bit CPU = 32-bit ALU and registers
64-bit CPU = 64-bit ALU and registers
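To illustrate what the "bit size" means for results, here is a small sketch of a circular shift and of addition that wraps at the register width. The function names and the default width of 32 are assumptions for the example only.

```python
def rotate_left(value, count, width=32):
    """Circular left shift: bits leaving the top re-enter at the bottom."""
    mask = (1 << width) - 1
    count %= width
    value &= mask
    return ((value << count) | (value >> (width - count))) & mask

def wrap_add(a, b, width):
    """Add as a fixed-width ALU would, discarding bits above the width."""
    return (a + b) & ((1 << width) - 1)

print(hex(rotate_left(0x80000001, 1)))      # 0x3
print(wrap_add(0xFFFFFFFF, 1, 32))          # 0
print(hex(wrap_add(0xFFFFFFFF, 1, 64)))     # 0x100000000
```

The same addition wraps to zero in a 32-bit ALU but fits comfortably in a 64-bit one.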
+Processor Categories and Roles
Many possible roles for individual processors:
Coprocessors
Microcontrollers
Microsequencers
Embedded system processors
General purpose processors
+Coprocessor
Operates in conjunction with and under the control of another processor.
Special purpose processor
Performs a single task
Operates at high speed
Example: Math Coprocessor
Used for floating point mathematical operations
+Microcontroller
Programmable device
Dedicated to control of a physical system
Examples: ECU (engine control unit) for an automobile engine; roadway intersection traffic lights
+Microsequencer
Similar to microcontroller
Controls coprocessors and other engines within a large processor
Example:
Move operands to floating point unit
Invoke an operation (divide)
Move result back to memory
+Embedded System Processor
Operates sophisticated electronic device
Usually more powerful than microcontroller
Example: Controlling a DVD player, including commands from a remote control
+General Purpose Processor
Most powerful type of processor
Completely programmable
Full functionality
Examples:
CPU in personal computer/laptop (CISC x86 architecture)
CPU in smartphone/tablet (RISC ARM architecture)
+Processor Performance
+Clock and Instruction Rate
Clock Cycle: time interval in which all basic circuits (steps) inside a processor must complete; the time at which gates are clocked (gate-signal propagation)
Clock Rate: 1 / clock cycle time (GHz = billion cycles per second)
Instruction Rate: measure of how quickly instructions execute
MIPS: million instructions per second
Varies, since some instructions take more time (more clock cycles) than others, e.g. a shift-left instruction vs. a fetch-from-memory instruction
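The relationship between cycle time, clock rate, and MIPS can be sketched as follows. The 0.5 ns cycle time and the average of 4 cycles per instruction are made-up inputs, and the function names are illustrative.

```python
def clock_rate_hz(cycle_time_s):
    """Clock rate is the reciprocal of the clock cycle time."""
    return 1.0 / cycle_time_s

def mips(clock_hz, avg_cycles_per_instruction):
    """Millions of instructions per second at a given average cycle count."""
    return clock_hz / avg_cycles_per_instruction / 1e6

rate = clock_rate_hz(0.5e-9)          # a 0.5 ns cycle is about 2 GHz
print(f"{rate / 1e9:.2f} GHz")
print(f"{mips(rate, 4):.0f} MIPS")    # about 500 MIPS at 4 cycles/instruction
print(f"{mips(rate, 1):.0f} MIPS")    # about 2000 MIPS at 1 cycle/instruction
```

The same clock yields very different MIPS figures depending on how many cycles the instruction mix needs, which is why MIPS varies from program to program.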
+Basic Performance Equation
Define:
N = number of instructions executed in the program
S = average number of cycles per instruction in the program
R = clock rate
T = program execution time
T = (N * S) / R
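A short sketch of the basic performance equation, T = (N * S) / R. The input values (2 million instructions, 4 cycles each, 2 GHz) are invented for illustration.

```python
def execution_time(n_instructions, avg_cycles, clock_rate_hz):
    """T = (N * S) / R, in seconds."""
    return n_instructions * avg_cycles / clock_rate_hz

# 2,000,000 instructions * 4 cycles = 8,000,000 cycles at 2e9 cycles/sec
t = execution_time(2_000_000, 4, 2e9)
print(f"{t * 1000:.1f} ms")   # 4.0 ms
```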
+Improve Performance
To improve performance: decrease N and/or S; increase R
Parameters are not independent: increasing R may increase S as well
N is primarily controlled by the compiler
Processors with a large R may not have the best performance, due to a larger S
Making logic circuits faster/smaller is a definite win: it increases R while S and N remain unchanged
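The point that a higher clock rate does not guarantee better performance can be checked numerically. The two designs below are hypothetical: design A has the faster clock but needs more cycles per instruction.

```python
def execution_time(n_instructions, avg_cycles, clock_rate_hz):
    """T = (N * S) / R, in seconds."""
    return n_instructions * avg_cycles / clock_rate_hz

N = 1_000_000_000                      # same program on both designs

t_a = execution_time(N, 6, 3e9)        # design A: R = 3 GHz, S = 6
t_b = execution_time(N, 3, 2e9)        # design B: R = 2 GHz, S = 3

print(f"A: {t_a:.2f} s, B: {t_b:.2f} s")   # A: 2.00 s, B: 1.50 s
```

Design B finishes first despite its lower clock rate, because its smaller S outweighs the difference in R.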
+Amdahl’s Law
Potential speed up of program using multiple processors.
Concluded that:
Code needs to be parallelizable
Speedup is bounded, giving diminishing returns for more processors
Task dependent:
Servers gain by maintaining multiple connections on multiple processors
Databases can be split into parallel tasks
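The diminishing returns can be seen by applying the usual form of Amdahl's Law to a program whose parallelizable fraction runs on n processors. The 90% parallel fraction and the function names are assumptions for this sketch.

```python
def parallel_speedup(parallel_fraction, processors):
    """Amdahl's Law when the parallel part is split across n processors."""
    serial = 1 - parallel_fraction
    return 1 / (serial + parallel_fraction / processors)

def speedup_bound(parallel_fraction):
    """Limit as the processor count grows: only the serial part remains."""
    return 1 / (1 - parallel_fraction)

for n in (1, 2, 4, 16, 256):
    print(n, round(parallel_speedup(0.9, n), 2))
print("bound:", round(speedup_bound(0.9), 2))
```

Each doubling of processors buys less than the one before, and no processor count pushes the speedup past the bound set by the serial 10%.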
+Amdahl’s Law
Most important principle in computer design: Make the common case fast
Optimize for the normal case
Enhancement: any change/modification in the design of a component
Speedup: how much faster a task will execute using an enhanced component versus using the original component.
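Speedup = Component_enhanced / Component_original (a ratio of performance; equivalently, original execution time divided by enhanced execution time)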
+Amdahl’s Law
The enhanced feature may not be used all the time.
Let the fraction of the computation time when the enhanced feature is used be F.
Let the speedup when the enhanced feature is used be Se.
Now the execution time with the enhancement is:
Exnew = Exold * (1 – F) + Exold * (F/Se)
This gives the overall speedup (So) as:
So = Exold/Exnew = 1 / ((1 - F) + (F/Se))
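The two formulas above can be checked against each other with a small sketch. The inputs (a 100-second run, F = 0.5, Se = 2) are invented, and the function names are illustrative.

```python
def enhanced_time(ex_old, f, se):
    """Exnew = Exold * (1 - F) + Exold * (F / Se)."""
    return ex_old * (1 - f) + ex_old * (f / se)

def overall_speedup(f, se):
    """So = 1 / ((1 - F) + (F / Se))."""
    return 1 / ((1 - f) + (f / se))

ex_old = 100.0
ex_new = enhanced_time(ex_old, f=0.5, se=2)   # 50 s unchanged + 25 s enhanced
print(ex_new)                                 # 75.0
print(round(ex_old / ex_new, 3), round(overall_speedup(0.5, 2), 3))
```

Dividing the old time by the new time gives the same value as the closed-form expression, since Exold cancels out of the ratio.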
+Amdahl’s Law – Example 1
Suppose that we are considering an enhancement that runs 10 times faster than the original component but is usable only 40% of the time. What is the overall speedup gained by incorporating the enhancement?
Se = 10
F = 40 / 100 = 0.4
So = 1 / ((1 – F) + (F / Se))
= 1 / (0.6 + (0.4 / 10))
= 1 / 0.64
= 1.56
+Amdahl’s Law – Example 2
Suppose that we hired a guru programmer who made 70% of our program run 15x faster than the original program. What is the speedup of the enhanced program?
Se = 15
F = 70 / 100 = 0.7
So = 1 / ((1 – F) + (F / Se))
= 1 / (0.3 + (0.7 / 15))
= 1 / 0.347
= 2.88
+Amdahl’s Law – Example 3
Suppose that we hired two students to enhance our WKU web server's performance. The first student increased the performance of the server by 12% for 85% of the time. The second student increased the performance of the server by 2x for 25% of the time. Which student produced the higher overall speedup?
Student 1                        Student 2
Se = 1.12                        Se = 2
F = 85 / 100 = 0.85              F = 25 / 100 = 0.25
So = 1 / ((1 – F) + (F / Se))    So = 1 / ((1 – F) + (F / Se))
= 1 / (0.15 + (0.85 / 1.12))     = 1 / (0.75 + (0.25 / 2))
= 1 / 0.909                      = 1 / 0.875
= 1.10                           = 1.14
Student 2 produced the higher overall speedup.
+Benchmarks
LINPACK (Scientific Computing)
Speed in solving a linear system of equations (matrix multiplications)
http://www.top500.org/list/2013/11/
+Top 10 Supercomputers
+Top 500 Performance Development
+Benchmarks - LINPACK
Current fastest supercomputer (November 2013 list): Tianhe-2 (MilkyWay-2)
3.12 million cores @ 2.2 GHz
33.86 Pflops = 33,860,000,000,000,000 floating point operations/sec
Current high-end desktop: Intel Core i7 “Haswell” 4770K
4 cores @ 3.5 GHz
177 Gflops = 177,000,000,000 floating point operations/sec
Current Google Android smartphone: Google Nexus 5
4 cores @ 2.3 GHz, ARM RISC architecture
393 Mflops = 393,000,000 floating point operations/sec
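To put the three figures on one scale, the sketch below converts them to plain flops and compares them. It uses only the numbers quoted above; the dictionary layout and variable names are choices made for the example.

```python
PFLOPS, GFLOPS, MFLOPS = 1e15, 1e9, 1e6

# name: (floating point operations per second, core count), from the slide
systems = {
    "Tianhe-2": (33.86 * PFLOPS, 3_120_000),
    "Core i7-4770K": (177 * GFLOPS, 4),
    "Nexus 5": (393 * MFLOPS, 4),
}

for name, (flops, cores) in systems.items():
    print(f"{name}: {flops:.3e} flops total, {flops / cores:.3e} per core")

# How many desktops, or phones, would match the supercomputer's rate?
vs_desktop = systems["Tianhe-2"][0] / systems["Core i7-4770K"][0]
vs_phone = systems["Tianhe-2"][0] / systems["Nexus 5"][0]
print(f"{vs_desktop:,.0f} desktops, {vs_phone:,.0f} phones")
```

By these figures the supercomputer delivers roughly 191,000 times the desktop's LINPACK rate, although per core the desktop chip is the faster of the two.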