1
• What are the basic gates?
• What are two important uses of XOR gates in
computers?
• What are “universal” gates? Give two examples.
QUIZ Chapter 3
2
• What are the basic gates?
NOT AND NAND OR NOR XOR XNOR
• What are two important uses of XOR gates in
computers?
Parity generators+checkers, adders
• What are “universal” gates? Give two examples.
NAND NOR
QUIZ Chapter 3
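The answers above can be checked with a small sketch. It assumes bits are modeled as the Python integers 0 and 1; the function names are illustrative and not from the text. It builds NOT, AND, OR and XOR out of NAND only (which is what "universal" means), then uses XOR to generate an even-parity bit.

```python
def nand(a: int, b: int) -> int:
    """NAND of two bits (each 0 or 1)."""
    return 1 - (a & b)

# Every other gate below is built from nand() alone.
def not_(a: int) -> int:
    return nand(a, a)

def and_(a: int, b: int) -> int:
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))

def xor_(a: int, b: int) -> int:
    # Classic four-NAND construction of XOR.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def even_parity_bit(bits) -> int:
    """XOR of all data bits: the bit that makes the total count of 1s even."""
    parity = 0
    for bit in bits:
        parity = xor_(parity, bit)
    return parity
```

The same construction works with NOR as the only building block.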
3
• What are the basic flip-flops?
• What are three important uses of flip-flops in
computers?
QUIZ Chapter 3
4
• What are the basic flip-flops?
SR D JK T
• What are three important uses of flip-flops in
computers?
– Registers
– Memories
– Counters
– Convolutional coders+decoders
QUIZ Chapter 3
5
• What are convolutional codes and when/where are
they used?
• What is the meaning of “Maximum Likelihood” in
a convolutional decoder?
• What is the meaning of “Partial Response” in the
PRML code?
QUIZ Chapter 3
6
• What are convolutional codes and when/where are
they used?
Current output depends on current input and past
inputs. Real-time streams of data.
• What is the meaning of “Maximum Likelihood” in
a convolutional decoder?
The sequence with the minimum number of bits in
error is the most likely.
• What is the meaning of “Partial Response” in the
PRML code?
The ML sequence is computed based on a number
of bits following the bit in error.
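A minimal sketch of the "Maximum Likelihood" idea: among the sequences the encoder could have produced, pick the one that differs from the received sequence in the fewest bits. The candidate list is hypothetical; a real decoder would obtain the candidates from the trellis of the specific code.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def most_likely(received: str, candidates) -> str:
    """Return the candidate with the minimum number of bits in error."""
    return min(candidates, key=lambda c: hamming_distance(received, c))
```

For example, with the hypothetical candidates "000000", "110101" and "111011", a received "110111" is decoded as "110101" because only one bit differs.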
7
Draw the trellis diagram and work the ML sequence for the
error below.
(We still start from state 0 and use a sequence length of
three pairs.)
• F(11 01 01 00 11 11 11 10) =
QUIZ: Trellis decoding
8
Chapter 4
MARIE: An Introduction
to a Simple Computer
10
4.2 CPU Basics – the “fetch-execute” cycle
The computer’s CPU fetches, decodes and executes
instructions. Some instructions also cause the CPU to get
or put data from/into memory.
11
4.2 CPU Basics
The two main parts of the CPU are:
• The datapath → ALU and registers,
interconnected by a data bus that is also
connected to the main memory.
• The control unit → provides control
signals that sequence the operations in
the datapath.
12
• Hold data that can be readily accessed by the CPU.
• Can be implemented using D flip-flops.
– A 32-bit register requires 32 D flip-flops.
• Can store one of 3 types of information: data,
addresses, control. Some registers are general-
purpose, others special purpose (reserved for only
data, addr., or ctrl.)
– The control unit determines which actions to carry out
according to the values in an Instruction Register and
a Status register.
– The Program Counter register can hold only …
Registers
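A rough software model of the statement that an n-bit register is n D flip-flops side by side. The class and method names are illustrative assumptions; a real register is clocked hardware, not a Python object.

```python
class DFlipFlop:
    """Stores one bit; the stored bit changes only when clocked."""

    def __init__(self) -> None:
        self.q = 0

    def clock(self, d: int) -> None:
        self.q = d & 1

class Register:
    """An n-bit register built from n D flip-flops (bit 0 is the LSB)."""

    def __init__(self, width: int) -> None:
        self.cells = [DFlipFlop() for _ in range(width)]

    def load(self, value: int) -> None:
        # Each flip-flop latches one bit of the value; extra bits are dropped.
        for i, cell in enumerate(self.cells):
            cell.clock((value >> i) & 1)

    def read(self) -> int:
        return sum(cell.q << i for i, cell in enumerate(self.cells))
```

A Register(32) holds exactly 32 flip-flops, matching the count given above.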
13
Carries out logical and arithmetic operations (as
directed by the control unit).
ALU
1-bit ALU that can perform AND, OR, ADD, SUBTRACT
Source: Computer Organization and Design, Patterson & Hennessy, 3rd ed.
Not in text
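Since the figure is not reproduced here, the following is only a sketch of how such a 1-bit ALU could behave. The signal names (binvert, carry_in, operation) and the operation encoding are assumptions in the style of the cited textbook, not taken from this text.

```python
def alu_1bit(a: int, b: int, binvert: int, carry_in: int, operation: int):
    """Return (result, carry_out) for one bit position.

    operation: 0 = AND, 1 = OR, 2 = ADD (assumed encoding).
    binvert = 1 complements b before it is used, which is the
    building block for subtraction in two's complement.
    """
    b_eff = b ^ binvert
    if operation == 0:
        result = a & b_eff
    elif operation == 1:
        result = a | b_eff
    elif operation == 2:
        result = a ^ b_eff ^ carry_in  # sum bit of a full adder
    else:
        raise ValueError("operation must be 0, 1 or 2")
    # Carry-out of a full adder: at least two of the three inputs are 1.
    carry_out = (a & b_eff) | (a & carry_in) | (b_eff & carry_in)
    return result, carry_out
```

Wider ALUs are obtained by chaining the carry_out of one bit position into the carry_in of the next.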
14
Show all connections necessary to build a 2-bit ALU
QUIZ: ALU
Bit 1 Bit 0
Not in text
15
Show the values of all control signals needed for
Result to be a – b
QUIZ: ALU
Not in text
16
Determines which actions the CPU is carrying out,
according to the values in:
• IR (Instruction Register)
• PC register
• status register
Control Unit
17
What are the values required for P0 … P5 in order
for data to be written to the 16-bit register?
QUIZ: Control Unit
18
4.3 The Bus
• The CPU shares data with other system components
by way of a data bus.
– A bus is a set of wires that simultaneously convey a single
bit along each line.
• Two types of buses are commonly found in computer
systems: point-to-point, and multipoint buses.
19
Buses consist of these types of lines:
• data lines convey bits from one device to another
• control lines determine the direction of data flow,
and when each device can access the bus
• address lines determine the location of the source
or destination of the data
• power lines
Types of buses by function
20
• Processor-to-memory: short, high-speed, dedicated
(optimized for a particular CPU and memory), e.g. FSB, BSB
(see later)
• I/O: longer, general-purpose (can accommodate multiple I/O
devices), e.g. PATA, SATA, SCSI
• Backplane: physically built into the computer
chassis/motherboard, connects multiple cards, e.g. ISA, EISA,
PCI, PCI-E
21
Because a multipoint bus is a shared resource,
access to it is controlled through protocols (built
into the hardware) that reduce or eliminate
collisions.
Multipoint bus
22
• In a master-slave configuration, where more than
one device can be the bus master, concurrent bus
master requests must be arbitrated.
• Four categories of bus arbitration are:
– Daisy chain: Permissions are passed from the
highest-priority device to the lowest.
– Centralized parallel: Each device is directly
connected to an arbitration circuit.
– Distributed using self-selection: Devices decide
among themselves which gets the bus.
– Distributed using collision detection: Any device
can try to use the bus. If its data collides with the
data of another device, it tries again.
NON-DETERMINISTIC!
Multipoint bus
Real-life buses
Motherboard diagram, ca. 2007
Source: http://en.wikipedia.org/wiki/Front-side_bus
Not in text
Real-life buses: Front-side and back-side
Source: http://en.wikipedia.org/wiki/Front-side_bus
Not in text
Real-life buses
Intel QPI (QuickPath Interconnect), 2009
Source: Dr. Dobb's Journal, Apr 06, 2009. URL: http://www.ddj.com/216402907
Not in text
26
4.4 Clocks
• Every computer contains at least one clock that
synchronizes the activities of its components.
• A fixed number of clock cycles are required to carry
out each data movement or computational operation.
• The clock frequency, measured in megahertz or
gigahertz, determines the speed with which all
operations are carried out.
• Clock cycle time is the reciprocal of clock frequency.
– An 800 MHz clock has a cycle time of 1.25 ns.
– QUIZ: What is the clock cycle time for a 3.4 GHz clock?
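The reciprocal relationship between frequency and cycle time can be sketched as a one-line conversion. The function name is illustrative; the 800 MHz value is the one used above.

```python
def cycle_time_ns(frequency_hz: float) -> float:
    """Clock cycle time in nanoseconds for a clock frequency given in hertz."""
    return 1e9 / frequency_hz

# 800 MHz -> 1.25 ns, as stated above.
example_ns = cycle_time_ns(800e6)
```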
28
The CPU time required to run a program is given by this
general performance equation:
CPU time = seconds/program
= (instructions/program) × (avg. cycles/instruction) × (seconds/cycle)
Clock speed ≠ CPU performance
29
The CPU time required to run a program is given by this
general performance equation:
Explain the trade-off CISC vs. RISC using this eqn.
Clock speed ≠ CPU performance
30
The CPU time required to run a program is given by this
general performance equation:
We’re designing a CPU that runs at 40 MHz, and will be
used to run image-processing applications that have on
average 100 mil. instructions. How many cycles per
instruction should we aim for in order to ensure the
execution time is no longer than 10 sec. (on average)?
QUIZ: CPU performance
10 s × 40,000,000 cycles/s / 100,000,000 instr. = 4 cycles/instr. (at most)
QUIZ: CPU performance
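The quiz arithmetic can be sketched in code, assuming the usual form of the performance equation: CPU time = instructions × cycles per instruction × seconds per cycle. The function names are illustrative.

```python
def cpu_time_s(instructions: int, cpi: float, clock_hz: float) -> float:
    """Execution time in seconds: instructions x CPI x (1 / clock frequency)."""
    return instructions * cpi / clock_hz

def max_cpi(time_budget_s: float, instructions: int, clock_hz: float) -> float:
    """Largest average CPI that still meets the time budget."""
    return time_budget_s * clock_hz / instructions

# 40 MHz clock, 100 million instructions, 10 second budget.
quiz_cpi = max_cpi(10, 100_000_000, 40_000_000)
```

Plugging the result back in, cpu_time_s(100_000_000, quiz_cpi, 40_000_000) gives the 10-second budget.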
32
The CPU time required to run a program is given by this
general performance equation:
Conclusion: we can improve CPU throughput when we:
• reduce the number of instructions in a program
• reduce the number of cycles per instruction
• reduce the number of nanoseconds per clock cycle.
Clock speed ≠ CPU performance
Types of clocks
• System/CPU CLK → Used inside CPU
• Bus CLK(s) → Synchronize data transfer
on buses
Bus CLK freq. < CPU CLK freq. (Why?)
Bottlenecks!
33
To do for next time:
• Read Sections 1, 2, 3, 4 of Ch.4
• Answer Review Questions 1 through 15
• Find a PC ad online and see the difference
between the CPU and bus frequencies
34
We’re designing a RISC CPU that will run programs of
30,000 instructions on the average, with all instructions
requiring 3 clock cycles.
It is desired to have an average execution time of 10
milliseconds.
What is the slowest clock frequency that we can use?
QUIZ: CPU performance
36
4.5 I/O
• A computer communicates with the outside world
through its input/output (I/O) subsystem.
• I/O devices connect to the CPU through various
interfaces.
• I/O can be:
– memory-mapped → the I/O device behaves like
main memory from the CPU’s point of view.
– instruction-based (a.k.a. port-mapped) → the
CPU has a specialized I/O instruction set.
37
Port-mapped I/O example:
The contents of the Accumulator register are
placed in the output register with address 56hex
Intel 8080 and Z80 (Zilog) → OUT 56h
Intel x86 → OUT 56h, AL
Intel x86 → MOV DX, 0056h
OUT DX, AL
This is a hex number.
How many ports do
you think there are?
Why do it
this way?
Not in text
38
Memory-mapped I/O example:
Motorola 68HC11/12
PORTP equ $56
staa PORTP
More details in chapter 7
Not in text
39
4.6 Memory Organization
• Computer memory consists of a linear array of
addressable storage cells (similar to registers).
• Memory can be byte-addressable, or word-
addressable (word typically 2 or more bytes).
• Memory is constructed of RAM chips, often referred to
in terms of length × width:
– Length in words
– Width in bits
Example: The word size of a machine is 16 bits, so we
use a 4M × 16 RAM chip.
10^6 or 2^20?
This means 2^22 16-bit words.
41
Trick QUIZ
A machine has 2^22 16-bit words of memory. How many
address bits are needed?
• If memory is addressable at the Byte level …
• If memory is addressable at the Word level …
Do not get confused!
Memory chips are always described in bits.
How about this chip:
“64K x 8, byte-addressable”
It means “(64 x 1024) long x 8 bits wide,
addressable at the byte level”
44
45
• How does the computer access the memory location that
corresponds to a particular address?
• We observe that 4M can be expressed as 2^2 × 2^20 =
2^22 words.
• The memory locations for this memory are numbered
0 through 2^22 - 1.
• Thus, the memory bus of this system requires 22
address lines.
– The address lines “count” from 0 to 2^22 - 1 in binary. Each
line is either “on” or “off” indicating the location of the
desired memory element.
4.6 Memory Organization
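The count of address lines follows from the number of addressable locations. A minimal sketch, with an illustrative function name:

```python
def address_lines(locations: int) -> int:
    """Smallest number of address bits that can number `locations` cells,
    i.e. the bit length of the highest address (locations - 1)."""
    if locations < 1:
        raise ValueError("need at least one location")
    return (locations - 1).bit_length()

# 4M words = 2^2 x 2^20 locations.
lines_for_4m = address_lines(4 * 2**20)
```

For the 4M-word memory above this gives 22, since the highest address is 2^22 - 1.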
46
The memory bus of this system requires at least 22
address lines.
– The address lines “count” from 0 to 2^22 - 1 in binary. Each
line is either “on” or “off” indicating the location of the
desired memory element.
If the addresses are written in hex, what is the range
of addresses available in this machine?
QUIZ
47
• Physical memory usually consists of more than one
RAM chip.
• Access is more efficient when memory is organized
into banks of chips with the addresses interleaved
across the chips
4.6 Memory Organization
48
We want to build a 32K x 8, byte-addressable
memory out of 2K x 8 RAM chips.
How many chips?
How many bits are needed
for address?
Example
49
We want to build a 32K x 8 byte-addressable memory
out of 2K x 8 RAM chips.
How many bits are needed
for address?
Example
– Memory is 32K = 2^5 × 2^10 = 2^15 Bytes
– 15 bits are needed for each address:
– 4 bits to select the chip
– 11 bits for the offset into the chip to
select the byte.
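The same bookkeeping can be sketched in code. This assumes the chips are as wide as the memory being built (8 bits here) and that all sizes are powers of two; the function name is illustrative.

```python
K = 1024

def plan_memory(total_bytes: int, chip_bytes: int):
    """Return (chips, address_bits, chip_select_bits, offset_bits)."""
    chips = total_bytes // chip_bytes
    address_bits = (total_bytes - 1).bit_length()  # bits to number every byte
    offset_bits = (chip_bytes - 1).bit_length()    # bits to number a byte in one chip
    return chips, address_bits, address_bits - offset_bits, offset_bits

# 32K x 8 memory built from 2K x 8 chips.
plan = plan_memory(32 * K, 2 * K)
```

Here plan is (16, 15, 4, 11): 16 chips, 15 address bits, 4 of them for chip select and 11 for the offset.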
50
How exactly do we connect the address lines?
Solution 1: high-order interleaving → the high order
address bits specify the memory bank.
Draw bus picture – decoder
needed!
51
How to address this memory?
Solution 2: low-order interleaving → the low order
address bits specify the memory bank.
Draw bus picture – decoder
needed!
52
Solution 1: high-order interleaving → the high order
address bits specify the memory bank.
53
Solution 2: low-order interleaving → the low order
address bits specify the memory bank.
Low-Order Interleaving
High-Order Interleaving
High- vs. low-order interleaving example: 8
RAM chips, each of size 4x8
High-Order Interleaving
QUIZ: Show the addressing logic (decoder)
and bit values needed to address Byte 25
Low-Order Interleaving
QUIZ: Show the addressing logic (decoder)
and bit values needed to address Byte 25
57
Back to 32K x 8 example
• In high-order interleaving the high-order
4 bits select the chip.
• In low-order interleaving the low-order
4 bits select the chip.
Why use low-order interleaving?
The vast majority of real-life programs have spatial
locality, i.e. they tend to access data that is stored
close together (e.g. arrays).
• With high-order interleaving, reading/writing N
consecutive Bytes takes N read/write cycles, because all N
are on the same chip.
• With low-order interleaving, the reads/writes can be
made to overlap (a.k.a. pipelining) → fewer than N
cycles!
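A small sketch of the two address splits, using the earlier example of 8 chips of 4 Bytes each (3 chip-select bits, 2 offset bits). The function names are illustrative. It shows why consecutive addresses stay on one chip with high-order interleaving but spread across chips with low-order interleaving.

```python
def high_order(address: int, chip_bits: int, offset_bits: int):
    """High-order interleaving: the top bits select the chip."""
    chip = (address >> offset_bits) & ((1 << chip_bits) - 1)
    offset = address & ((1 << offset_bits) - 1)
    return chip, offset

def low_order(address: int, chip_bits: int, offset_bits: int):
    """Low-order interleaving: the bottom bits select the chip."""
    chip = address & ((1 << chip_bits) - 1)
    offset = (address >> chip_bits) & ((1 << offset_bits) - 1)
    return chip, offset

# Chips touched by the consecutive addresses 0, 1, 2, 3.
chips_high = [high_order(a, 3, 2)[0] for a in range(4)]
chips_low = [low_order(a, 3, 2)[0] for a in range(4)]
```

chips_high is [0, 0, 0, 0] while chips_low is [0, 1, 2, 3], so in the second case four different chips can work on the four accesses at overlapping times.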
Real-life example of low-order
interleaving
59
Triple-channel memory slots, color-coded, on the
eVGA X58 SLI Classified motherboard
Source: http://www.guru3d.com/articles_pages/evga_x58_sli_classified_review.html
Not in text
60
4.7 Interrupts
• The normal execution of a program is altered when
a higher-priority event occurs. The CPU is
alerted to such an event through an interrupt.
• Interrupts can be triggered by:
– I/O requests
– Arithmetic errors (such as division by zero)
– Memory parity errors
– Invalid instruction
– User-defined breakpoints (debugging)
61
4.7 Interrupts
• Each interrupt is associated with an interrupt-
handling procedure that directs the actions of
the CPU when an interrupt occurs.
• Interrupts can be:
– Maskable → CPU can be instructed to ignore them
under certain circumstances
– Nonmaskable → High-priority interrupts that cannot
be ignored under any circumstances
62
4.8 MARIE
• Our model computer, the Machine Architecture that
is Really Intuitive and Easy, MARIE, was designed
for the singular purpose of illustrating basic computer
system concepts.
• While this system is too simple to do anything useful
in the real world, understanding of its functions will
enable us to comprehend system architectures that
are much more complex.
63
The MARIE architecture has the following
characteristics:
• Binary, two's complement data representation.
• Stored program, fixed word length data and
instructions.
• 4K words of word-addressable main memory.
• 16-bit data words.
• 16-bit instructions, 4 for the opcode and 12 for the
address.
• A 16-bit arithmetic logic unit (ALU).
• Seven registers for control and data movement.
4.8 MARIE
64
MARIE’s seven registers are:
• Accumulator, AC, a 16-bit register that holds an
operand:
– The one operand for a conditional operator (e.g., "less than")
– One of the two operands of a two-operand instruction.
• Memory address register, MAR, a 12-bit register that
holds the memory address of an instruction or the
operand of an instruction.
• Memory buffer register, MBR, a 16-bit register that
holds the data after its retrieval from, or before its
placement in memory.
65
MARIE’s seven registers are:
• Program counter, PC, a 12-bit register that holds the
address of the next program instruction to be
executed.
• Instruction register, IR, which holds an instruction
immediately preceding its execution.
• Input register, InREG, an 8-bit register that holds data
read from an input device.
• Output register, OutREG, an 8-bit register, that holds
data that is ready for the output device.
66
MARIE architecture
67
• The registers are interconnected,
and connected with main memory
through a common data bus.
• Each device on the bus is identified
by a unique number that is set on
the control lines whenever that
device is required to carry out an
operation.
• Separate connections are also
provided between the accumulator
and the memory buffer register, and
the ALU and the accumulator and
memory buffer register.
MARIE Datapath
This permits data
transfer w/o use
of main bus!
68
To do for next time:
• Read pp.224-233 of Ch.4
• Answer Review Questions 16 through 25
• Solve Exercises 4, 7
69
70
What is the full name of the “fetch-execute”
cycle?
QUIZ
71
Why do high-performance computers use low-
order interleaving for their memory chips?
QUIZ
72
We are designing a computer with 2 MB of RAM,
byte-addressable.
The only chips available are 512K x 4.
Draw the memory-addressing diagram with high-
order interleaving.
QUIZ
Image source: http://eent3.lsbu.ac.uk/units/b3embsys/Week11Combinational%20Logic%20Circuits.htm
73
What are MARIE’s 7 registers?
What are their functions?
Can data go straight from memory into AC?
QUIZ
74
What are interrupts?
Why are they needed in the operation of a
computer?
QUIZ
QUIZ
• Exercise 15 • Exercise 18
75
76
• A computer’s instruction set architecture (ISA)
specifies the format of its instructions and the
primitive operations that the hardware can perform.
• The ISA is an interface between a computer’s
hardware and its software.
• Some ISAs include hundreds of different instructions
for processing data and controlling program
execution.
• The MARIE ISA consists of only 13 instructions.
4.8.3 MARIE ISA
77
Format of a
MARIE instruction:
The fundamental MARIE instructions are:
78
Given this machine code, figure out the MARIE
assembly code:
QUIZ: Reverse assembly
79
• This is a bit pattern for a LOAD instruction as it would
appear in the IR:
• We see that the opcode is 1 and the address from
which to load the data is 3.
• Where does the data go? Always into the AC
(implicit, i.e. hard-wired)
Instruction example
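Splitting a 16-bit MARIE instruction word into its 4-bit opcode and 12-bit address can be sketched as below. The function name is illustrative; the word 0x1003 is the LOAD example above written in hex (opcode 1, address 3).

```python
def decode(word: int):
    """Split a 16-bit instruction into (opcode, address).

    Bits 15-12 are the opcode, bits 11-0 are the address.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction must fit in 16 bits")
    return word >> 12, word & 0x0FFF

opcode, address = decode(0x1003)  # opcode == 1, address == 3
```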
80
This is a bit pattern for a SKIPCOND instruction as it
would appear in the IR:
Instruction example
These two bits specify which condition is tested:
• 00 → skip if AC < 0
• 01 → skip if AC = 0
• 10 → skip if AC > 0
• 11 → ???
81
SKIPCOND
• The opcode is 8 and bits 11 and 10 are 10, meaning
that the next instruction will be skipped if the value
in the AC is greater than zero.
How is skipping accomplished in the hardware?
Increment PC.
Instruction example
82
What is the difference between SKIPCOND and
JUMP X?
Instruction example
How is jumping accomplished in the hardware?
Load X into PC (PC must be a loadable register!)
83
• Each of our instructions actually consists of a
sequence of smaller instructions called
microoperations.
• The exact sequence of microoperations that are
carried out by an instruction can be specified using
register transfer language (RTL).
In the MARIE RTL, we use the notation M[X] to indicate
the actual data value stored in memory location X,
and ← to indicate the transfer of bytes to a register or
memory location.
Microops and RTL
84
• The RTL for the LOAD instruction is:
MAR ← X
MBR ← M[MAR]
AC ← MBR
• The RTL for the ADD instruction is:
MAR ← X
MBR ← M[MAR]
AC ← AC + MBR
RTL examples
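Each RTL line corresponds to one register transfer, which can be mimicked in software one statement at a time. This is only a sketch: the class name is illustrative, and the 16-bit wrap-around on ADD is an assumption about how overflow is handled.

```python
class Marie:
    """Just enough state to trace the LOAD and ADD microoperations."""

    def __init__(self) -> None:
        self.M = [0] * 4096  # 4K words of main memory
        self.AC = 0
        self.MAR = 0
        self.MBR = 0

    def load(self, x: int) -> None:
        self.MAR = x & 0xFFF         # MAR <- X
        self.MBR = self.M[self.MAR]  # MBR <- M[MAR]
        self.AC = self.MBR           # AC  <- MBR

    def add(self, x: int) -> None:
        self.MAR = x & 0xFFF                     # MAR <- X
        self.MBR = self.M[self.MAR]              # MBR <- M[MAR]
        self.AC = (self.AC + self.MBR) & 0xFFFF  # AC  <- AC + MBR (16-bit wrap assumed)
```

Note that both instructions begin with the same two microoperations; only the last transfer differs.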
85
• The RTL for the INPUT instruction is:
AC ← InREG
RTL examples
86
Write the RTL for the following instructions:
• OUTPUT
• JUMP X
• SUBT X
QUIZ: RTL
87
• Recall that SKIPCOND skips the next instruction
according to the value of the AC.
• The RTL for this instruction is the most complex
in MARIE’s instruction set:
If IR[11-10] = 00 then
If AC < 0 then PC ← PC + 1
else If IR[11-10] = 01 then
If AC = 0 then PC ← PC + 1
else If IR[11-10] = 10 then
If AC > 0 then PC ← PC + 1
else
do nothing
RTL examples
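A sketch of the SKIPCOND decision in code, using the condition encoding from the earlier SKIPCOND slide (00 → AC < 0, 01 → AC = 0, 10 → AC > 0). The function names are illustrative, and AC is interpreted as a 16-bit two's complement value.

```python
def to_signed16(value: int) -> int:
    """Interpret a 16-bit pattern as a two's complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def skipcond(ir: int, ac: int, pc: int) -> int:
    """Return the new PC after a SKIPCOND held in `ir`."""
    condition = (ir >> 10) & 0b11  # IR[11-10]
    ac_value = to_signed16(ac)
    skip = (
        (condition == 0b00 and ac_value < 0)
        or (condition == 0b01 and ac_value == 0)
        or (condition == 0b10 and ac_value > 0)
    )
    # Skipping is just one extra increment of the 12-bit PC.
    return (pc + 1) & 0xFFF if skip else pc
```

For instance, skipcond(0x8800, 5, 0x101) returns 0x102, because bits 11-10 are 10 and AC is positive.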
88
4.9 Instruction Processing
Remember the fetch-execute cycle:
• fetch an instruction from memory and place it into IR
• decode IR to determine what needs to be done next
• if a memory value (operand) is involved in the
operation, get it (address in MAR, value in MBR)
• with everything in place, execute the instruction
• if the result needs to go into memory, store it (result
in MBR, address in MAR)
The next slide shows a flowchart of this process.
89
Fetch-decode-execute
90
All computers provide a way of interrupting the
fetch-decode-get-execute cycle.
• Sources of interrupts:
– User break (e.g. Control+C) is issued
– I/O request
– Critical error (divide by 0, illegal opcode)
• Interrupts can be caused by hardware or
software.
– Software interrupts are also called traps
91
Interrupt processing involves adding another step to the
fetch-decode-execute cycle:
ISR = Interrupt Service Routine
The starting addresses of all ISRs are stored in an Interrupt
Vector Table
92
Processing an interrupt
Normally stored
in the low-
memory area
93
• For general-purpose systems, it is common to
disable all interrupts during the time in which an
interrupt is being processed.
– Typically, this is achieved by setting a bit in the flags
register.
• Interrupts that are ignored in this case are called
maskable.
• Nonmaskable interrupts are those interrupts that
must be processed in order to keep the system in
a stable condition.
Processing an interrupt
94
Interrupts are very useful in processing I/O.
However, interrupt-driven I/O is complicated
(see Ch.7)
MARIE uses polled I/O instead:
– All output is placed in OutREG
– The CPU polls InREG, until input is sensed, at
which time the value is copied into AC.
Interrupts vs. polling
95
• Consider the simple MARIE program given below.
We show a set of mnemonic instructions stored at
addresses 100 - 106 (hex):
4.10 A Simple Program
96
This is the LOAD 104 instruction:
4.10 A Simple Program
97
ADD 105
4.10 A Simple Program
98
4.11 Discussion on Assemblers
Assemblers translate mnemonics (instructions that are
comprehensible to humans) into the machine
language that is comprehensible to computers
Distinction between an assembler and a compiler:
• In assembly language, there is a one-to-one
correspondence between a mnemonic instruction and
its machine code.
• With compilers, this is not usually the case.
99
• Assemblers create an object program file from
mnemonic source code in two passes.
• During the first pass, the assembler assembles as
much of the program as it can, while it builds a
symbol table that contains memory references for
all symbols in the program.
• During the second pass, the instructions are
completed using the values from the symbol table.
4.11 Assemblers
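A simplified sketch of the two-pass idea. It assumes each source line is a (label, mnemonic, operand) tuple, one word per line, and it uses an assumed opcode table; unlike the description above, its first pass only builds the symbol table and leaves all encoding to the second pass. The example program is illustrative.

```python
# Assumed opcode values for the mnemonics used in this sketch.
OPCODES = {"LOAD": 0x1, "STORE": 0x2, "ADD": 0x3, "HALT": 0x7}

def assemble(lines, origin: int):
    """Return (symbol_table, machine_words) for a list of source tuples."""
    # Pass 1: record the address of every label.
    symbols = {}
    for offset, (label, _mnemonic, _operand) in enumerate(lines):
        if label:
            symbols[label] = origin + offset

    # Pass 2: encode each line, resolving symbols through the table.
    words = []
    for _label, mnemonic, operand in lines:
        if mnemonic == "HEX":    # directive: hexadecimal constant
            words.append(int(operand, 16) & 0xFFFF)
        elif mnemonic == "DEC":  # directive: decimal constant
            words.append(int(operand) & 0xFFFF)
        else:
            address = symbols[operand] if operand else 0
            words.append((OPCODES[mnemonic] << 12) | address)
    return symbols, words

EXAMPLE = [
    ("", "LOAD", "X"),
    ("", "ADD", "Y"),
    ("", "STORE", "Z"),
    ("", "HALT", None),
    ("X", "DEC", "35"),
    ("Y", "DEC", "-23"),
    ("Z", "HEX", "0000"),
]
```

With origin 0x100, the symbol table maps X, Y, Z to 104, 105, 106 (hex), and the first instruction assembles to 0x1104.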
100
• Consider our example
program at the right.
– Note that we have included two directives HEX and
DEC that specify the radix
of the constants.
• The first pass creates
a symbol table and the
partially-assembled
instructions:
4.11 Assemblers
101
• After the second pass, the
assembly is complete.
4.11 Assemblers
102
Remark on our Instruction Set
• So far, all of the MARIE instructions that we have
discussed use a direct addressing mode.
• This means that the address of the operand is
explicitly stated in the instruction.
• It is often useful to employ indirect addressing,
where the address of the address of the operand
is given in the instruction.
– If you have used pointers in programming, you are
already familiar with indirect addressing.
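The difference between the two addressing modes can be sketched with memory modeled as a Python list. The function names and memory contents are illustrative.

```python
def load_direct(memory, x: int) -> int:
    """Direct addressing: the operand is at address X."""
    return memory[x]

def load_indirect(memory, x: int) -> int:
    """Indirect addressing: address X holds the address of the operand."""
    return memory[memory[x]]

memory = [0] * 16
memory[5] = 9    # location 5 acts as a pointer to location 9
memory[9] = 42   # the actual operand
```

Here load_direct(memory, 5) returns 9 (the pointer itself), while load_indirect(memory, 5) follows the pointer and returns 42.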
Section 4.11 is the last one required for
the midterm
Homework for Ch.4:
3, 5, 6, 9, 16, 21, 22
Due Tuesday, before the midterm!
103