CIS360 1
CIS 360: Introduction to Computer Systems Course Notes
Wayne Heym ([email protected]) http://www.cis.ohio-state.edu/~heym
Rick Parent ([email protected]) http://www.cis.ohio-state.edu/~parent
Copyright © 1998-2003 by Rick Parent, Todd Whittaker, Bettina Bair, Pete Ware, Wayne Heym
CIS360 2
Information Representation 1
Positional Number Systems: the position of a character in a string indicates a power of the base (radix). Common bases: 2, 8, 10, 16. (What base are we using to express the names of these bases?)
– Base ten (decimal): digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 form the alphabet of the decimal system. E.g., 316_10 = 3*100 + 1*10 + 6*1
– Base eight (octal): digits 0, 1, 2, 3, 4, 5, 6, 7 form the alphabet. E.g., 474_8 = 4*8^2 + 7*8^1 + 4*8^0 = 4*64 + 7*8 + 4*1
CIS360 3
Information Representation 2
– Base 16 (hexadecimal): digits 0-9 and A-F. E.g., 13C_16 = 1*16^2 + 3*16^1 + 12*16^0 = 1*256 + 3*16 + 12*1
– Base 2 (binary): digits (called “bits”) 0, 1 form the alphabet. E.g., 100110_2 = 32 + 4 + 2
– In general, radix r representations use the first r chars in {0…9, A...Z} and have the form d_(n-1) d_(n-2) … d_1 d_0. Summing r^(n-1)*d_(n-1) + r^(n-2)*d_(n-2) + … + r^0*d_0 will convert to base 10. Why to base 10?
CIS360 4
Information Representation 3
Base Conversions
– Convert to base 10 by multiplication of powers. E.g., 10012_5 = 1*5^4 + 1*5^1 + 2*5^0 = 1*625 + 1*5 + 2
– Convert from base 10 by repeated division. E.g., 632_10 = (1170)_8
    632/8 = 79 rmdr 0
    79/8 = 9 rmdr 7
    9/8 = 1 rmdr 1
    1/8 = 0 rmdr 1
  Reading the remainders from last to first gives 1170.
– Converting base x to base y: convert base x to base 10 then convert base 10 to base y
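The two conversion procedures above can be sketched in a few lines of Python. This is only an illustration, not part of the course materials; the names `DIGITS`, `to_decimal`, and `from_decimal` are made up for this sketch.

```python
# Digit alphabet for radixes up to 36: the first r characters are valid in base r.
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_decimal(text, radix):
    """Multiply out the powers of the radix, working left to right."""
    value = 0
    for ch in text.upper():
        digit = DIGITS.index(ch)
        if digit >= radix:
            raise ValueError(f"{ch!r} is not a base-{radix} digit")
        value = value * radix + digit
    return value

def from_decimal(value, radix):
    """Repeated division: each remainder is the next digit, least significant first."""
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, radix)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))

print(to_decimal("10012", 5))   # 632
print(from_decimal(632, 8))     # 1170
```

Converting base x to base y is then `from_decimal(to_decimal(text, x), y)`.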
CIS360 5
Information Representation 4
– Special case: converting among binary, octal, and hexadecimal is easier. Go through the binary representation, grouping in sets of 3 (octal) or 4 (hexadecimal), starting from the right.
E.g., 11011001_2 = 11 011 001 = 331_8
      11011001_2 = 1101 1001 = D9_16
E.g., C3B_16 = (6073)_8
      C3B_16 = 1100 0011 1011
             = 110 000 111 011
             = 6 0 7 3
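The grouping shortcut can be sketched in Python as follows. The function names `regroup` and `hex_to_octal` are hypothetical helpers written for this illustration.

```python
def regroup(bits, group_size):
    """Binary string to octal (group_size=3) or hexadecimal (group_size=4)."""
    bits = bits.replace(" ", "")
    # Pad with leading zeros so the bits split evenly into groups from the right.
    padded_length = -(-len(bits) // group_size) * group_size
    bits = bits.zfill(padded_length)
    groups = [bits[i:i + group_size] for i in range(0, len(bits), group_size)]
    return "".join("0123456789ABCDEF"[int(group, 2)] for group in groups)

def hex_to_octal(hex_text):
    """Expand each hex digit to 4 bits, then regroup the bits in threes."""
    bits = "".join(format(int(ch, 16), "04b") for ch in hex_text)
    return regroup(bits, 3)

print(regroup("11011001", 3))   # 331
print(regroup("11011001", 4))   # D9
print(hex_to_octal("C3B"))      # 6073
```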
CIS360 6
Information Representation 5 What is special about binary?
– The basic component of a computer system is a transistor (transfer resistor): a two state device which switches between logical “1” and “0” (actually represented as voltages on the range 5V to 0V).
– Octal and hexadecimal are bases in powers of 2, and are used as a shorthand way of writing binary. A hexadecimal digit represents 4 bits, half of a byte. 1 byte = 8 bits. A bit is a binary digit.
– Get comfortable converting among decimal, binary, octal, hexadecimal. Converting from decimal to hexadecimal (or binary) is easier going through octal.
CIS360 7
Information Representation 6
Binary Hex Decimal Binary Hex Decimal
0000 0 0 1000 8 8
0001 1 1 1001 9 9
0010 2 2 1010 A 10
0011 3 3 1011 B 11
0100 4 4 1100 C 12
0101 5 5 1101 D 13
0110 6 6 1110 E 14
0111 7 7 1111 F 15
CIS360 8
Information Representation 7
Ranges of values
– Q: Given k positions in base n, how many values can you represent?
– A: n^k values over the range (0 … n^k - 1)_10
  n=10, k=3: 10^3 = 1000, range is (0…999)_10
  n=2, k=8: 2^8 = 256, range is (0…255)_10
  n=16, k=4: 16^4 = 65536, range is (0…65535)_10
– Q: How are negative numbers represented?
– Q: How are negative numbers represented?
CIS360 9
Information Representation 8 Integer representation:
– Value and representation are distinct. E.g., 12 may be represented as XII, C_16, 12_10, and 1100_2. Note: -12 may be represented as -C_16, -12_10, and -1100_2.
– Simple and efficient use of hardware implies using a specific number of bits, e.g., a 32-bit string, in a binary encoding. Such an encoding is “fixed width.”
– Four methods: (fixed-width) simple binary, signed magnitude, binary coded decimal, and 2’s complement.
– Simple binary: as seen before, all numbers are assumed to be positive, e.g., 8-bit representation of 66_10 = 0100 0010_2 and 194_10 = 1100 0010_2
CIS360 10
Information Representation 9
– Signed magnitude: simple binary with leading sign bit. 0 = positive, 1 = negative. E.g., 8-bit signed mag.:
  66_10 = 0100 0010_2
  -66_10 = 1100 0010_2
What ranges of numbers may be expressed in 8 bits?
  Largest: 0111 1111 (+127)
  Smallest: 1111 1111 (-127)
Extend 1100 0010 to 12 bits: 1000 0100 0010
CIS360 11
Information Representation 10
Problems: (1) Compare the signed magnitude numbers 1000 0000 and 0000 0000. (2) Must have “subtraction” hardware in addition to “addition” hardware.
– Binary Coded Decimal (BCD): use a 4 bit pattern to express each digit of a base 10 number
  0000 = 0    0001 = 1    0010 = 2    0011 = 3
  0100 = 4    0101 = 5    0110 = 6    0111 = 7
  1000 = 8    1001 = 9    1010 = +    1011 = -
E.g.,
   123 : 0000 0001 0010 0011
  +123 : 1010 0001 0010 0011
  -123 : 1011 0001 0010 0011
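A minimal Python sketch of the BCD encoding above. The helper name `to_bcd` is hypothetical, and unlike the slide's unsigned example it does not pad the result with a leading 0000.

```python
# 4-bit patterns for the sign characters, taken from the table above.
SIGN_CODES = {"+": "1010", "-": "1011"}

def to_bcd(number_text):
    """Encode each character of a decimal string as one 4-bit group."""
    nibbles = []
    for ch in number_text:
        if ch in SIGN_CODES:
            nibbles.append(SIGN_CODES[ch])
        else:
            nibbles.append(format(int(ch), "04b"))
    return " ".join(nibbles)

print(to_bcd("+123"))   # 1010 0001 0010 0011
print(to_bcd("-123"))   # 1011 0001 0010 0011
```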
CIS360 12
Information Representation 11
BCD Disadvantages:
– Takes more memory. 32 bit simple binary can represent more than 4 billion discrete values. 32 bit BCD can hold a sign and 7 digits (or 8 digits for unsigned values) for a maximum of 100 million values, a reduction of about 97%.
– More difficult to do arithmetic. Essentially, we must force the Base 2 computer to do Base 10 arithmetic.
BCD Advantages:
– Used in business machines and languages, e.g., in COBOL for precise decimal math.
– Can have arrays of BCD numbers for essentially arbitrary precision arithmetic.
CIS360 13
Information Representation 12
– Two’s Complement
  Used by most machines and languages to represent integers. Fixes the -0 in the signed magnitude, and simplifies machine hardware arithmetic.
  Divides bit patterns into a positive half and a negative half (with zero considered positive); n bits creates a range of [-2^(n-1) … 2^(n-1) - 1].
CODE   Simple   Signed   2’s comp
0000   0        +0       0
0001   1        1        1
0010   2        2        2
0011   3        3        3
0100   4        4        4
0101   5        5        5
0110   6        6        6
0111   7        7        7
1000   8        -0       -8
1001   9        -1       -7
1010   10       -2       -6
1011   11       -3       -5
1100   12       -4       -4
1101   13       -5       -3
1110   14       -6       -2
1111   15       -7       -1
CIS360 14
Information Representation 12
Simple binary     Sign-magnitude     2’s complement
1111  15          0111   7           0111   7
1110  14          0110   6           0110   6
1101  13          0101   5           0101   5
1100  12          0100   4           0100   4
1011  11          0011   3           0011   3
1010  10          0010   2           0010   2
1001   9          0001   1           0001   1
1000   8          0000   0           0000   0
0111   7          1000  -0           1111  -1
0110   6          1001  -1           1110  -2
0101   5          1010  -2           1101  -3
0100   4          1011  -3           1100  -4
0011   3          1100  -4           1011  -5
0010   2          1101  -5           1010  -6
0001   1          1110  -6           1001  -7
0000   0          1111  -7           1000  -8
CIS360 15
Information Representation 13
– Representation in 2’s complement; i.e., represent i in n-bit 2’s complement, where -2^(n-1) <= i <= +2^(n-1) - 1
  Nonnegative numbers: same as simple binary
  Negative numbers:
  – Obtain the n-bit simple binary equivalent of | i |
  – Obtain its negation as follows:
    • Invert the bits of that representation
    • Add 1 to the result
Ex.: convert -320_10 to 16-bit 2’s complement
   320 = 00000001 01000000
  invert: 11111110 10111111
     + 1: 11111110 11000000
  -320 = 11111110 11000000 = 0xFEC0
Ex.: extend the 12-bit 2’s complement number 1101 0111 1000 to 16 bits.
  1111 1101 0111 1000 = 0xFD78
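The invert-and-add-one recipe and sign extension can be sketched in Python. The function names `twos_complement` and `sign_extend` are hypothetical and chosen only for this illustration.

```python
def twos_complement(value, bits):
    """Return the n-bit 2's complement bit string for value."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise OverflowError(f"{value} does not fit in {bits} bits")
    if value >= 0:
        return format(value, f"0{bits}b")          # same as simple binary
    magnitude = format(-value, f"0{bits}b")        # simple binary of |value|
    inverted = "".join("1" if b == "0" else "0" for b in magnitude)
    return format(int(inverted, 2) + 1, f"0{bits}b")  # add 1 to the inverted bits

def sign_extend(pattern, bits):
    """Widen a 2's complement bit string by copying its sign bit to the left."""
    pattern = pattern.replace(" ", "")
    return pattern[0] * (bits - len(pattern)) + pattern

print(twos_complement(-320, 16))            # 1111111011000000
print(sign_extend("1101 0111 1000", 16))    # 1111110101111000
```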
CIS360 16
Information Representation 14
Binary Arithmetic
– Addition and subtraction only for now
– Rules: similar to standard addition and subtraction, but only working with 0 and 1.
  0 + 0 = 0     0 - 0 = 0
  1 + 0 = 1     1 - 0 = 1
  1 + 1 = 10    1 - 1 = 0
– Must be aware of possible overflow.
  Ex.: 8-bit signed magnitude 0101 0110 + 0110 0011 =
  Ex.: 8-bit signed magnitude 0101 0110 - 0110 0011 =
CIS360 17
Information Representation 15
2’s Complement binary arithmetic
– Addition and subtraction are the same operation
– Still must be aware of overflow.
  Ex.: 8 bit 2’s complement: 23_10 + 45_10 =
  Ex.: 8 bit 2’s complement: 100_10 + 45_10 =
  Ex.: 8 bit 2’s complement: 23_10 - 45_10 =
CIS360 18
Information Representation 16
– 2’s Complement overflow
  Opposite signs on operands can’t overflow.
  If operand signs are same, but result’s sign is different, must have overflow.
  Can two positives sum to positive and still have overflow?
  Can two negatives?
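The sign rule for detecting overflow can be tried out with a small Python sketch. The function `add_8bit` is a hypothetical helper; it models an 8-bit adder by keeping only the low 8 bits of the sum.

```python
def add_8bit(a, b):
    """Add two values in -128..127 as 8-bit 2's complement.

    Returns (result, overflow), where result is the value the 8-bit
    pattern represents and overflow follows the sign rule above.
    """
    raw = (a + b) & 0xFF                        # keep only the low 8 bits
    result = raw - 256 if raw & 0x80 else raw   # reinterpret the sign bit
    same_sign_operands = (a < 0) == (b < 0)
    overflow = same_sign_operands and ((result < 0) != (a < 0))
    return result, overflow

print(add_8bit(23, 45))     # (68, False)
print(add_8bit(100, 45))    # (-111, True)
print(add_8bit(23, -45))    # (-22, False)
```

Subtraction is handled by negating the second operand, which is why `23 - 45` is written as `add_8bit(23, -45)`.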
CIS360 19
Information Representation 17
Characters and Strings
– EBCDIC, Extended Binary Coded Decimal Interchange Code
  Used by IBM in mainframes (360 architecture and descendants).
  Earliest system
– ASCII, American Standard Code for Information Interchange.
  Most common system
– Unicode, http://www.unicode.org
  New international standard
  Variable length encoding scheme with either 8- or 16-bit minimum
  “a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.”
CIS360 20
Information Representation 18
ASCII
– see table 1.7 on pg. 18. In Unix, run “man ascii”.
– 7 bit code
  Printable characters for human interactions
  Control characters for non-human communication (computer-computer, computer-peripheral, etc.)
– 8-bit code: most significant bit may be set
  Extended ASCII (IBM), includes graphical symbols and lines
  ISO 8859, several international standards
  Unicode’s UTF-8, variable length code with 8-bit minimum
CIS360 21
ASCII
Easy to decode
– takes up a predictable amount of space
Upper and lower case characters are 0x20 (32_10) apart
ASCII representation of ‘3’ is not the same as the binary representation of 3.
– To convert ASCII to binary (an integer), ‘3’-’0’ = 3
Line feed (LF) character
– 000 1010_2 = 0x0a = 10_10
Character   ASCII Binary   ASCII Hex
‘ ’         010 0000       0x20
‘A’         100 0001       0x41
‘a’         110 0001       0x61
‘R’         101 0010       0x52
‘r’         111 0010       0x72
‘0’         011 0000       0x30
‘3’         011 0011       0x33
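The 0x20 case offset and the digit conversion can be checked with Python's built-in `ord` and `chr`. The helper names below are hypothetical, and the case helpers assume their argument is an ASCII letter.

```python
CASE_BIT = 0x20   # upper and lower case letters differ only in this bit

def to_lower(letter):
    """Set the case bit of an ASCII letter."""
    return chr(ord(letter) | CASE_BIT)

def to_upper(letter):
    """Clear the case bit of an ASCII letter."""
    return chr(ord(letter) & ~CASE_BIT)

def digit_value(digit_char):
    """Convert an ASCII digit character to the integer it names."""
    return ord(digit_char) - ord("0")

print(hex(ord("A")), hex(ord("a")))   # 0x41 0x61
print(to_lower("R"), to_upper("r"))   # r R
print(digit_value("3"))               # 3
print(ord("\n"))                      # 10
```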
CIS360 22
Information Representation 19
String: definition is programming language dependent.
– C, C++: strings are arrays of characters terminated by a null byte.
Parity: Simple error detection
– Data transmission, aging media, static interference, dust on media, etc. demand the ability to detect errors.
– Single bit errors detected by using parity checking.
CIS360 23
Information Representation 20
– How to detect a 1-bit error:
  Ex.: send ASCII ‘S’: send 1010011, but receive 1010010?
  Add a 1-bit parity to make an odd or even number of 1 bits per byte.
  Parity bit is stripped by hardware after checking.
  Sender/receiver both agree to odd or even parity.
  2 flipped bits in the same encoding are not detected.
              ‘S’          ‘E’
ASCII         101 0011     100 0101
Even parity   0101 0011    1100 0101
Odd parity    1101 0011    0100 0101
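A short Python sketch of parity generation and checking for 7-bit ASCII codes. The function names `add_parity` and `check_parity` are hypothetical; the parity bit is placed in the most significant position, as in the table above.

```python
def add_parity(code7, even=True):
    """Prepend a parity bit to a 7-bit code, giving an 8-bit value."""
    ones = bin(code7).count("1")
    parity = ones % 2 if even else (ones + 1) % 2
    return (parity << 7) | code7

def check_parity(byte, even=True):
    """True if the count of 1 bits in the byte matches the agreed parity."""
    ones = bin(byte).count("1")
    return ones % 2 == (0 if even else 1)

print(format(add_parity(ord("S"), even=True), "08b"))    # 01010011
print(format(add_parity(ord("E"), even=True), "08b"))    # 11000101
print(format(add_parity(ord("S"), even=False), "08b"))   # 11010011

sent = add_parity(0b1010011)          # 'S' with even parity
print(check_parity(sent ^ 0b001))     # False: one flipped bit is caught
print(check_parity(sent ^ 0b011))     # True: two flipped bits slip through
```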
CIS360 24
Information Representation 21
Two meanings for Hamming distance. 2nd is generalization of 1st.
1. Distance between two encodings of the same length: a count of the number of bits different in encoding 1 vs. encoding 2.
   E.g., dist(1100, 1001) =
         dist(0101, 1101) =
2. Generalize to an entire code by taking the minimum over all distinct pairs (2nd meaning).
   – The ASCII encoding scheme has a Hamming distance of 1.
   – A simple parity encoding scheme has a Hamming distance of 2.
Hamming distance serves as a measure of the robustness of error checking (as a measure of the redundancy of the encoding).
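Both meanings of Hamming distance can be sketched in Python. The names `distance` and `code_distance` are hypothetical, and the small 3-bit code with an even parity bit appended is an illustrative stand-in for ASCII with parity.

```python
from itertools import combinations

def distance(a, b):
    """First meaning: number of bit positions where two encodings differ."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    return sum(x != y for x, y in zip(a, b))

def code_distance(codewords):
    """Second meaning: minimum distance over all distinct pairs in a code."""
    return min(distance(a, b) for a, b in combinations(codewords, 2))

# Every 3-bit pattern is a codeword, so some pair differs in only one bit.
plain = [format(i, "03b") for i in range(8)]
# Appending an even parity bit forces any two codewords to differ in two bits.
with_parity = [word + str(word.count("1") % 2) for word in plain]

print(code_distance(plain))         # 1
print(code_distance(with_parity))   # 2
```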
CIS360 25
Information Representation
Simple data compression
– ASCII codes are fixed length.
– Huffman codes are variable length and based on statistics of the data to be transmitted.
  Assign the shortest encoding to the most common character.
  – In English, the letter ‘e’ is the most common.
  – Either establish a Huffman code for an entire class of messages,
  – Or create a new Huffman code for each message, sending/storing both the coding scheme and the message.
“a widely used and very effective technique for compressing data; savings of 20% to 90% are typical, depending on the characteristics of the file being compressed.” (Cormen, p. 337)
CIS360 26
Information Representation 22 Huffman Tree for “a man a plan a canal panama”
– Examine data set and determine frequencies of letters (example ignores spaces, normally significant)
– Create a forest of single node trees. Choose the two trees having the smallest total frequencies (the two “smallest” trees), and merge them together (lesser frequency as the left subtree, for definiteness, to make grading easier). Continue merging until only one tree remains.
Letter   Count   Frequency
‘a’      10      0.476190
‘c’      1       0.047619
‘l’      2       0.095238
‘m’      2       0.095238
‘n’      4       0.190476
‘p’      2       0.095238
CIS360 27
Information Representation 23
Huffman Tree for "a man a plan a canal panama"
[Figure: Huffman tree. Each node shows its frequency; under each node the left child is listed first.]
1.0
    'a' .4762
    .5238
        'n' .1905
        .3333
            .1428
                'c' .0476
                'l' .0952
            .1905
                'm' .0952
                'p' .0952
Reading a ‘1’ calls for following the left branch.
Reading a ‘0’ calls for following the right branch.
Decoding using the tree: To decode ‘0001’, start at root and follow r_child, r_child, r_child, l_child, revealing encoded ‘m’.
CIS360 28
Information Representation 24
Comparison of Huffman and 3-bit code example
– 3-bit: 000 011000100 000 101010000100 000 001000100000010 101000100000011000 = 63 bits
– Huffman: 1 0001101 1 00000010101 1 001110110010 0000101100011 = 46 bits
– Savings of 17 bits, or 27% of original message
Letter   3-bit code   Huffman code   Count   H length   3 length
‘a’      000          1              10      10         30
‘c’      001          0011           1       4          3
‘l’      010          0010           2       8          6
‘m’      011          0001           2       8          6
‘n’      100          01             4       8          12
‘p’      101          0000           2       8          6
Totals                                       46         63
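The merging procedure can be sketched in Python with a priority queue. This is an illustration, not the course's reference solution: the names `huffman_codes`, `decode`, and `SLIDE_CODES` are hypothetical, and ties between equal frequencies are broken by insertion order, so the individual codes it produces may differ from the table above even though the total encoded length is the same.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    """Build a Huffman tree for text and return {symbol: bit string}."""
    tie_breaker = count()   # unique sequence number so heap entries never compare nodes
    heap = [(n, next(tie_breaker), symbol) for symbol, n in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, smaller = heapq.heappop(heap)   # lesser frequency becomes the left subtree
        n2, _, larger = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next(tie_breaker), (smaller, larger)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):            # internal node: (left, right)
            walk(node[0], prefix + "1")        # left branch reads as '1'
            walk(node[1], prefix + "0")        # right branch reads as '0'
        else:
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

def decode(bits, codes):
    """Decode a bit string by matching prefixes against the code table."""
    lookup = {code: symbol for symbol, code in codes.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            decoded.append(lookup[current])
            current = ""
    return "".join(decoded)

message = "amanaplanacanalpanama"   # the example with spaces removed
codes = huffman_codes(message)
encoded = "".join(codes[ch] for ch in message)
print(len(encoded), len(message) * 3)   # 46 63

# The code table from the slide, used to decode the slide's own example.
SLIDE_CODES = {"a": "1", "c": "0011", "l": "0010", "m": "0001", "n": "01", "p": "0000"}
print(decode("0001101", SLIDE_CODES))   # man
```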
CIS360 29
ISEM FAQ 1 Editing, Assembling, Linking, and Loading
– There are three components to the Instructional SPARC Emulator (ISEM) package that we use for this class:
the assembler, the linker, and the emulator/debugger.
CIS360 30
TERMS
Bit, Byte, Halfword, Word, Doubleword, Kilobyte (KB), Megabyte (MB), Gigabyte (GB)
Second (s), Millisecond (ms), Microsecond (µs), Nanosecond (ns), Picosecond (ps)
Hertz (Hz), Kilohertz (kHz), Megahertz (MHz)
100 megahertz = ? Clock period: 10 ns
CIS360 31
ISEM FAQ 2 Editing
– There are a number of programs that you can use to create your source files.
  Emacs is probably the most popular;
  vi is also available, but its command syntax is difficult to learn and use;
  using the pine program, you can use the pico editor, which combines many features of Emacs into a simple menu-driven facility.
– Start Emacs by “xemacs sourcefile.s &”, which creates the file called sourcefile.s.
– Use the tutorial, accessed by typing "Ctrl-H Ctrl-H t".
– For other editors, you are on your own.
CIS360 32
Example Sparc Assembly Language Instructions
% type xmp0.s
        .data                  ! Assembler directive: data starts here. A_s, B_s, and
A_s:    .word   '?'            ! C_s are symbolic constants. Furthermore, each
B_s:    .word   0x30           ! is an address of a certain-sized chunk of memory. Here,
C_s:    .word   0              ! each chunk is four bytes (one word) long. When the
                               ! program gets loaded, each of these chunks stores a
                               ! number in 2's complement encoding, as follows: At
                               ! address C_s, zero; at B_s, 48; at A_s, 0x3F = 077 = 63.
        .text                  ! Assembler directive, instructions start here
start:                         ! Label (symbolic constant) for this address
        set     A_s, %r2       ! Put address A_s into register 2
        ld      [%r2], %r2     ! Use r2 as an indirect address for a load (read)
        set     B_s, %r3       ! Put address B_s into register 3
        ld      [%r3], %r3     ! Read from B_s and replace r3 w/ value at addr B_s
        sub     %r2, %r3, %r2  ! Subtract r3 from r2, save in r2
        set     C_s, %r4       ! Put address C_s into register 4
        st      %r2, [%r4]     ! Store (write) r2 to memory at address C_s
terminate:                     ! Label for address where 'ta 0' instruction stored
        ta      0              ! Stop the program
beyond_end:                    ! Label for address beyond the end of this program
CIS360 33
ISEM FAQ 3 Assembling
– The assembler is called "isem-as", and is the GNU Assembler (GAS), configured to cross-assemble to a SPARC object format.
– It is used to take your source code, and produce object code that may be linked and run on the ISEM emulator.
– The syntax for invoking the assembler is:
isem-as [-a[ls]] sourcefile.s -o objectfile.o
– The input is read from sourcefile.s, and the output is written to objectfile.o.
– The option "-a" tells the assembler to produce a listing file. The sub-options "l" and "s" tell the assembler to include the assembly source in the listing file and produce a symbol table, respectively.
CIS360 34
ISEM FAQ 4 The listing file
– Will identify all the syntactic errors in your program, and it will warn you if it identifies "suspicious" behavior in your source file.
– Column 1 identifies a line number in your source file.
– Column 2 is an offset for where this instruction or data resides in memory.
– Column 3 is the image of what is put in memory, either the machine instructions or the representation of the data.
– The final column is the source code that produced the line.
– At the bottom of the file you will find the symbol table.
– Again, the symbols are represented as offsets that are relocated when the program is loaded into memory.
CIS360 35
isem-as -als labn.s -o labn.o >! labn.lst
   1                        .data
   2 0030 0000003F  A_s:    .word '?'
   3 0034 00000030  B_s:    .word 0x30
   4 0038 00000000  C_s:    .word 0
   5                        .text
   6                start:
   7 0000 05000000          set A_s, %r2
   7      8410A000
   8 0008 C4008000          ld [%r2], %r2
   9 000c 07000000          set B_s, %r3
   9      8610E000
  10 0014 C600C000          ld [%r3], %r3
  11 0018 84208003          sub %r2, %r3, %r2
  12 001c 09000000          set C_s, %r4
  12      88112000
  13 0024 C4210000          st %r2, [%r4]
  14                terminate:
  15 0028 91D02000          ta 0
  16 002c 00000000  beyond_end:
DEFINED SYMBOLS
  xmp0.s:2    2:00000030 A_s
  xmp0.s:3    2:00000034 B_s
  xmp0.s:4    2:00000038 C_s
  xmp0.s:6    1:00000000 start
  xmp0.s:14   1:00000028 terminate
  xmp0.s:16   1:0000002c beyond_end
UNDEFINED SYMBOLS

Columns, left to right: line in source file (.s); offset to address in memory; contents at address in memory; source line. Labels are symbolic offsets.
CIS360 36
ISEM FAQ 5 Linking
– Linking turns a set of raw object file(s) into an executable program.
– From the manual page, "ld combines a number of object and archive files, relocates their data and ties up symbol references. Often the last step in building a new compiled program to run is a call to ld."
– Several object files are combined into one executable using ld; the separate files could reference symbols from one another.
– The output of the linker is an executable program.
– The syntax for the linker is as follows:
isem-ld objectfile.o [-o execfile]
Examples
% isem-ld foo.o -o foo    Links foo.o into the executable foo.
% isem-ld foo.o           Links foo.o into the executable a.out.
CIS360 37
ISEM FAQ 6 Loading/Running
– Execute the program and test it in the emulation environment.
– The program "isem" is used to do this, and the majority of its features are covered in your lab manual.
– Invoke isem as follows
isem [execfile]
Examples
% isem foo    Invokes the emulator, loads the program foo
% isem        Invokes the emulator, no program is loaded
– Once you are in the emulator, you can run your program by typing "run" at the prompt.
CIS360 38
ISEM Debugging Tools 1
% isem xmp0
Instructional SPARC Emulator
Copyright 1993 - Computer Science Department University of New Mexico
ISEM comes with ABSOLUTELY NO WARRANTY
ISEM Ver 1.00d : Mon Jul 27 16:29:45 EDT 1998
Loading File: xmp0
2000 bytes loaded into Text region at address 8:0
2000 bytes loaded into Data region at address a:2000
PC: 08:00000020 nPC: 00000024 PSR: 0000003e N:0 Z:0 V:0 C:0
start : sethi 0x8, %g2
ISEM> run
Program exited normally.
Assembly language programs are not notoriously chatty.
CIS360 39
ISEM Debugging Tools 2
reg
– Gives values of all 32 general registers
– Also PC
symb
– Shows the resolved values of all symbolic constants
dump [addr]
– Either symbol or hex address
– Gives the values stored in memory
ISEM> reg
----0--- ----1--- ----2--- ----3--- ----4--- ----5--- ----6--- ----7---
G 00000000 00000000 0000000f 00000030 00002068 00000000 00000000 00000000
O 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
L 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
I 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
PC: 08:0000004c nPC: 00000050 PSR: 0000003e N:0 Z:0 V:0 C:0
beyond_end : unimp
ISEM> symb
Symbol List
A_s : 00002060
B_s : 00002064
C_s : 00002068
beyond_end : 0000004c
start : 00000020
terminate : 00000048
ISEM> dump A_s
0a:00002060 00 00 00 3f 00 00 00 30 00 00 00 0f 00 00 00 00 ...?...0........
0a:00002070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0a:00002080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
CIS360 40
ISEM Debugging Tools 3
break [addr]
– Set breakpoints in execution
– Once execution is stopped, you can look at the contents of registers and memory.
trace
– Causes one (or more) instruction(s) to be executed
– Registers are displayed
– Handy for sneaking up on an error when you’re not sure where it is.
For the all-time “most wanted” list of errors (and their fixes)
– http://www.cis.ohio-state.edu/~heym/360/common/faq.html
CIS360 41
Basic Components 1
Terminology from Ch. 2:
– Flip flop: basic storage device that holds 1 bit
– D flip flop: special flip flop that outputs the last value that was input to it (a data signal).
– Clock: two different meanings: (1) a control signal that oscillates (low to high voltage) every x nanoseconds; (2) the “write select” line for a flip flop.
[Figure: D Flip Flop with DataIn, Clock, and DataOut lines; the clock waveform marks one cycle]
CIS360 42
Basic Components 2
– Register: collection of flip flops with parallel load. Clock (or “write select”) signal controlled. Stores instructions, addresses, operands, etc.
– Bus: Collection of related data lines (wires).
[Figure: 8 Bit Register made of flip flops d7 … d0, with an 8-line Input Bus, an 8-line Output Bus, and a Clock input]
CIS360 43
Basic Components 3
– Combinational circuits: implement Boolean functions. No feedback in the circuit, output is strictly a function of input.
  Gates: and, or, not, xor
  E.g., xy + z
[Figure: gate symbols for AND, OR, NOT, and XOR; a circuit with inputs x, y, z and output f = xy + z]
CIS360 44
Basic Components 4
– Gates can be used in combination to implement a simple (half) adder.
  Addition creates a value, plus a carry-out.
  Z = X XOR Y
  CO = X AND Y
X  Y  Z  CO
0  0  0  0
0  1  1  0
1  0  1  0
1  1  0  1
[Figure: half adder circuit with inputs X, Y and outputs Z, CO]
CIS360 45
Basic Components 5
– Sequential Circuits: introduce feedback into the circuit. Outputs are functions of input and current state.
[Figure: flip flop with inputs D and C and output Q]
– Multiplexers: combinational circuits that use n bits to select an output from 2^n input lines.
[Figure: 4 to 1 MUX with inputs i0, i1, i2, i3, select lines s0 and s1, and output f]
CIS360 46
Basic Components 6
Von Neumann Architecture
– Can access either instructions or data from memory in each cycle.
– One path to memory (von Neumann bottleneck)
– Stored program system. No distinction between programs and data
[Figure: Main Memory System, Operational Registers, Program Counter, Arithmetic and Logic Unit, Control Unit, and Input/Output System, connected by an Address Pathway and a Data and Instruction Pathway]
CIS360 47
Basic Components 7
Examples of Von Neumann architecture to be explored in this course:
– SAM: tiny, good for learning architecture
– MIPS: text’s example assembly language
– SPARC: labs
Roughly, the order of presentation in this course is as follows:
– A couple of days on the Main Memory System
– Weeks on the Central Processing Unit (CPU)
– Finish the course with the I/O System
CIS360 48
Basic Components 8
Memory: Can be viewed as an array of storage elements.
– The index of each element is called the address.
– Each element holds the same number of bits. How many bits per element? 8, 16, 32, 64?
[Figure: four memory arrays, each with addresses 0, 1, 2, ..., n-1, whose elements are 8 bits (a byte), 16 bits, 32 bits, and 64 bits wide]
CIS360 49
Memory Element & Address Sizes
• If a machine’s memory is 5-bit addressable, then, at each distinct address, 5 bits are stored. The contents at each address are represented by 5 bits.
• If 3 bits are used to represent memory addresses, then the memory can have at most 2^3 = 8 distinct addresses.
• Such a memory can store at most 8 × 5 = 40 bits of data.
• If the data bus is 10 bits wide, then up to 10 bits at a time can be transferred between memory and processor; this is a 10-bit word.
Address (Decimal)   Address (Binary)   Contents
0                   000                00011
1                   001                01111
2                   010                01110
3                   011                10100
4                   100                00101
5                   101                01110
6                   110                10100
7                   111                10011
CIS360 50
Basic Components 9 Let’s look deeper.
– Suppose each memory element is stored in a bank and given a relative address.
– You could have several such banks in your memory.
– The GLOBAL address of each element would be:[relative address] & [bank address].
– To get two elements at a time, start reading from bank 0 (don’t start from bank 1; this would be a “memory address not aligned” error).
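[Figure: a single bank (Bank 0) with relative addresses 000 to 101, then two banks side by side]
Relative address   Global address in Bank 0   Global address in Bank 1
000                000 0                      000 1
001                001 0                      001 1
010                010 0                      010 1
011                011 0                      011 1
100                100 0                      100 1
101                101 0                      101 1
Global addresses, not contents. Think of the contents as being underneath the global addresses.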
CIS360 51
Basic Components 10
– Memory alignment: Assume a byte addressable machine with 4-byte words. Where are operands of various sizes positioned?
  bytes: on a byte boundary (any address)
  half words: on half word boundary (even addresses)
  words: on word boundary (addresses divisible by 4)
  double words: on double word boundary (addresses divisible by 8)
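The alignment rule reduces to a divisibility test, sketched here in Python. The names `SIZES` and `is_aligned` are hypothetical, and the sizes assume the byte addressable machine with 4-byte words described above.

```python
# Operand sizes in bytes on the assumed machine (4-byte words).
SIZES = {"byte": 1, "halfword": 2, "word": 4, "doubleword": 8}

def is_aligned(address, kind):
    """An operand is aligned when its address is a multiple of its size."""
    return address % SIZES[kind] == 0

print(is_aligned(0x2064, "word"))         # True
print(is_aligned(0x2064, "doubleword"))   # False
print(is_aligned(0x2061, "halfword"))     # False
```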
CIS360 52
Basic Components 11
Byte ordering: how data is stored in memory
  big-endian: High order (big end) is at byte 0.
  little-endian: Low order (little end) is at byte 0.
– Ex.: 247896511_10 = 0EC699BF_16
Byte address     0    1    2    3
big-endian       0E   C6   99   BF
little-endian    BF   99   C6   0E
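Python's built-in `int.to_bytes` and `int.from_bytes` make the two byte orders easy to see. The helper `show` is a hypothetical name used only to print the bytes in address order.

```python
value = 0x0EC699BF   # 247896511 in decimal

big = value.to_bytes(4, byteorder="big")        # high order byte at address 0
little = value.to_bytes(4, byteorder="little")  # low order byte at address 0

def show(data):
    """Format bytes in address order 0, 1, 2, 3 as two-digit hex."""
    return " ".join(f"{b:02X}" for b in data)

print(show(big))      # 0E C6 99 BF
print(show(little))   # BF 99 C6 0E

# Reading the bytes back with the wrong byte order gives a different number.
print(hex(int.from_bytes(big, byteorder="little")))   # 0xbf99c60e
```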
CIS360 53
Basic Components 12
Read/Write operations: must know the address to read or write. (read = fetch = load, write = store)
Steps for a read:
– CPU puts address on address bus
– CPU sends read signal (R/W=1, CS=1) (Read/don’t Write, Chip Select)
– Wait
– Memory puts data on data bus
– reset (CS=0)
[Figure: memory chip with data lines D0 … D(n-1), address lines A0 … A(m-1), and control inputs CS and R/W]
CIS360 54
Basic Components 13
– Types of memory:
  ROM: Read Only Memory: non-volatile (doesn’t get erased when powered down; it’s a combinational circuit!)
  PROM: Programmable ROM: use a ROM burner to write data to it initially. Can’t be re-written.
  EPROM: Erasable PROM. Uses UV light to erase.
  EEPROM: Electrically Erasable PROM.
  RAM: Random access memory. Can efficiently read/write any location (unlike sequential access memory). Used for main memory.
  – Many variations (types) of RAM, all volatile
    • SDRAM, DDR SDRAM
    • RDRAM
    • www.tomshardware.com
CIS360 55
Basic Components 14
CPU: executes instructions -- primitive operations that the computer can perform.
– E.g.,
  arithmetic       A+B
  data movement    A := B
  control          if expr goto label
  logical          AND, OR, XOR…
Instructions specify both the operation and the operands. Operands are usually locations in memory where the actual operands may be found (addresses of actual operands).
CIS360 56
Basic Components 15
– Instruction set: all instructions for a machine. Instruction format specifies number and type of operands.
Ex.: Could have an instruction like
  ADD A, B, R
where A, B, and R are the addresses of operands in memory. The result is R := A+B.
[Figure: memory diagram with Addr and Label columns; addresses 0, 4, 8, C, with the cells labeled A, B, and R holding 8, 9, and 17]
CIS360 57
Basic Components 16
– Actually, the “instruction” might be represented in a source file as: 0x41444420412C20422C20520A, i.e., the ASCII characters A D D (space) A , (space) B , (space) R (line feed). As such, it is an assembly language instruction.
– An assembler might translate it to, say, 0x504C, the machine’s representation of the instruction. As such, it is a machine language instruction.
CIS360 58
A Simple Instruction Set 1
Simple instruction set: the Accumulator machine.
– Simplify instruction set by only allowing one operand. Accumulator implied to be the second operand.
– Accumulator is a special register. Similar to a simple calculator.
  ADD addr      ACC <- ACC + M[addr]
  SUB addr      ACC <- ACC - M[addr]
  MPY addr      ACC <- ACC * M[addr]
  DIV addr      ACC <- ACC / M[addr]
  LOAD addr     ACC <- M[addr]
  STORE addr    M[addr] <- ACC
CIS360 59
A Simple Instruction Set 2
Ex.: C = AB + CD
  LOAD 20     ! 1) Acc <- M[20]
  MPY 21      ! 2) Acc <- Acc*M[21]
  STORE 30    !    M[30] <- Acc
  LOAD 22     ! 3) Acc <- M[22]
  MPY 23      ! 4) Acc <- Acc*M[23]
  ADD 30      ! 5) Acc <- Acc+M[30]
  STORE 22    !    M[22] <- Acc
– Machine language: Converting from assembly language to machine language is called assembling.
[Figure: memory with A at address 20, B at 21, C at 22, D at 23, and temp at 30, next to the Accumulator, with steps 1) to 5) marked]
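The effect of the program can be traced with a tiny interpreter in Python. This is a sketch under stated assumptions: the function name `run` is hypothetical, DIV is assumed to be integer division, and the starting values A=4, B=5, C=6, D=7 are sample data chosen for the trace.

```python
def run(program, memory):
    """Execute (mnemonic, addr) pairs on an accumulator machine; return ACC."""
    acc = 0
    for mnemonic, addr in program:
        if mnemonic == "LOAD":
            acc = memory[addr]
        elif mnemonic == "STORE":
            memory[addr] = acc
        elif mnemonic == "ADD":
            acc = acc + memory[addr]
        elif mnemonic == "SUB":
            acc = acc - memory[addr]
        elif mnemonic == "MPY":
            acc = acc * memory[addr]
        elif mnemonic == "DIV":
            acc = acc // memory[addr]   # assumption: integer division
        else:
            raise ValueError(f"unknown instruction {mnemonic}")
    return acc

# C = AB + CD with A, B, C, D at addresses 20..23 and temp at 30.
program = [("LOAD", 20), ("MPY", 21), ("STORE", 30),
           ("LOAD", 22), ("MPY", 23), ("ADD", 30), ("STORE", 22)]
memory = {20: 4, 21: 5, 22: 6, 23: 7, 30: 0}

run(program, memory)
print(memory[30], memory[22])   # 20 62
```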
CIS360 60
A Simple Instruction Set 3
Assume 8-bit architecture. Each instruction may be 8 bits. 3 bits hold the op-code and 5 bits hold the operand.
  bits 7-5: op-code    bits 4-0: operand
How much memory can we address? How many op-codes can we have?
Convert the mnemonic op-codes into binary codes.
Operation   Code
ADD         000
SUB         001
MPY         010
DIV         011
LOAD        100
STORE       101
CIS360 61
A Simple Instruction Set 4
Hand assemble our program:
  LOAD 20     100 10100
  MPY 21      010 10101
  STORE 30    101 11110
  ...         ...
Instructions are stored in consecutive memory:
Addr   Memory      Mnemonic
0      100 10100   LOAD A
1      010 10101   MPY B
2      101 11110   STORE temp
3      100 10110   LOAD C
4      010 10111   MPY D
5      000 11110   ADD temp
6      101 10110   STORE C
…      …
20     4           A
21     5           B
22     6           C
23     7           D
…      …
30     20          temp
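Hand assembly is a shift and an OR, which the following Python sketch makes explicit. The names `OPCODES`, `assemble`, and `disassemble` are hypothetical; the op-code values and the 3-bit/5-bit split are the ones given for this machine.

```python
OPCODES = {"ADD": 0b000, "SUB": 0b001, "MPY": 0b010,
           "DIV": 0b011, "LOAD": 0b100, "STORE": 0b101}

def assemble(mnemonic, addr):
    """Pack a 3-bit op-code (bits 7-5) and a 5-bit operand (bits 4-0)."""
    if not 0 <= addr < 32:
        raise ValueError("operand must fit in 5 bits")
    return (OPCODES[mnemonic] << 5) | addr

def disassemble(word):
    """Split an 8-bit instruction back into (mnemonic, addr)."""
    names = {code: name for name, code in OPCODES.items()}
    return names[word >> 5], word & 0b11111

program = [("LOAD", 20), ("MPY", 21), ("STORE", 30),
           ("LOAD", 22), ("MPY", 23), ("ADD", 30), ("STORE", 22)]
for address, (mnemonic, addr) in enumerate(program):
    word = format(assemble(mnemonic, addr), "08b")
    print(address, word[:3], word[3:], mnemonic, addr)
```

The first line printed is `0 100 10100 LOAD 20`, matching the hand-assembled listing.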
CIS360 62
A Simple Instruction Set 5
[Figure: accumulator machine datapath: PC, INC, MAR, MDR, IR (Op and Addr fields), Decode, ACC, ALU, two 2-to-1 MUXes, Timing and Control, Memory, and Bus, with control signals numbered 0 to 14]
A Simple Instruction Set 6
– Control signals: control functional units to determine order of operations, access to bus, loading of registers, etc.
Number   Operation      Number   Operation
0        ACC -> bus     8        ALU -> ACC
1        load ACC       9        INC -> PC
2        PC -> bus      10       ALU operation
3        load PC        11       ALU operation
4        load IR        12       Addr -> bus
5        load MAR       13       CS
6        MDR -> bus     14       R/W
7        load MDR
CIS360 64
A Simple Instruction Set 7
[Figure: fetch/execute state diagram, divided into a Fetch part and an Execute part]
State 0: PC to bus, load MAR, INC to PC, load PC
State 1: CS, R/W
State 2: MDR to bus, load IR
State 3: Addr to bus, load MAR
OP = store?
  Y: State 4: ACC to bus, load MDR
     State 5: CS
  N: State 6: CS, R/W
     OP = load?
       Y: State 7: MDR to bus, load ACC
       N: State 8: MDR to bus, ALU to ACC, ALU op, load ACC
CIS360 65
State 0: Control Signals 2, 5, 9, 3
[Figure: accumulator machine datapath and fetch/execute state diagram]
Put the address of the next instruction in the Addr Register and Inc. PC.
CIS360 66
State 1: Control Signals 13, 14
[Figure: accumulator machine datapath and fetch/execute state diagram]
Fetch the word of memory at Address, and load into Data Register.
CIS360 67
State 2: Control Signals 6, 4
[Figure: accumulator machine datapath and fetch/execute state diagram]
Send the word from the Data Register to the Instruction Register.
CIS360 68
State 3: Control Signals 12, 5
[Figure: accumulator machine datapath and fetch/execute state diagram]
Put the address from the instruction in the Address Register.
CIS360 69
After State 3, what values are now stored in each register?
PC MAR MDR IR ACC
CIS360 70
State 4: Control Signals 0, 7
[Figure: accumulator machine datapath and fetch/execute state diagram]
Take the value from the ACCumulator and store it in the Data Register.
CIS360 71
State 5: Control Signal 13
[Figure: accumulator machine datapath and fetch/execute state diagram]
Write the data from the Data Register to the address stored in the MAR.
CIS360 72
State 6: Control Signals 13, 14
[Figure: accumulator machine datapath and fetch/execute state diagram]
Load the word at the Address from the Addr Reg into the Data Register.
CIS360 73
After State 6, what values are now stored in each register?
PC MAR MDR IR ACC

State 7: Control Signals 6, 1
[Figure: accumulator machine datapath (MAR, MDR, IR, ACC, PC, ALU, memory, bus; control signals 0-14) with the fetch/execute state diagram.]
Load the word from Data Register into the ACCumulator.

State 8: Control Signals 6, 8, 10/11, 1
[Figure: accumulator machine datapath (MAR, MDR, IR, ACC, PC, ALU, memory, bus; control signals 0-14) with the fetch/execute state diagram.]
Use word from the Data Register for Arith Op and put result in ACC.

New Instruction
• What is necessary to implement a new instruction?
   • New states?
   • New control signals?
   • New fetch/execute cycle?
• An example: SWAP
   Exchange the value in the Accumulator with the value at Address.
   • SWAP addr ! Acc <- #M[addr], M[addr] <- #Acc

New Instruction
What changes to the fetch/execute cycle?
– The fetch part of the cycle usually remains the same.
– Recall the values stored in the registers after each state. E.g., after State 6, what values are in each register?
   – PC
   – MAR
   – MDR
   – IR
   – ACC
  Handy to have M[addr] in MDR.
– Start after State 6, then…
Existing state diagram:
   Fetch
     State 0: PC to bus, load MAR, INC to PC, load PC
     State 1: CS, R/W
     State 2: MDR to bus, load IR
   Execute
     State 3: Addr to bus, load MAR
     If OP=store:
       State 4: ACC to bus, load MDR
       State 5: CS
     Otherwise:
       State 6: CS, R/W
       If OP=load:
         State 7: MDR to bus, load ACC
       Otherwise:
         State 8: MDR to bus, ALU to ACC, ALU op, load ACC

New State 9: Control Signals 6, 5
[Figure: accumulator machine datapath (MAR, MDR, IR, ACC, PC, ALU, memory, bus; control signals 0-14).]
Save the Data value from the MDR in the Address Register.
MDR -> bus, load MAR

New State 10: Control Signals 0, 7
[Figure: accumulator machine datapath (MAR, MDR, IR, ACC, PC, ALU, memory, bus; control signals 0-14).]
Send the ACCumulator value to the Data Register.
ACC -> bus, load MDR

New State 11: Control Signals ?, 1
[Figure: accumulator machine datapath (MAR, MDR, IR, ACC, PC, ALU, memory, bus; control signals 0-14).]
Put the saved value from the MAR into the ACCumulator.
MAR -> bus, load ACC
Note: there is no control signal in the current architecture that is the opposite of 5 (load MAR), so we would have to create a new control signal (MAR to bus) in addition to creating these new states.

New State 12 (Old 3): Control Signals 12, 5
[Figure: accumulator machine datapath (MAR, MDR, IR, ACC, PC, ALU, memory, bus; control signals 0-14).]
Put (reload) the address from the instruction in the Address Register.
Addr -> bus, load MAR

New State 13 (Old 5): Control Signal 13
[Figure: accumulator machine datapath (MAR, MDR, IR, ACC, PC, ALU, memory, bus; control signals 0-14).]
Write the data from the Data Register to the address stored in the MAR.
CS

New Instruction Example Summary
Changes to states: added 9 through 13.
Changes to signals: added 15 (MAR -> bus).
Changes to fetch/execute: new register transfer language (RTL) sequence for SWAP:
   PC -> bus, load MAR, INC PC, load PC
   CS, R/W
   MDR -> bus, load IR
   Addr -> bus, load MAR
   CS, R/W
   MDR -> bus, load MAR
   ACC -> bus, load MDR
   MAR -> bus, load ACC
   Addr -> bus, load MAR
   CS
[Figure: the original fetch/execute state diagram, to be extended with the new states.]
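The execute-phase steps of SWAP can be traced in software. The following is a minimal Python sketch, assuming a toy machine whose registers live in a dict and whose memory is a dict from address to value; the function name `swap_execute` and this machine model are hypothetical, and only the register names and state numbers come from the slides.

```python
def swap_execute(regs, mem, addr):
    """Trace the execute-phase RTL for SWAP addr (states 3, 6, 9-13)."""
    regs["MAR"] = addr                # State 3:  Addr -> bus, load MAR
    regs["MDR"] = mem[regs["MAR"]]    # State 6:  CS, R/W (read M[addr] into MDR)
    regs["MAR"] = regs["MDR"]         # State 9:  MDR -> bus, load MAR (MAR as scratch)
    regs["MDR"] = regs["ACC"]         # State 10: ACC -> bus, load MDR
    regs["ACC"] = regs["MAR"]         # State 11: MAR -> bus (new signal 15), load ACC
    regs["MAR"] = addr                # State 12: Addr -> bus, load MAR (reload address)
    mem[regs["MAR"]] = regs["MDR"]    # State 13: CS (write MDR to M[addr])


regs = {"MAR": 0, "MDR": 0, "ACC": 7}
mem = {20: 99}
swap_execute(regs, mem, 20)
print(regs["ACC"], mem[20])
```

The sketch makes visible why State 12 is needed: State 9 overwrites the MAR with data, so the address must be reloaded before the final write.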
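
Hardware Control Signals
[Figure: control-signal logic. The opcode bits from the IR (example: 101) and the output of a clock-driven counter (example: 0110) feed an AND gate; when Opcode = 101 and Clock = 0110, the gate output opens gates, asserts CS, etc.]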

Instruction Set Architectures 1
RISC vs. CISC
– Complex Instruction Set Computer (CISC): many, powerful instructions. Grew out of the need for high code density. Instructions have varying lengths, numbers of operands, formats, and clock cycles in execution.
– Reduced Instruction Set Computer (RISC): fewer, less powerful, optimized instructions. Grew out of the need for speed. Instructions have a fixed length, number of operands, and format, and take a similar number of clock cycles to execute.

Instruction Set Architectures 2
Motivation: memory is comparatively slow.
– 10x to 20x slower than the processor.
– Need to minimize the number of trips to memory.
Provide faster storage in the processor: registers.
– Registers (16, 32, 64 bits wide) are used for intermediate storage for calculations, or for repeated operands.
Accumulator machine
– One data register: ACC.
– 2 memory accesses per instruction: one for the instruction and one for the operand.
Add more registers (R0, R1, R2, …, Rn)

Instruction Set Architectures 3
How many addresses to specify?
– With binary operations, need to know two source operands, a destination, and the operation. E.g., op (dest_operand) (src_op1) (src_op2)
– Based on the number of operands, could have:
   3-addr. machine: both sources and the dest. are named.
   2-addr. machine: both sources named; the dest. is one of the sources.
   1-addr. machine: one source named; the other source and the dest. is the accumulator.
   0-addr. machine: all operands implicit and available on the stack.

Instruction Set Architectures 4
1-address architecture: a := a*b + c*d*e
– Memory only:
     Code         # mem refs
     LOAD 100     2
     MPY 104      2
     STORE 100    2
     LOAD 108     2
     MPY 112      2
     MPY 116      2
     ADD 100      2
     STORE 100    2
– Using registers:
     Code         # mem refs
     LOAD 100     2
     MPY 104      2
     STORE R2     1
     LOAD 108     2
     MPY 112      2
     MPY 116      2
     ADD R2       1
     STORE 100    2
1½-address architecture: one operand must always be a register. (The ½ address is the register, the 1 address is the memory operand: LOAD 100, R1.)
– Like an accumulator machine, but with many accumulators.

Instruction Set Architectures 5
3-address architecture: a := a*b + c*d*e
Memory layout: 100 (a), 104 (b), 108 (c), 112 (d), 116 (e), ..., 200 (t)
– Using memory only:
     Code                               # mem refs
     MPY 100, 100, 104   ; a := a*b
     MPY 200, 108, 112   ; t := c*d
     MPY 200, 116, 200   ; t := e*t
     ADD 100, 200, 100   ; a := t+a
– Using registers:
     Code                               # mem refs
     MPY R2, 100, 104    ; t1 := a*b
     MPY R3, 108, 112    ; t2 := c*d
     MPY R3, 116, R3     ; t2 := e*t2
     ADD 100, R3, R2     ; a := t1+t2
– What about instruction size?

Instruction Set Architectures 6
2-address architecture: a := a*b + c*d*e
Memory layout: 100 (a), 104 (b), 108 (c), 112 (d), 116 (e), ..., 200 (t)
– Using memory only:
     Code                            # mem refs
     MPY 100, 104    ; a := a*b      4
     MOVE 200, 108   ; t := c        3
     MPY 200, 112    ; t := t*d      4
     MPY 200, 116    ; t := t*e      4
     ADD 100, 200    ; a := t+a      4
– Using registers:
     Code                            # mem refs
     MPY 100, 104    ; a := a*b      4
     MOVE R2, 108    ; R2 := c       2
     MPY R2, 112     ; R2 := R2*d    2
     MPY R2, 116     ; R2 := R2*e    2
     ADD 100, R2     ; a := R2+a     3
– Most CISC architectures work this way, making one operand implicit.

Instruction Set Architectures 7
0-address architecture: a := a*b + c*d*e
– Stack machine: all operands are implicit. Only push and pop touch memory. All other operands are pulled from the top of the stack, and the result is pushed on top. E.g., HP calculators.
     Code      # mem refs
     PUSH A    2
     PUSH B    2
     MPY       1
     PUSH C    2
     PUSH D    2
     PUSH E    2
     MPY       1
     MPY       1
     ADD       1
     POP A     2
[Figure: empty stack with slots numbered 0 through 4, for tracing the program.]
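The stack program above can be checked with a small interpreter. This is a minimal Python sketch, assuming each instruction fetch costs one memory reference and each PUSH/POP operand access costs one more (the counting used in the table); the function `run_stack_program` and the sample values of A through E are hypothetical.

```python
def run_stack_program(program, mem):
    """Run a 0-address program; return the number of memory references."""
    stack, refs = [], 0
    for instr in program:
        op, *arg = instr.split()
        refs += 1                          # fetching the instruction itself
        if op == "PUSH":
            stack.append(mem[arg[0]])      # operand read from memory
            refs += 1
        elif op == "POP":
            mem[arg[0]] = stack.pop()      # operand written to memory
            refs += 1
        elif op in ("MPY", "ADD"):
            y, x = stack.pop(), stack.pop()    # operands come from the stack
            stack.append(x * y if op == "MPY" else x + y)
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return refs


mem = {"A": 2, "B": 3, "C": 4, "D": 5, "E": 6}
program = ["PUSH A", "PUSH B", "MPY", "PUSH C", "PUSH D", "PUSH E",
           "MPY", "MPY", "ADD", "POP A"]
refs = run_stack_program(program, mem)
print(mem["A"], refs)   # a*b + c*d*e with the sample values, and total references
```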

Instruction Set Architectures 8
Load/Store Architectures -- RISC
– Use of registers is simple and efficient. Therefore, the only instructions that can access memory are load and store. All others reference registers.
     Code                                           # mem refs
     LOAD  R2, 100      ; R2 <- a                   2
     LOAD  R3, 104      ; R3 <- b                   2
     LOAD  R4, 108      ; R4 <- c                   2
     LOAD  R5, 112      ; R5 <- d                   2
     LOAD  R6, 116      ; R6 <- e                   2
     MPY   R2, R2, R3   ; R2 <- a*b                 1
     MPY   R3, R4, R5   ; R3 <- c*d                 1
     MPY   R3, R3, R6   ; R3 <- (c*d)*e             1
     ADD   R2, R2, R3   ; R2 <- a*b + (c*d)*e       1
     STORE 100, R2      ; a  <- a*b + (c*d)*e       2

Instruction Set Architectures 9
Why load/store architectures?
– The number of instructions (hence, memory references to fetch them) is high, but the processor can work without waiting on memory.
– Claim: overall execution time is lower. Why?
   Clock cycle time is lower (no microcode interpretation).
   More room in the CPU for registers and memory cache.
   Easier to overlap instruction execution through pipelining.
– Side effects:
   Register interlock: delaying execution until a memory read completes.
   Instruction scheduling: rearranging instructions to prevent register interlock (loads on SPARC) and to avoid wasting the results of pipelined execution (branches on SPARC).

SPARC Assembly Language 1
SPARC (Scalable Processor ARChitecture)
– Used in Sun workstations; descended from RISC-II, developed at UC Berkeley
– General characteristics:
   32-bit word size (integer, address, register size, etc.)
   Byte-addressable memory
   RISC load/store architecture, 32-bit instructions, few addressing modes
   Many registers (32 general purpose, 32 floating point, various special purpose registers)
– ISEM: Instructional SPARC Emulator. Nicer than a real machine for learning to write assembly language programs.

SPARC Assembly Language 2
Structure
– Line oriented: 4 types of lines
   Blank: ignored.
   Labeled: any line may be labeled. Creates a symbol in the listing. Labels must begin with a letter (other than 'L'), followed by any alphanumeric characters. A label must end with a colon ":". A label just assigns a name to an address.
   Assembler directives: e.g., .data, .word, .text, etc.
   Instructions
– Comments start after the "!" character and go to the end of the line.

        .data
x:      .word 0x42
y:      .word 0x20
z:      .word 0
        .text
start:  set   x, %r2
        ld    [%r2], %r2     ! Load [x] into reg 2
        set   y, %r3
        ld    [%r3], %r3     ! Load [y] into reg 3

SPARC Assembly Language 3
Directives: instructions to the assembler
– Not executed by the machine
.data -- following section contains declarations
– Each declaration reserves and initializes a certain number of bits of storage for each of zero or more operands in the declaration.
   • .word -- 32 bits
   • .half -- 16 bits
   • .byte -- 8 bits
  E.g.,
        .data
   w:   .half 27000
   x:   .byte 8
   y:   .byte 'm', 0x6e, 0x0, 0, 0
   z:   .word 0x3C5F
.text -- following section contains executable instructions

SPARC Assembly Language 4
Registers -- 32 bits wide
– 32 general purpose integer registers, known by several names to the assembler
   %r0-%r7, also known as %g0-%g7: global registers. Note: %r0 always contains the value 0.
   %r8-%r15, also known as %o0-%o7: output registers
   %r16-%r23, also known as %l0-%l7: local registers
   %r24-%r31, also known as %i0-%i7: input registers
   Use the %r0-%r31 names for now. The other names are used in procedure calls.
– 32 floating point registers, %f0-%f31. Each register is single precision. Double precision uses register pairs.

SPARC Assembly Language 5
Assembly language
– 3-address operations: the format differs from the book
     op  src1, src2, dest       ! opposite of text
  E.g.,
     add %r1, %r2, %r3          ! %r3 <- %r1 + %r2
     or  %r2, 0x0004, %r2       ! %r2 <- %r2 OR 0x0004
– Contrast SPARC with MIPS (used in the book)
   indirect address notation: @addr vs. [addr]
   operand order, especially the destination register
   register notation: R2 vs. %r2
   branches

SPARC Assembly Language 6
– 2-address operations: load and store
     ld  [addr], %r2      ! %r2 <- M[addr]
     st  %r2, [addr]      ! M[addr] <- %r2
  Often use set to put an address (a label, a symbolic constant) into a register, followed by ld to load the data itself.
     set x, %r1           ! put addr x into %r1
     ld  [%r1], %r2       ! use addr in %r1 to load %r2
– Immediate values: the instruction itself contains some data to be used in execution.

SPARC Assembly Language 7
– Immediate values (continued)
  E.g., add %rs, siconst13, %rd     ! %rd <- %rs + const
   The constant is coded into the instruction itself, and is therefore available after fetching the instruction (no extra trip to memory for an operand).
   On SPARC, there is no special notation for differentiating constants from addresses, because there is no ambiguity in a load/store architecture.
   The immediate value is coded as a 13-bit sign-extended value. The range is therefore -2^12 … 2^12 - 1, or -4096 to 4095.
   Immediate values can be specified in decimal, hexadecimal, or octal.
  E.g., add %r2, 0x1A, %r2     ! %r2 <- %r2 + 26
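The 13-bit range can be checked mechanically. Below is a minimal Python sketch of the range test and of sign extension; the helper names `fits_simm13` and `sign_extend13` are hypothetical and are not part of any assembler.

```python
def fits_simm13(value):
    """True if value can be coded as a 13-bit signed immediate."""
    return -(1 << 12) <= value <= (1 << 12) - 1


def sign_extend13(field):
    """Interpret the low 13 bits of field as a signed integer."""
    field &= 0x1FFF                 # keep only the 13-bit field
    if field & 0x1000:              # bit 12 is the sign bit
        return field - (1 << 13)
    return field


print(fits_simm13(4095), fits_simm13(4096))   # upper limit and one past it
print(sign_extend13(0x1A))                    # the 0x1A example above
```

A constant that fails `fits_simm13` cannot go into a single ALU instruction, which is why a separate mechanism is needed for full 32-bit constants.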

SPARC Assembly Language 8
– Synthetic instructions: the assembler translates one "instruction" into several machine instructions.
   set: used to load a 32-bit signed integer constant into a register. Has 2 operands: a 32-bit value and a register number. How does that fit into a 32-bit instruction?
    E.g., set iconst32, %rd
          set -10, %r3
          set x, %r4
          set '=', %r8
   clr %rd: sets all bits in a register to 0. How?
   mov %rs, %rd: copies a register.
   neg %rs, %rd: copies the negation of a register.

SPARC Assembly Language 9
– Operand sizes
   double word = 8 bytes, word = 4 bytes, half word = 2 bytes, byte = 8 bits. Recall memory alignment issues.
     set  x, %r2          ! put addr x in %r2
     ld   [%r2], %r1      ! load word
     ldsb [%r2], %r1      ! load byte, sign extended
     ldub [%r2], %r1      ! load byte, extend with 0's
     st   %r1, [%r2]      ! store word, addr is a multiple of 4
     stb  %r1, [%r2]      ! store byte, any address
     sth  %r1, [%r2]      ! store half word, address is even
– Characters use 8 bits
   ldub to load a character
   stb to store a character

SPARC Assembly Language 10
– Traps: provide initial help with I/O; also used in operating systems programming.
   ta 0: terminate program
   ta 1: output ASCII character from %r8
   ta 2: input ASCII character into %r8
   ta 4: output integer from %r8 in unsigned hexadecimal
   ta 5: input integer into %r8; can be decimal, octal, or hex
  E.g.,
     set '=', %r8       ! put '=' in %r8
     ta  1              ! output the '='
     ta  5              ! read in value into %r8
     mov %r8, %r1       ! copy %r8 into %r1
     set 0x0a, %r8      ! load a newline into %r8
     ta  1              ! output the newline

SPARC Assembly Language 11
– More assembler directives (.asciz and .ascii):
   The following two directives are equivalent:
    – msg01: .asciz "a phrase"
    – msg01: .byte 'a', ' ', 'p', 'h', 'r'
             .byte 'a', 's', 'e', 0
   Note that .asciz generates one byte for each character between the quote (") marks in the operand, plus a null byte at the end.
   The .ascii directive does not generate that extra byte. The following three directives are equivalent:
    – digits: .ascii "0123456789"
    – digits: .byte '0', '1', '2', '3', '4', '5'
              .byte '6', '7', '8', '9'
    – digits: .byte 0x30, 0x31, 0x32, 0x33, 0x34
              .byte 0x35, 0x36, 0x37, 0x38, 0x39

SPARC Assembly Language 12
– Quick review of instructions so far:
     ld  [addr], %rd            ! %rd <- M[addr]
     st  %rd, [addr]            ! M[addr] <- %rd
     op  %rs1, %rs2, %rd        ! op is ALU op
     op  %rs, siconst13, %rd    ! %rd <- %rs op const
     set siconst32, %rd         ! %rd <- const
     ta  #                      ! trap signal
– Have actually seen many more variants, e.g., ldub, ldsb, sth, clr, mov, neg, add, sub, smul, sdiv, umul, udiv, etc. Can evaluate just about any simple arithmetic expression.

Review: SPARC Loads, Stores
        .data
x:      .word 0xa1b2c3d4
        .skip 12
        .text
        set   x, %r2
        ld    [%r2], %r3
        ldsb  [%r2], %r4
        ldub  [%r2], %r5
        st    %r3, [%r2+4]
        sth   %r3, [%r2+8]
        stb   %r3, [%r2+12]
        ta    0
After this runs, what values are in %r2-%r5, and in the memory locations starting at byte address x?
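One way to check an answer to this review question is to model the 16 bytes at x in software. The following is a minimal Python sketch, assuming big-endian byte order (as on SPARC) and treating the address of x as offset 0 in a `bytearray`; the real address that ISEM assigns to x will differ, so only %r3-%r5 and the memory contents are modeled.

```python
import struct

# x: .word 0xa1b2c3d4 followed by .skip 12 (16 bytes total, big-endian)
mem = bytearray(struct.pack(">I", 0xA1B2C3D4)) + bytearray(12)

r2 = 0                                                    # stand-in for the address of x
r3 = struct.unpack_from(">I", mem, r2)[0]                 # ld   [%r2], %r3
r4 = struct.unpack_from(">b", mem, r2)[0] & 0xFFFFFFFF    # ldsb [%r2], %r4 (sign extend)
r5 = struct.unpack_from(">B", mem, r2)[0]                 # ldub [%r2], %r5 (zero extend)
struct.pack_into(">I", mem, r2 + 4, r3)                   # st   %r3, [%r2+4]
struct.pack_into(">H", mem, r2 + 8, r3 & 0xFFFF)          # sth  %r3, [%r2+8]
struct.pack_into(">B", mem, r2 + 12, r3 & 0xFF)           # stb  %r3, [%r2+12]

print(hex(r3), hex(r4), hex(r5))
print(mem.hex())
```

Note that the byte loads read the byte at the lowest address, which on a big-endian machine is the most significant byte of the word, while sth and stb store the least significant half word and byte of the register.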

Flow of Control 1
In addition to sequential execution, need the ability to repeatedly and conditionally execute program fragments.
– High level language has: while, for, do, repeat, case, if-then-else, etc.
– Assembler has if, goto.
– Compare: high level vs. pseudo-assembler implementation of f = n!
  High level:
     f = 1;
     i = 2;
     while (i <= n) {
        f = f * i;
        i = i + 1;
     }
  Pseudo-assembler:
           f = 1
           i = 2
     loop: if (i > n) goto done
           f = f * i
           i = i + 1
           goto loop
     done: ...

Flow of Control 2
– Branch: put a new address in the program counter. The next instruction comes from the new address; effectively, a "goto".
– Unconditional branch
     (book)  BRANCH addr      ! PC <- addr
     (SPARC) ba addr          ! PC <- addr
– Conditional branch
     (book)  BRcc R1, R2, target
  "if R1 cc R2 then PC <- target", where cc is a comparison operation (e.g., LT is <, GE is >=, etc.)

Flow of Control 3
– Evaluating conditional branches
   Evaluate the condition.
   If the condition is true, then PC <- target, else PC <- PC+1.
– Consider changes to the fetch-execute cycle given earlier for the accumulator machine. What needs to change?
[Figure: fetch/execute flowchart extended with branches. After Fetch, test OP=BRANCH: if yes, Addr to bus, load PC. Otherwise test OP=BRcc: if yes, test Cond=T, and if that is also yes, Addr to bus, load PC. All "no" paths return to fetch (PC to bus, etc.).]

Flow of Control 4
Other conditions (from text, very similar to MIPS)
     BRLT Rn, Rm, target     ; if Rn <  Rm then PC <- target
     BRLE Rn, Rm, target     ; if Rn <= Rm then PC <- target
     BREQ Rn, Rm, target     ; if Rn =  Rm then PC <- target
     BRNE Rn, Rm, target     ; if Rn != Rm then PC <- target
     BRGE Rn, Rm, target     ; if Rn >= Rm then PC <- target
     BRGT Rn, Rm, target     ; if Rn >  Rm then PC <- target
Can implement high level control structures now. Back to the factorial example, using the book's assembly language:
           LOAD   R1, #1           ; R1 = f = 1
           LOAD   R2, #2           ; R2 = i = 2
           LOAD   R3, n            ; R3 = n
     loop: BRGT   R2, R3, done     ; branch if i > n
           MPY    R1, R1, R2       ; f = f * i
           ADD    R2, R2, #1       ; i = i + 1
           BRANCH loop             ; goto loop
     done: STORE  f, R1            ; f = n!

Flow of Control 5
– Condition codes
   The book's assembly language has 3-address branches. SPARC uses 1-address branches, so it must use condition codes. Non-MIPS machines use condition codes to evaluate branches.
   The Condition Code Register (CCR) holds these bits. SPARC has a 4-bit CCR:
     N: Negative, Z: Zero, V: Overflow, C: Carry.
    All are shown in a trace, or by the reg command under ISEM.
   Condition codes are not set by normal ALU instructions. Must use a special instruction ending with cc, e.g., addcc.

Flow of Control 6
        .text
start:  set   1, %r2
        set   0xFFFFFFFE, %r1      ! -2 in 32-bit 2's comp
cc_set: subcc %r1, %r2, %r3        ! r3 <- -2-1
end:    ta    0

ISEM> reg
   ----0--- ----1--- ----2--- ----3--- ----4--- ----5--- ----6--- ----7---
G  00000000 fffffffe 00000001 00000000 00000000 00000000 00000000 00000000
O  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
L  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
I  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
PC: 08:00000028 nPC: 0000002c PSR: 0000003e N:0 Z:0 V:0 C:0
cc_set : subcc %g1, %g2, %g3
ISEM> trace
   ----0--- ----1--- ----2--- ----3--- ----4--- ----5--- ----6--- ----7---
G  00000000 fffffffe 00000001 fffffffd 00000000 00000000 00000000 00000000
O  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
L  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
I  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
PC: 08:0000002c nPC: 00000030 PSR: 00b0003e N:1 Z:0 V:0 C:0

Flow of Control 7
– Setting the condition codes
   Regular ALU operations don't set condition codes.
   Use addcc, subcc, smulcc, sdivcc, etc., to set condition codes.
   E.g., suppose %r1 contains -4 and %r2 contains 5. Fill in the condition codes:
                                 N Z V C
     addcc %r1, %r2, %r3
     subcc %r1, %r2, %r3
     subcc %r2, %r1, %r3
     subcc %r1, %r1, %r3

ALU Hardware 1
How does a computer add?
– Design a circuit that adds three single-digit binary numbers. Results in a sum and a carry out.
     Cin X Y | Sum Cout
      0  0 0 |  0   0
      1  0 0 |  1   0
      0  0 1 |  1   0
      1  0 1 |  0   1
      0  1 0 |  1   0
      1  1 0 |  0   1
      0  1 1 |  0   1
      1  1 1 |  1   1
[Figure: full adder (FA) block with inputs x, y, cin and outputs Sum, cout.]
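
ALU Hardware 2
Now cascade the full adder hardware.
[Figure: n full adders (FA) chained between register x, register y, and register z; the carry-in of the first FA is 0, each cout feeds the next cin, and the last cout is the carry out.]
How are CCR bits set?
– C-bit = Cout
– V-bit = Cout XOR C(n-1), where C(n-1) is the carry into the most significant bit
– Z-bit = NOT (rz(0) OR rz(1) OR rz(2) OR ... OR rz(n-1))
– N-bit = rz(n-1)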
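The cascade and the flag rules can be simulated bit by bit. The following is a minimal Python sketch of a ripple-carry adder for addition only; the names `full_adder` and `add_with_flags` are hypothetical, and V is computed as the carry out XOR the carry into the most significant bit, which is the usual reading of the rule above.

```python
def full_adder(x, y, cin):
    """One-bit full adder: returns (sum, carry_out) as in the truth table."""
    total = x + y + cin
    return total & 1, total >> 1


def add_with_flags(a, b, n=32):
    """Add two n-bit values with a chain of full adders; return (result, flags)."""
    carry, carry_into_msb, result = 0, 0, 0
    for i in range(n):
        carry_into_msb = carry                       # carry entering bit i
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    flags = {
        "N": (result >> (n - 1)) & 1,                # most significant result bit
        "Z": int(result == 0),                       # NOR of all result bits
        "V": carry ^ carry_into_msb,                 # signed overflow
        "C": carry,                                  # carry out of the last FA
    }
    return result, flags


print(add_with_flags(0xFFFFFFFC, 5))    # -4 + 5 in 32-bit two's complement
print(add_with_flags(0x7FFFFFFF, 1))    # largest positive value plus one
```

Subtraction is deliberately left out: on SPARC, subcc sets C on a borrow, which is not the same as the raw carry out of an adder that computes x + NOT y + 1.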

Flow of Control 8
– Branches use logic to evaluate the CCR (SPARC)
     Operation                        Assembler Syntax   Branch Condition
     Branch always                    ba target          1 (always)
     Branch never                     bn target          0 (never)
     Branch not equal                 bne target         NOT Z
     Branch equal                     be target          Z
     Branch greater                   bg target          NOT (Z OR (N XOR V))
     Branch less or equal             ble target         Z OR (N XOR V)
     Branch greater or equal          bge target         NOT (N XOR V)
     Branch less                      bl target          N XOR V
     Branch greater, unsigned         bgu target         NOT (C OR Z)
     Branch less or equal, unsigned   bleu target        C OR Z
     Branch carry clear               bcc target         NOT C
     Branch carry set                 bcs target         C
     Branch positive                  bpos target        NOT N
     Branch negative                  bneg target        N
     Branch overflow clear            bvc target         NOT V
     Branch overflow set              bvs target         V
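The table translates directly into boolean functions of the four CCR bits. Below is a minimal Python sketch; the dictionary `BRANCH_CONDITIONS` and the function `branch_taken` are hypothetical names, and the sample flags are those produced by comparing -4 with 5 (N=1, Z=0, V=0, C=0).

```python
# Each entry maps a mnemonic to its branch condition over (N, Z, V, C).
BRANCH_CONDITIONS = {
    "ba":   lambda n, z, v, c: 1,
    "bn":   lambda n, z, v, c: 0,
    "bne":  lambda n, z, v, c: not z,
    "be":   lambda n, z, v, c: z,
    "bg":   lambda n, z, v, c: not (z or (n ^ v)),
    "ble":  lambda n, z, v, c: z or (n ^ v),
    "bge":  lambda n, z, v, c: not (n ^ v),
    "bl":   lambda n, z, v, c: n ^ v,
    "bgu":  lambda n, z, v, c: not (c or z),
    "bleu": lambda n, z, v, c: c or z,
    "bcc":  lambda n, z, v, c: not c,
    "bcs":  lambda n, z, v, c: c,
    "bpos": lambda n, z, v, c: not n,
    "bneg": lambda n, z, v, c: n,
    "bvc":  lambda n, z, v, c: not v,
    "bvs":  lambda n, z, v, c: v,
}


def branch_taken(mnemonic, n, z, v, c):
    """Return True if the branch would be taken with these CCR bits."""
    return bool(BRANCH_CONDITIONS[mnemonic](n, z, v, c))


# Flags after comparing -4 with 5: signed less than, but unsigned greater than.
for name in ("bl", "bg", "bgu", "bleu"):
    print(name, branch_taken(name, 1, 0, 0, 0))
```

The last two lines of output illustrate why signed and unsigned branches exist separately: the same bit pattern for -4 is a very large unsigned number.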

Flow of Control 9
– Setting condition codes (ctd.)
   Synthetic instruction cmp %rs1, %rs2
    – Sets CCR, but doesn't modify any registers.
    – Implemented as subcc %rs1, %rs2, %g0
   Back to the factorial example (SPARC):
           set  1, %r1            ! %r1 = f = 1
           set  2, %r2            ! %r2 = i = 2
           set  n, %r3            ! Get loc of n
           ld   [%r3], %r3        ! Put n in %r3
     loop: cmp  %r2, %r3          ! Set CCR (i ? n)
           bg   done              ! i > n: done
           nop                    ! Branch delay
           umul %r1, %r2, %r1     ! f = f * i
           add  %r2, 1, %r2       ! i = i + 1
           ba   loop              ! Goto loop
           nop                    ! Branch delay
     done: set  f, %r3            ! Get loc of f
           st   %r1, [%r3]        ! f = n!

Flow of Control 10
– Branch delay slots: unique to RISC architectures
   Non-technical explanation: the processor is running so fast, it can't make a turn.
    – The instruction following a branch is always executed.
   Technical explanation: pipelining doesn't permit a decision about whether a branch is taken until after the next instruction enters the pipeline.
   Compilers take advantage of branch delay slots by putting a useful instruction there if possible.
   For our purposes, use the nop (no operation) instruction to fill branch delay slots. Beware! Forgetting the nop will be a large source of errors in your programs!

High Level Control Structures 1
Converting high level control structures
– You get to be the "compiler".
   Some compilers convert the source language (C, Pascal, Modula 2, etc.) into assembly language and then assemble the result to an object file. GNU C and C++ do this, targeting GAS (the GNU Assembler).
– if-then-else, while-do, and repeat-until can all be created in a structured way in assembly language.

High Level Control Structures 2
General guidelines
– Break down into independent logical units.
– Convert to if/goto pseudo-code.
– Mechanical, step-by-step, non-creative process.
  High level:
     f = 1;
     for (i=2; i<=n; i++)
        f = f * i;
  if/goto:
           f = 1
           i = 2
     loop: if (i > n) goto done
           f = f * i
           i = i + 1
           goto loop
     done: ...

High Level Control Structures 3
if-then-else
     if (a<b)
        c = d + 1;
     else
        c = 7;
if/goto
           if (a >= b) goto else
           c = d + 1
           goto end
     else: c = 7
     end:
SPARC
     init: set a, %r2          ! get &a into r2
           ld  [%r2], %r2      ! get a into r2
           set b, %r3          ! get &b into r3
           ld  [%r3], %r3      ! get b into r3
     if:   cmp %r2, %r3        ! a ?? b (want >=)
           bge else            ! a >= b, do else
           nop
           set d, %r5          ! get &d into r5
           ld  [%r5], %r5      ! get d into r5
           add %r5, 1, %r4     ! r4 <- d+1
           ba  end
           nop
     else: set 7, %r4          ! get 7 into r4
     end:  set c, %r5          ! get &c into r5
           st  %r4, [%r5]      ! c <- r4

High Level Control Structures 4
while loops:
     while (a<b)
        a = a+1;
     c = d;
if/goto:
     whle: if (a>=b) goto done
     body: a = a+1
           goto whle
     done: c = d
SPARC
     init: set a, %r4          ! get &a into r4
           ld  [%r4], %r2      ! get a into r2
           set b, %r3          ! get &b into r3
           ld  [%r3], %r3      ! get b into r3
     whle: cmp %r2, %r3        ! a ?? b (want >=)
           bge done            ! a >= b, skip body
           nop
     body: add %r2, 1, %r2     ! r2 = a + 1
           st  %r2, [%r4]      ! a = a + 1
           ba  whle            ! repeat loop body
           nop
     done: set c, %r5          ! get &c into r5
           ...

High Level Control Structures 5
repeat-until loops:
     repeat
        …
     until (a>b)
if/goto:
     repeat: …
             if (a<=b) goto repeat
SPARC
     rpt:  ...
           ...
           set a, %r2          ! get &a into r2
           ld  [%r2], %r2      ! get a into r2
           set b, %r3          ! get &b into r3
           ld  [%r3], %r3      ! get b into r3
           cmp %r2, %r3        ! a <= b?
           ble rpt             ! do body again
           nop

High Level Control Structures 6
Complex conditions
   if ((a<b) and (b>=c)) …
   Primitive language:
           if (a>=b) then goto skip
           if (b<c) then goto skip
     body: ...
           ...
     skip: ...
   if ((a<b) or (b>=c)) …
   Primitive language:
           if (a<b) then goto body
           if (b<c) then goto skip
     body: ...
           ...
     skip: ...
These can be combined and used in if/else or while loops.

Flow of Control 11
– Optimizing code: change the order of instructions, combine instructions, take advantage of branch delay slots.
   Factorial example again. (for i:=n downto 1 do…)
           set   1, %r1           ! %r1 = f = 1
           set   n, %r2           ! Get loc of n
           ld    [%r2], %r2       ! Put n in %r2
     loop: umul  %r1, %r2, %r1    ! f = f*n
           subcc %r2, 1, %r2      ! Decrement n
           bg    loop             ! Repeat
           nop                    ! Branch delay
           set   f, %r3           ! Get loc of f
           st    %r1, [%r3]       ! f = n!
   Reduced 7 instructions in the loop to just 4. (You gain no advantage if you optimize code in your labs.)

Synthetic Instructions
Remember Lab0?
        .data
x:      .word 0x42
y:      .word 0x20
z:      .word 0
        .text
start:  set  x, %r2
        ld   [%r2], %r2
        set  y, %r3
        ld   [%r3], %r3
        and so on…
Suppose you gave this command to ISEM (after loading):
   ISEM> dump start
   start 05 00 00 08 84 10 a0 60 c4 00 80 00 07 00 00 08
Could you find the set instruction?

Instruction Encodings 1
First, instruction encoding is how instructions are assembled.
– All instructions must fit into 32 bits.
Bit positions:   31-30 | 29-25 | 24-19 | 18-14 | 13 | 12-5 | 4-0
Register-register (op=10, i=0):
     op | rd | op3 | rs1 | i | asi | rs2
Register-immediate (op=10, i=1):
     op | rd | op3 | rs1 | i | simm13 (bits 12-0)
Floating point (op=10, i=0):
     op | rd | op3 | rs1 | i | opf | rs2

Instruction Encodings 2
Call instructions: op=01
     op (31-30) | disp30 (29-0)
Branch instructions: op=00, op2=010
     op (31-30) | a (29) | cond (28-25) | op2 (24-22) | disp22 (21-0)
SETHI instructions: op=00, op2=100
     op (31-30) | rd (29-25) | op2 (24-22) | imm22 (21-0)
Ex.: add %r2, %r3, %r4
     op   rd      op3      rs1     i+asi       rs2
     10   00100   000000   00010   000000000   00011
     in hexadecimal: 88008003
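The field layout can be turned into a small encoder. This is a minimal Python sketch for the op=10 register-register and register-immediate formats, assuming the field positions shown on these slides; the function `encode_format3` is a hypothetical helper, and the op3 values used (000000 for add, 000010 for or) are read off the encodings shown in these notes.

```python
def encode_format3(op, rd, op3, rs1, rs2=None, simm13=None):
    """Pack the fields of a register-register or register-immediate instruction."""
    word = (op << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14)
    if simm13 is None:
        return word | rs2                         # i = 0, asi = 0, rs2 in bits 4-0
    return word | (1 << 13) | (simm13 & 0x1FFF)   # i = 1, 13-bit immediate


# add %r2, %r3, %r4  ->  rs1 = 2, rs2 = 3, rd = 4
print(hex(encode_format3(0b10, 4, 0b000000, 2, rs2=3)))
# or %r3, 4, %r3     ->  rs1 = 3, simm13 = 4, rd = 3
print(hex(encode_format3(0b10, 3, 0b000010, 3, simm13=4)))
```

Note that the assembly operand order (sources first, destination last) differs from the field order in the instruction word, where rd comes right after op.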

Understanding SET Synthetic
Usually used to put the value of an address in memory into a register.
For example, set 0x4004, %r3
Can do neither 'add %r0, 0x4004, %r3' nor 'or %r0, 0x4004, %r3'. Why not?
SET is a synthetic instruction which may be implemented in two steps.

Step #1: sethi 0x10, %r3          ! Puts 0x10 in the most significant 22 bits
     bits 31 ... 0                                          hex value
     %r3 (before)   0001 0010 0100 1000 0001 0010 0100 1000   0x12481248
     0x10           0000 0000 0000 0000 0100 00xx xxxx xxxx   0x10
     sethi -> %r3   0000 0000 0000 0000 0100 0000 0000 0000   0x4000

Step #2: or %r3, 0x0004, %r3      ! Puts 0x0004 in the least significant bits
     %r3            0000 0000 0000 0000 0100 0000 0000 0000   0x4000
     0x0004         0000 0000 0000 0000 0000 0000 0000 0100   0x00000004
     OR -> %r3      0000 0000 0000 0000 0100 0000 0000 0100   0x4004

Machine language encoding for 'set 0x4004, %r3':
     sethi 0x10, %r3    0000 0111 0000 0000 0000 0000 0001 0000   0x 07 00 00 10
     or %r3, 4, %r3     1000 0110 0001 0000 1110 0000 0000 0100   0x 86 10 E0 04
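The two-step split can be computed directly. Here is a minimal Python sketch, assuming the upper 22 bits go to sethi and the lower 10 bits go to or, as in the walkthrough above; `split_for_set` and `encode_sethi` are hypothetical helper names.

```python
def split_for_set(value):
    """Split a 32-bit constant into (upper 22 bits, lower 10 bits)."""
    value &= 0xFFFFFFFF
    return value >> 10, value & 0x3FF


def encode_sethi(imm22, rd):
    """Encode sethi imm22, %r<rd> (op=00, op2=100)."""
    return (0b00 << 30) | (rd << 25) | (0b100 << 22) | (imm22 & 0x3FFFFF)


hi, lo = split_for_set(0x4004)
print(hex(hi), hex(lo))                  # operands for sethi and or
print(hex(encode_sethi(hi, 3)))          # machine word for sethi 0x10, %r3
print(hex((hi << 10) | lo))              # recombining gives back the constant
```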

Decoding an Instruction
05 00 00 08 (hex) = 0000 0101 0000 0000 0000 0000 0000 1000 (binary)
Instruction Group (bits 30:31) = 00
Destination Register (bits 25:29) = 00010
Op Code (bits 22:24) = 100
Constant (bits 0:21) = 0000000000000000001000
Meaning: sethi 0x8, %r2
%r2 <-- 00000000000000000010000000000000 (0x2000)

More Decoding
Fill in the fields for each instruction:
     Hex           Binary                                    Group  OP  Rd  Rs1  Rs2  SICONST
     84 10 A0 60   1000 0100 0001 0000 1010 0000 0110 0000
     C4 00 80 00
     07 00 00 08
     86 10 E0 64
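Field extraction is just shifting and masking. The following minimal Python sketch pulls the fields out of a 32-bit word, assuming the SETHI layout for op=00 with op2=100 and the op/rd/op3/rs1/i layout otherwise; the function `decode_fields` is hypothetical, and it returns raw field values rather than mnemonics.

```python
def decode_fields(word):
    """Return the instruction fields of a 32-bit SPARC word as a dict."""
    op = word >> 30
    rd = (word >> 25) & 0x1F
    if op == 0b00 and (word >> 22) & 0x7 == 0b100:      # SETHI format
        return {"op": op, "rd": rd, "op2": 0b100, "imm22": word & 0x3FFFFF}
    fields = {"op": op, "rd": rd,
              "op3": (word >> 19) & 0x3F,
              "rs1": (word >> 14) & 0x1F,
              "i": (word >> 13) & 1}
    if fields["i"]:
        fields["simm13"] = word & 0x1FFF                # immediate operand
    else:
        fields["rs2"] = word & 0x1F                     # register operand
    return fields


for word in (0x05000008, 0x8410A060, 0xC4008000):
    print(hex(word), decode_fields(word))
```

Applied to the first two words of the Lab0 dump, this gives rd = 2 with imm22 = 0x8, then rd = rs1 = 2 with simm13 = 0x60, which is the sethi/or pair that implements set x, %r2.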

SET Synthetic Instruction
set iconst, rd is assembled as one of:
     sethi %hi(iconst), rd
     or    rd, %lo(iconst), rd
  --or--
     sethi %hi(iconst), rd
  --or--
     or    %g0, iconst, rd

Bitwise Operations 1
Bit manipulation instructions
– Bitwise logical operations
   and %rs1, %rs2, %rd
        10010011… (32 bits)
        01111001…
     x y | x AND y
     0 0 |    0
     0 1 |    0
     1 0 |    0
     1 1 |    1
   or %rs1, %rs2, %rd
        10010011… (32 bits)
        01111001…
     x y | x OR y
     0 0 |    0
     0 1 |    1
     1 0 |    1
     1 1 |    1
   xor %rs1, %rs2, %rd
        10010011… (32 bits)
        01111001…
     x y | x XOR y
     0 0 |    0
     0 1 |    1
     1 0 |    1
     1 1 |    0

Bitwise Operations 2
   andn %rs1, %rs2, %rd
        10010011… (32 bits)
        01111001…
     x y | x AND (NOT y)
     0 0 |    0
     0 1 |    0
     1 0 |    1
     1 1 |    0
   orn %rs1, %rs2, %rd
        10010011… (32 bits)
        01111001…
     x y | x OR (NOT y)
     0 0 |    1
     0 1 |    0
     1 0 |    1
     1 1 |    1
   not %rs, %rd
        10010011… (32 bits)
     x | NOT x
     0 |   1
     1 |   0
   Recall the cc operations, so andcc, orcc, etc. are available. (However, there is no notcc; use xnorcc.)

Bitwise Operations 3
For what kinds of things are these bit level operations used?
   Recall the synthetic operations clr and mov:
     clr %r2         is   or %r0, %r0, %r2
     mov %r2, %r3    is   or %r0, %r2, %r3
   Masking operations: want to select a bit or group of bits from a set of 32. E.g., convert lower (or upper) case to upper case:
     'a' in binary is 01100001
     'A' in binary is 01000001
   All we need to do is "turn off" the bit in position 5.
     0xDF in binary is 11011111
     and %r1, 0xDF, %r1 will turn off that bit!
   What if we subtract 32 from %r1? What about converting upper to lower case?
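The masking trick is easy to try out. Below is a minimal Python sketch for single ASCII letters; the helper names `to_upper` and `to_lower` are hypothetical, and neither function checks that its argument is actually a letter.

```python
def to_upper(ch):
    """Clear bit 5 with an AND mask, like and %r1, 0xDF, %r1."""
    return chr(ord(ch) & 0xDF)


def to_lower(ch):
    """Set bit 5 with an OR mask (0x20 is the complement of 0xDF in 8 bits)."""
    return chr(ord(ch) | 0x20)


print(to_upper("a"), to_upper("A"))   # masking is safe for either case
print(to_lower("A"), to_lower("a"))
print(chr(ord("A") - 32))             # subtracting 32 from an upper case letter
```

The last line hints at the answer to the question above: subtracting 32 only works when the letter is known to be lower case, whereas the mask leaves an upper case letter unchanged.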

Bitwise Operations 4
– Bitwise shifting operations
   Shift logical left: sll %rs1, %rs2, %rd
     %rs1: data to be shifted
     %rs2: shift count
     %rd: destination register
   E.g.,
     set 0xABCD1234, %r2
     sll %r2, 3, %r3
     %r2: 1010 1011 1100 1101 0001 0010 0011 0100
     %r3: 0101 1110 0110 1000 1001 0001 1010 0000
   sll is equivalent to multiplying by a power of 2 (barring overflow). (In the decimal system, what's a shortcut for multiplying by a power of ten?)

Bitwise Operations 5
   Shift logical right: srl %rs1, %rs2, %rd
    – Shifts right instead of left, inserting zeros.
   Arithmetic shifts: propagate the sign bit when shifting right, e.g., sra. (Left shift doesn't change.)
    – Equivalent to dividing by a power of 2.
   Rotating shifts: bits that would have gone into the bit bucket are shifted in at the other end instead. (E.g., rr, rl)
    – Rotate is not implemented in SPARC.
[Figure: rotate right and rotate left.]
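The shift variants differ only in what enters at the vacated end. The following minimal Python sketch models them on 32-bit values; the function names mirror the mnemonics, but `rotate_right` is a hypothetical helper, since SPARC has no rotate instruction.

```python
MASK = 0xFFFFFFFF   # keep results to 32 bits


def sll(x, n):
    """Shift left logical: zeros enter on the right."""
    return (x << n) & MASK


def srl(x, n):
    """Shift right logical: zeros enter on the left."""
    return (x & MASK) >> n


def sra(x, n):
    """Shift right arithmetic: copies of the sign bit enter on the left."""
    x &= MASK
    if x & 0x80000000:
        x -= 1 << 32            # reinterpret as a negative Python integer
    return (x >> n) & MASK


def rotate_right(x, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    n %= 32
    x &= MASK
    return ((x >> n) | (x << (32 - n))) & MASK


print(hex(sll(0xABCD1234, 3)))            # the sll example from the slides
print(hex(srl(0xABCD1234, 4)), hex(sra(0xABCD1234, 4)))
print(hex(rotate_right(0xABCD1234, 4)))
```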

More SPARC Assembly Language
Assembler directives
   Are not encoded as machine instructions.
   Memory alignment: .align 4
    – Used when mixing allocations of bytes, words, halfwords, etc. and word boundary alignment is needed.
   Reserve bytes of space: .skip 20
    – Useful for allocating large amounts of space (e.g., arrays).
   Create a symbolic constant: .set mask, 0x0f
    – Can now use the word "mask" anywhere we could previously use the constant 0x0f.
All this is leading to additional addressing modes, which help us work with pointers, arrays, and records in assembly language.

Addressing Modes 1
Addressing modes
– How do we specify operand values?
   In a register: the location is encoded in the instruction.
   As a constant: the immediate value is in the instruction.
   In memory: the operand is somewhere in memory; the location may only be known at runtime.
– Memory operands:
   Effective address: the actual location of the operand in memory. This may be calculated implicitly (e.g., by a displacement in the instruction) or may be calculated by the programmer in code.

Addressing Modes 2
– Summary of addressing modes:
   Mode                 Example                  Loc. of operand             Suitable for              SPARC?
   Immediate            add %r1, 100, %r1        instruction                 Constants                 Yes
   Register Direct      add %r1, %r2, %r1        %r2                         Integers, constants       Yes
   Memory Direct        add %r1, [2000], %r2     mem[2000]                   Integers, constants       No
   Memory Indirect      ld [2000], %r1           mem[mem[2000]]              Pointers                  No
   Register Indirect    ld [%r1], %r2            mem[%r1]                    Pointers                  Yes
   Register Indexed     st %r1, [%r2+%r3]        mem[%r2+%r3]                Arrays                    Yes
   Register Displaced   st %r1, [%r2+x]          mem[%r2+x]                  Records                   Yes
   Post Increment       ld [%r1]+, %r2           mem[%r1], increment %r1     Arrays, strings, stacks   No
   Pre Decrement        ld -[%r1], %r2           decrement %r1, mem[%r1]     Arrays, strings, stacks   No

Addressing Modes 3
– Memory Direct addressing
   The entire address is in the instruction (not in SPARC).
   E.g., accumulator machine: each instruction had an opcode and a hard address in memory.
    – Can't be done on SPARC because an address is 32 bits, which is the length of an instruction. No room for opcodes, etc. Can be done in CISC because multi-word instructions are permitted.
– Memory Indirect addressing
   A pointer to the operand is in memory. The instruction specifies the location of the pointer. Requires three memory fetches (one each for the instruction, the pointer, and the data). Not in RISC machines because the instruction is too slow; such an instruction would cause its own register interlock!

Addressing Modes 4
– Register Indirect addressing
   The register has the address of the operand (a pointer). The instruction specifies the register number; the effective address is the contents of the register.
   Ex.:
        .data
   n:   .word 5            ! initialize [n] to 5
        .text
        set n, %r1         ! %r1 has pointer to [n]
        ld  [%r1], %r3     ! fetch [n] into %r3

Addressing Modes 5
Ex.: sum up an array of integers:
        .data
n:      .word 5                ! Size of array
a:      .word 4,2,5,8,3        ! 5 word array
sum:    .word 0                ! Sum of elements
b:      .skip 5*4              ! another 5 word array
        .text
        clr   %r2              ! r2 will hold sum
        set   n, %r3           ! r3 points to [n]
        ld    [%r3], %r3       ! r3 gets array size
        set   a, %r4           ! r4 points to array
loop:   ld    [%r4], %r5       ! Load element of a into r5
        add   %r5, %r2, %r2    ! sum = sum + element
        add   %r4, 4, %r4      ! Incr ptr by word size
        subcc %r3, 1, %r3      ! Decrement counter
        bg    loop             ! Loop until count = 0
        nop                    ! Branch delay slot
        set   sum, %r1         ! r1 points to sum
        st    %r2, [%r1]       ! Store sum
        ta    0                ! done

Register trace (fill in r2 and r5):
              r2    r3    r4      r5
   Pre-loop   0     5     a
   loop             4     a+4
   loop+1           3     a+8
   loop+2           2     a+12
   loop+3           1     a+16

Memory:
   n:     5
   a:     4
   a+4:   2
   a+8:   5
   a+12:  8
   a+16:  3
   sum:
Addressing Modes 6
C-style example of pointer data type
        char x;                 // object of type character
        char * ptr;             // pointer to character type
        ptr = &x;               // ptr has address of x (points to x)
        *ptr = 'a';             // store 'a' at address in ptr
Assembly language equivalent
        .data
x:      .byte 0                 ! reserve character space
        .align 4                ! align to word boundary
ptr:    .word 0                 ! pointer variable
        .text
        set x, %r1              ! get address x into %r1
        set ptr, %r2            ! get address ptr into %r2
        st %r1, [%r2]           ! make [ptr] point to [x]
        set 'a', %r3            ! put character 'a' into r3
        stb %r3, [%r1]          ! store 'a' at address x
Afterward: r1 = x, r2 = ptr, r3 = 'a'. Memory: x: 'a'; ptr: Addr of x
Addressing Modes 7
– Register Indexed addressing
  Suitable for accessing successive elements of the same type in a data structure.
  Ex.: Swap elements A[i] and A[k] in array
        .data
A:      .skip 24*4              ! reserve array[0..23] of int
        ! assume i is in %r2 and k is in %r3
        .text
        set A, %r4              ! beginning of array ptr.
        sll %r2, 2, %r2         ! “multiply” i by 4
        sll %r3, 2, %r3         ! “multiply” k by 4
        ld [%r2+%r4], %r7       ! r7 <- a[i]
        ld [%r3+%r4], %r8       ! r8 <- a[k]
        st %r8, [%r2+%r4]       ! a[i] <- r8
        st %r7, [%r3+%r4]       ! a[k] <- r7
  Effective address calculations!
Registers: r2 = 001 and r3 = 0010 become r2 = 100 and r3 = 1000 after sll; r4 = A. Memory: A, A+4, A+8, A+12
Addressing Modes 8
Simulating Register Indirect addressing on SPARC
– SPARC doesn't truly have register indirect addressing. We can write st %r2, [%r1], but the assembler converts this automatically into st %r2, [%r1+%r0]. Since %r0 always holds zero, the effective address is still the contents of %r1.
Array mapping functions: used by compilers to determine addresses of array elements. Must know upper bound, lower bound, and size of elements of array.
– Total storage = (upper - lower + 1)*element_size
– Offset for kth element = (k - lower)*element_size
– E.g., with lower bound 0 and 4-byte elements, offset for A[3] = (3-0)*4 = 12
– This is for 1 dimensional arrays only!
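The two 1D mapping functions above can be sketched in a few lines. This is only an illustration: the function names and the sample array A[0..23] of 4-byte integers are assumptions chosen to match the earlier swap example.

```python
def total_storage(lower, upper, element_size):
    """Bytes needed for a 1D array with inclusive bounds lower..upper."""
    return (upper - lower + 1) * element_size

def element_offset(k, lower, element_size):
    """Byte offset of element k from the start of the array."""
    return (k - lower) * element_size

# Assumed example: A[0..23] of 4-byte integers
print(total_storage(0, 23, 4))
print(element_offset(3, 0, 4))
```

A compiler would add the offset to the array's starting address to get the effective address of A[k].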
Addressing Modes 9
1D array mapping functions: Want an array of n elements, each element is 4 bytes in size, array starts at address arr.
– Total storage is 4n bytes
– First element is at arr+0
– Last element is at arr+4(n-1)
– kth (k can range from 0…n-1) element is at arr+4k. Array uses zero-based indexing.
k=0    k=1    k=2    k=3     k=4     k=5
arr+0  arr+4  arr+8  arr+12  arr+16  arr+20
array of 6 elements, 4 bytes each
Addressing Modes 10
2D array mapping functions: must linearize the 2D concept; i.e., map the 2D structure into 1D memory.
– Convert into 1D array in memory
3 Rows (0...2), 5 Columns (0...4):
0,0 0,1 0,2 0,3 0,4
1,0 1,1 1,2 1,3 1,4
2,0 2,1 2,2 2,3 2,4
Linearized: 0,0 0,1 0,2 0,3 0,4 1,0 1,1 ..... 2,3 2,4
Addressing Modes 11
2 ways to convert to 1D
– Row major order (Pascal, C, Modula-2) stores first by rows, then by columns. E.g.,
  0,0 0,1 0,2 0,3 0,4 1,0 1,1 ..... 2,3 2,4
– Column major order (FORTRAN) stores first by columns then by rows. E.g.,
  0,0 1,0 2,0 0,1 1,1 2,1 0,2 ..... 1,4 2,4
– Row major 2D array mapping function: Given an array starting at address arr that is x rows by y columns, each element is m bytes in size, and indices start at zero, then element (i, j) may be found at location: arr + (y*i + j)*m
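A small sketch of both orderings for the 3 by 5 example. The row major function follows the formula above; the column major function is the analogous formula with the roles of rows and columns exchanged, which the notes do not spell out, so treat it as an assumption. All names are illustrative.

```python
def row_major_address(arr, i, j, y, m):
    """Address of element (i, j) when there are y columns and m bytes per element."""
    return arr + (y * i + j) * m

def column_major_address(arr, i, j, x, m):
    """Address of element (i, j) when there are x rows and m bytes per element."""
    return arr + (x * j + i) * m

# 3 rows by 5 columns, 1-byte elements, array starting at address 0
cells = [(i, j) for i in range(3) for j in range(5)]
row_order = sorted(cells, key=lambda c: row_major_address(0, c[0], c[1], 5, 1))
col_order = sorted(cells, key=lambda c: column_major_address(0, c[0], c[1], 3, 1))
print(row_order[:7])
print(col_order[:7])
```

Sorting the cells by computed address reproduces the two storage orders listed above.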
Addressing Modes 12
3D array mapping function: natural extension of 2D function. Store by row, then column, then depth.
– Array starting at arr with x rows, y columns, depth z, m element size. Element (i, j, k) is found at location:
  arr + (z*(y*i + j) + k)*m
3 Rows, 5 Columns, 2 Depth (element offsets from the diagram shown as +n):
Depth 0:  0,0,0 (+0)   0,1,0 (+2)   0,2,0 (+4)   0,3,0 (+6)   0,4,0 (+8)
          1,0,0 (+10)  1,1,0 (+12)  1,2,0 (+14)  1,3,0 (+16)  1,4,0 (+18)
          2,0,0        2,1,0        2,2,0        2,3,0        2,4,0
Depth 1:  0,0,1 (+1)   0,1,1 (+3)   0,2,1 (+5)   0,3,1 (+7)   0,4,1 (+9)
          1,0,1        1,1,1        1,2,1        1,3,1        1,4,1
          2,0,1        2,1,1        2,2,1        2,3,1        2,4,1
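As a check on the 3D mapping, the sketch below assumes the nesting arr + (z*(y*i + j) + k)*m, which is the reading that matches the offsets in the diagram (+1 for the next depth, +2 for the next column, +10 for the next row). The function name is illustrative.

```python
def address_3d(arr, i, j, k, y, z, m):
    """Address of element (i, j, k): y columns, depth z, m bytes per element."""
    return arr + (z * (y * i + j) + k) * m

# 3 rows, 5 columns, depth 2, 1-byte elements, starting at address 0
for cell in [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 4, 1), (1, 0, 0), (1, 4, 0)]:
    print(cell, address_3d(0, *cell, y=5, z=2, m=1))
```

Every one of the 30 elements gets a distinct offset in the range 0 to 29, so the mapping wastes no storage.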
Addressing Modes 13
CALCULATE: total storage, offset for A(i,j,k), address for A(i,j,k)
                        1D   2D   3D
element size (#bytes)   4    2    1
# rows (x)              7    3    3
# cols (y)              1    5    5
# depth (z)             1    1    2
starting addr (0)       4    8    12
i=                      1    1    0
j=                      0    1    1
k=                      0    0    1
Addressing Modes 14
! Example that adds 1 to every element of columns 1 and 2, not 0, of a 5 by 3 array
        .data
        .set rows, 5            ! define symbolic constants
        .set cols, 3
arr:    .skip rows * cols * 4   ! allocate space (.skip 60 same)
        .text
        ...
        set arr, %r3            ! get address of array
        clr %r1                 ! %r1 is i (row)
loop1:  cmp %r1, rows           ! done if i >= rows
        bge done
        nop
        clr %r2                 ! %r2 is j (col),
        inc %r2                 ! start at one (skip col zero)
loop2:  cmp %r2, cols           ! if at last column, done with row
        bge inc1
        nop
        smul %r1, cols, %r4     ! # elements to skip for current row
        add %r4, %r2, %r4       ! then which column being accessed
        smul %r4, 4, %r4        ! change from element to byte offset
        ld [%r3+%r4], %r5       ! get arr[i][j]
        add %r5, 1, %r5         ! increment value
        st %r5, [%r3+%r4]       ! store it back to arr[i][j]
inc2:   add %r2, 1, %r2         ! next column
        ba loop2                ! continue inner loop over columns
        nop
inc1:   add %r1, 1, %r1         ! next row
        ba loop1                ! continue outer loop over rows
        nop
done:   ...
Addressing Modes 15
– Displacement Addressing
  Suitable for accessing the individual fields of record data structures. Each field can be of a different type.
  Use .set directive to establish offsets to fields within records. Then use displacement addressing to access those fields.
Logical view of a record:
  Name   20 Characters
  Age    Integer
  DOB    Integer
Actual layout of record in memory:
  person+0: Name (20 bytes)   person+20: Age (4 bytes)   person+24: DOB (4 bytes)
Addressing Modes 16
Ex.: Add 1 to the age field in a person record
        .data
        .set name, 0            ! offset to name field
        .set age, 20            ! offset to age field
        .set dob, 24            ! offset to date of birth
person: .skip 28                ! size of a person record
        .text
        ....
        set person, %r1         ! get addr of person record
        ld [%r1+age], %r2       ! get the age of the person
        add %r2, 1, %r2         ! increment age by 1
        st %r2, [%r1+age]       ! store back to record
Problem: alignment in memory. May have to waste some space in the person record in order to have the integer fields align on a word boundary.
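The same record access can be mimicked with Python's struct module, using a bytearray as the 28-byte record and the field offsets as displacements. This is a sketch only: the sample name, age, and date values are made up, and big-endian words are assumed to match SPARC.

```python
import struct

NAME, AGE, DOB = 0, 20, 24      # field offsets, as in the .set directives
PERSON_SIZE = 28

def increment_age(record):
    """Add 1 to the 4-byte age field of a person record, in place."""
    (age,) = struct.unpack_from(">i", record, AGE)   # like ld [%r1+age], %r2
    struct.pack_into(">i", record, AGE, age + 1)     # like st %r2, [%r1+age]

person = bytearray(PERSON_SIZE)
struct.pack_into(">20sii", person, NAME, b"Pat", 41, 19620314)  # made-up values
increment_age(person)
print(struct.unpack_from(">i", person, AGE)[0])
```

Because the name field is 20 bytes, a multiple of 4, both integer fields already fall on word boundaries here; a 10-character name would need 2 bytes of padding.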
Addressing Modes 17
– Auto-increment and Auto-decrement addressing
  SPARC does not support these modes. They may be simulated using register indirect addressing followed by an add or subtract of the size of the element on that register.
  Useful for traversing arrays forward (auto-increment) and backward (auto-decrement). Also useful for stacks and queues of data elements.
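To illustrate why these modes suit stacks, here is a minimal model in which a push is a pre-decrement store and a pop is a post-increment load. The dictionary-based memory, the register name "sp", and the starting address 400 are assumptions for the sketch.

```python
WORD = 4

def push(mem, regs, value):
    """Pre-decrement store: subtract the element size, then store."""
    regs["sp"] -= WORD
    mem[regs["sp"]] = value

def pop(mem, regs):
    """Post-increment load: load, then add the element size."""
    value = mem[regs["sp"]]
    regs["sp"] += WORD
    return value

mem, regs = {}, {"sp": 400}
for item in (4, 2, 5):
    push(mem, regs, item)
popped = [pop(mem, regs) for _ in range(3)]
print(popped, regs["sp"])
```

On SPARC each of these two-step operations becomes an ordinary ld or st plus a separate add or sub on the pointer register.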
Subroutines 1
– Subroutines and subroutine linkage
  Subroutines: programming mechanism to facilitate repeated computations and modularization.
– Use of subroutines
  Basis for structured and disciplined programming
  Compact code (no need to write monolithic loops)
  Relatively easy to debug (no cut-and-paste errors)
  Requires little hardware support, mostly protocols and conventions to handle parameters.
Subroutines 2
– Terminology
  Caller: the code (which could be a subroutine itself) which invokes the subroutine of interest
  Callee: the subroutine being invoked by the caller
  Function: subroutine that returns one or more values back to the caller and exactly one of these values is distinguished as the return value
  Return value: the distinguished value returned by a function
Subroutines 3
– Terminology (continued)
  Procedure: a subroutine that may return values to the caller (through the subroutine’s parameter(s)), but none of these values is distinguished as the return value
  Return address: address of the subroutine call instruction
  Parameters: information passed to/from a subroutine (a.k.a. arguments)
  Subroutine linkage: a protocol for passing parameters between the caller and the callee
Subroutines 4
– Subroutine linkage
  Calling a subroutine
  – Assembly language syntax for calling a subroutine
        call label
        nop
  – Must change the program counter (as in a branch instruction); however, we must also keep track of where to resume execution after the subroutine finishes. Call instruction handles this atomically (i.e., without interruption) by:
        %r15 ← PC
        nPC ← label
  Returning from a subroutine
  – Assembly language syntax for returning from a subroutine
        retl
        nop
Subroutines 5
  Returning from a subroutine (continued)
  – Again, must change the program counter to return to an instruction after the one that called the subroutine. The address of the instruction that called it was saved in %r15, and we must skip over the branch delay slot as well. So, this is accomplished by:
        nPC ← %r15+8
  Parameter passing: 2 approaches
  – Register based linkage: pass parameters solely through registers. Has the advantage of speed, but can only pass a few parameters, and it won’t support nested subroutine calls. Such a subroutine is called a leaf subroutine.
  – Stack based linkage: pass parameters through the run-time stack. Not as fast, but can pass more parameters and have nested subroutine calls (including recursion).
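The effect of call and retl on %r15 can be modeled in a few lines, which also shows why a nested call breaks purely register based linkage. This is a simplified model, not an instruction simulator; all addresses are made up.

```python
def call(regs, pc, label):
    """Model of call: save the call's own address in %r15, return the new nPC."""
    regs["r15"] = pc
    return label

def retl(regs):
    """Model of retl: resume after the call and its branch delay slot."""
    return regs["r15"] + 8

regs = {}
call(regs, 0x1000, 0x2000)        # main calls sub1 from address 0x1000
resume_in_main = retl(regs)       # back to 0x1008 in main

call(regs, 0x1000, 0x2000)        # main calls sub1 again...
call(regs, 0x2010, 0x3000)        # ...and sub1 calls sub2 without saving %r15
lost = retl(regs)                 # 0x2018: the return address into main is gone
print(hex(resume_in_main), hex(lost))
```

A non-leaf subroutine therefore has to save %r15 somewhere before making its own call.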
Register-based Linkage 1
– Register based linkage:
  Startup Sequence: load parameters and return address into registers, branch to subroutine.
  Prologue: if non-leaf procedure then save return address to memory, save registers used by callee.
  Epilogue: place return parameters into registers, restore registers saved in prologue, restore saved return address, return.
  Cleanup Sequence: work with returned values
Caller: Startup Sequence, call; Callee: Prologue, Body, Epilogue, retl; Caller: Cleanup Sequence
Register-based Linkage 2
– Example: Print subroutine.
        .text
main:   set 1, %r1              ! Initialize r1 and r2
        set 3, %r2
        mov %r1, %r8            ! Print %r1
        call print
        nop
        mov %r2, %r8            ! Print %r2
        call print
        nop
        add %r1, %r2, %r8       ! Do our calculation
        call print              ! Print the result (expect '4')
        nop
        ta 0
print:  set '0', %r1            ! Ascii value of zero
        add %r8, %r1, %r2       ! Treat r8 as parameter
        mov %r2, %r8            ! Move into output register
        ta 1                    ! Output character
        mov '\n', %r8
        ta 1                    ! Output end of line (newline)
        retl                    ! Return
        nop
What’s wrong with the above code?
Register-based Linkage 3
– Which registers can subroutines change?
  Convention for optimized leaf procedures:
  Register(s)        Use                  Changeable?
  %r0                Zero                 No
  %r1                Temporary            Yes
  %r2-%r7            Caller’s variables   No
  %r8                Return value         Yes
  %r8-%r13           Parameters           Yes
  %r14               Stack pointer        No
  %r15               Return address       No
  %r30               Frame pointer        No
  %r16-%r29, %r31    Caller’s variables   No
  Any other registers the subroutine touches must be saved to memory somewhere, and restored before returning to the caller.
  Problem: how can a subroutine call another subroutine? How can a subroutine call itself?
Register-based Linkage 4
– Example: procedure to print linked list of ints.
        .data
        .set dta, 0             ! offset in record to data
        .set ptr, 4             ! offset in record to next pointer
head:   .word 0
        .text
main:   ....                    ! does all init and allocation of list
        set head, %r8           ! prepare parameter to traverse proc
        ld [%r8], %r8           ! follow head pointer to first node
        call trav               ! call subroutine
        nop                     ! branch delay
        ....
trav:   mov %r8, %r1            ! copy pointer to %r1
loop:   cmp %r1, 0              ! check for null pointer
        be done                 ! null pointer means we are done
        nop                     ! branch delay
        ld [%r1+dta], %r8       ! follow pointer and get data field
        ta 4                    ! print data field
        ld [%r1+ptr], %r1       ! get pointer to next record
        ba loop
        nop                     ! branch delay
done:   retl
        nop
List: head -> 5 -> 7 -> 4 -> 1 -> nil
Parameter Passing 1
– Parameter passing review:
  Pass by value: parameters to subroutine are copies upon which the subroutine acts.
  Pass by reference: parameters to subroutine are addresses of values upon which the subroutine acts. Callee is responsible for saving each result to memory at the location referred to by the appropriate parameter.
  Hybrid: some parameters passed by value, and some by reference. Callee is responsible for saving results for reference parameters.
Parameter Passing 2
– Parameter passing notes:
  Array or record parameters typically are passed by reference (efficiency reasons). Primitive data types may be passed either way.
  Conventions among languages allow any language to call functions in any other language:
  – Pascal: VAR parameters are passed by reference; all others are passed by value.
  – C: all parameters are passed by value. Must explicitly pass a pointer if you want a reference parameter.
  – C++: like Pascal, can pass by value or by reference.
  – FORTRAN: all things passed by reference (even constants).
  – Ada: pass by value/result.
Parameter Passing 3
        .text                   ! pp. 72-73 of Lab Manual
! pr_str – print a null terminated string
! Parameters:  %r8 – pointer to string (initially)
!
! Temporaries: %r8 – the character to be printed
!              %r9 – pointer to string
!
pr_str: mov %r8, %r9            ! we need %r8 for the “ta 1” below
pr_lp:  ldub [%r9], %r8         ! load character
        cmp %r8, 0              ! check for null
        be pr_dn
        nop
        ta 1                    ! print character
        ba pr_lp
        inc %r9                 ! increment the pointer
                                ! (in a branch delay slot!)
pr_dn:  retl
        nop
Parameter Passing 4
Summary from text (p. 220)
– Pass by value: For small “in” parameters. Subroutines cannot alter the originals whose copies are passed as parameters.
– Pass by value/result: For small “in/out” parameters. Caller’s cleanup sequence stores values of any “in/out” parameters.
– Pass by reference: for “in/out” parameters of all sizes, and large “in” parameters. “Out” values are provided by changing memory at those addresses. (Note: pass by reference is passing an address by value).
Parameter Passing 5
– Write Sparc code for the caller and callee for the following subroutine using register based parameter passing
! global_function Integer Subchr (A, B, C)
! Substitutes character C for all B in string [A],
! and returns count of changes.
!
! // In comments, "[A+index]" is denoted by "ch".
!       index = 0
!       count = 0
! LOOP: if [A+index]=0 go to END      // while (ch != 0) {
!       if [A+index]≠B go to INC      //   if (ch == B) {
!       [A+index]=C                   //     ch = C;
!       count=count+1                 //     count++; }
! INC:  index=index+1                 //   index++;
!       go to LOOP                    // }
! END:
Assume:
        .data                   ! data section
C_s:    .byte 'I'               ! parameter C
B_s:    .byte 'i'               ! parameter B
A_s:    .asciz "i will tip"     ! parameter A
        .align 4
R_s:    .word 0                 ! for storing result count
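Before writing the SPARC version, it can help to have a reference version of the algorithm to test against. The sketch below follows the pseudocode directly; representing the string as a null-terminated bytearray is an assumption made so that the "store through the pointer" step is visible.

```python
def subchr(a, b, c):
    """Replace every byte equal to b with c in null-terminated a; return the count."""
    index = 0
    count = 0
    while a[index] != 0:            # LOOP: stop at the null terminator
        if a[index] == b:
            a[index] = c            # [A+index] = C
            count += 1
        index += 1                  # INC
    return count

text = bytearray(b"i will tip\0")
changes = subchr(text, ord("i"), ord("I"))
print(changes, bytes(text))
```

A is effectively passed by reference (the callee changes the caller's string), while B and C are passed by value.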
Stack-based Linkage 1
Stack based linkage
– Advantages
  Allows a larger number of parameters to be passed.
  Permits records and arrays to be passed by value.
  Saving of registers by callee is “built-in”.
  A way for callee to reserve memory for other uses is “built-in”, too.
– Disadvantages
  Slower than register based
  More complex protocol
– Why a stack?
  Subroutine calls and returns happen in a last-in first-out order (LIFO).
  Also known as a runtime stack, parameter stack, or subroutine stack.
Stack-based Linkage 2
Items “saved” on the stack in one activation record
– Parameters to the subroutine
– Registers used in the subroutine
– Local memory variables used in subroutine
– Return value and return address
Say A() calls B(), B() calls C(), and C() calls A()
Runtime Stack: 1st stack frame for A, 1st stack frame for B, 1st stack frame for C, 2nd stack frame for A
Expanded View of one frame: Local variables, Saved general purpose registers, Return addresses, Return values, Parameters
Stack-based Linkage 3
– Stack based linkage parameter passing convention
  Startup sequence:
  – Push parameters
  – Push space for return value
  Prologue
  – Push registers that are changed (including return address)
  – Allocate space for local variables
  Epilogue
  – Restore general purpose registers
  – Free local variable space
  – Use return address to return
  Cleanup Sequence
  – Pop and save return values
  – Pop parameters
Caller: Startup Sequence, call; Callee: Prologue, Body, Epilogue, retl; Caller: Cleanup Sequence
Stack-based Linkage 4
– Stack based parameter passing example:
  Register %r14 = %sp (stack pointer)
  – Invariant: Always indicates the top of the stack (it has the address in memory of the last item on stack, usually a word).
  – Moved when items are “pushed” onto the stack.
  – Due to interruptions (system interrupts (I/O) and exceptions), values stored above %sp (at addresses less than %sp) can change at any time! Hence, any access above %sp is unsafe!
  Register %r30 = %fp (frame pointer)
  – Indicates the previous stack pointer. Activation record is from (some subroutine-specific number of words before) the %fp to the %sp.
  – Invariant: %fp is constant within a subroutine (after prologue).
Stack-based Linkage 5
– Stack based parameter passing example: Want to implement the following subroutine:
! global_function Integer Subchr (A, B, C)
! Substitutes character C for all B in string A,
! and returns count of changes.
!
! // In comments, "*(A+index)" is denoted by "ch".
!       index = 0
!       count = 0
! LOOP: if *(A+index)=0 go to END     // while (ch != 0) {
!       if *(A+index)≠B go to INC     //   if (ch == B) {
!       *(A+index)=C                  //     ch = C;
!       count=count+1                 //     count++; }
! INC:  index=index+1                 //   index++;
!       go to LOOP                    // }
! END:
        .data                   ! data section
C_s:    .byte 'I'               ! parameter C
B_s:    .byte 'i'               ! parameter B
A_s:    .asciz "i will tip"     ! parameter A
        .align 4
R_s:    .word 0                 ! for storing result count
Stack-based Linkage 6
        .data                   ! data section
C_s:    .word 'I'               ! parameter C
B_s:    .word 'i'               ! parameter B
A_s:    .asciz "i will tip"     ! parameter A
        .align 4                ! align to word address
stack:  .skip 250*4             ! allocate 250 word stack
bstak:                          ! point to bottom of stack
R_s:    .word 0                 ! reserve for count
        .text
! Program’s one-time initialization
start:  set bstak, %sp          ! set initial stack ptr
        mov %sp, %fp            ! set initial frame ptr
! STARTUP SEQUENCE to call subchr()
        sub %sp, 16, %sp        ! move stack ptr
        set A_s, %r1            ! A is passed by reference
        st %r1, [%sp+4]         ! push address on stack
        set B_s, %r1            ! B is passed by value
        ld [%r1], %r1           ! get value of B
        st %r1, [%sp+8]         ! push parameter B on stack
        set C_s, %r1            ! C is passed by value
        ld [%r1], %r1           ! get value of C
        st %r1, [%sp+12]        ! push parameter C on stack
! SUBROUTINE CALL
        call subchr             ! make subroutine call
        nop                     ! branch delay slot
! CLEANUP SEQUENCE
        ld [%sp], %r1           ! pop return value off stack
        add %sp, 16, %sp        ! pop stack
        set R_s, %r2            ! get address of R
        st %r1, [%r2]           ! store R
        . . .                   ! the rest of the program
Stack after the startup sequence, from low to high addresses: %sp -> Return value; addr (a); b; c; %fp ->
Stack-based Linkage 7
! SUBROUTINE PROLOGUE
subchr: sub %sp, 32, %sp        ! open 8 words on stack
        st %fp, [%sp+28]        ! Save old frame pointer
        add %sp, 32, %fp        ! old sp is new fp
        st %r15, [%fp-8]        ! save return address
        st %r8, [%fp-12]        ! Save gen. Register
        …                       ! Save r9-r13, omitted
! SUBROUTINE BODY
ld_reg: ld [%fp+4], %r8         ! “pop” (load) addr of A
        ld [%fp+8], %r9         ! “pop” (load) value of B
        ld [%fp+12], %r10       ! “pop” (load) value of C
        clr %r12                ! count
        clr %r13                ! index
loop:   ldub [%r8+%r13], %r11   ! load a string chr
        cmp %r11, 0x0           ! is chr=null?
        be done                 ! then go to done
        cmp %r11, %r9           ! is chr<>B? (branch delay)
        bne inc                 ! then go to inc
        nop                     ! branch delay slot
        stb %r10, [%r8+%r13]    ! change chr to C
        add %r12, 1, %r12       ! increment count
inc:    add %r13, 1, %r13       ! increment index
        ba loop                 ! do next chr
        nop                     ! branch delay slot
done:   st %r12, [%fp+0]        ! “push” (store) count on stack
! EPILOGUE
        …                       ! Restore r9-r13, omitted
        ld [%fp-12], %r8        ! Restore r8
        ld [%fp-8], %r15        ! get saved return address
        ld [%fp-4], %fp         ! Get old value of frame ptr
        add %sp, 32, %sp        ! Restore stack pointer
        retl                    ! return to caller
        nop                     ! branch delay slot
Stack inside subchr, from low to high addresses: %sp -> ...; %r9; %r8; return addr; old frame ptr; %fp -> Return value; addr (a); b; c
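The frame bookkeeping in the startup, prologue, epilogue, and cleanup sequences can be traced with a small model. This sketch is an approximation under stated assumptions: the stack is a dictionary of words, the stack bottom is the made-up address 1000, string "addresses" are indexes into a bytearray, and saving %r15 and the general registers is left out.

```python
WORD = 4
mem = {}                                  # stack memory: address -> word
data = bytearray(b"i will tip\0")         # string storage (assumed separate from the stack)
regs = {"sp": 1000, "fp": 1000}           # assumed initial stack bottom

def subchr():
    # Prologue: open 8 words, save old frame pointer, old sp becomes new fp
    regs["sp"] -= 8 * WORD
    mem[regs["sp"] + 28] = regs["fp"]
    regs["fp"] = regs["sp"] + 32
    # Body: parameters sit at positive offsets from the frame pointer
    fp = regs["fp"]
    a, b, c = mem[fp + 4], mem[fp + 8], mem[fp + 12]
    count = 0
    while data[a] != 0:
        if data[a] == b:
            data[a] = c
            count += 1
        a += 1
    mem[fp] = count                       # return value slot at [%fp+0]
    # Epilogue: restore frame pointer, release the 8 words
    regs["fp"] = mem[fp - 4]
    regs["sp"] += 8 * WORD

def call_subchr(a_addr, b, c):
    # Startup sequence: open 4 words; [%sp+0] is left for the return value
    regs["sp"] -= 4 * WORD
    mem[regs["sp"] + 4] = a_addr          # A by reference
    mem[regs["sp"] + 8] = b               # B by value
    mem[regs["sp"] + 12] = c              # C by value
    subchr()
    # Cleanup sequence: pop return value, then pop the parameters
    result = mem[regs["sp"]]
    regs["sp"] += 4 * WORD
    return result

count = call_subchr(0, ord("i"), ord("I"))
print(count, bytes(data), regs)
```

After the call both %sp and %fp are back at their starting values, which is the property the matching prologue/epilogue and startup/cleanup pairs are meant to guarantee.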
Stack-based Linkage 8
General Guidelines
– Keep Startups, Cleanups, Prologues, and Epilogues standard (but not necessarily identical); easy to cut, paste, and modify.
– Caller: leave space for return value on the TOP of the stack.
– Callee: always save and restore locally used registers (except %r1).
– Pass data structures and arrays by reference, all others by value (efficiency).
Traps and Exceptions 1
Traps and Exceptions
– Other side of low level programming -- the interface between applications and peripherals
– OS provides access and protocols
Traps and Exceptions 2
– BIOS: Basic Input/Output System
  Subroutines that control I/O
  No need for you to write them as application programmer
  OS interfaces application with BIOS through traps (extended operations (XOPs))
Diagram: Applications software sits on top of the BIOS, which controls the Keyboard, Screen, Mouse, and Disk
Traps and Exceptions 3
– Where are OS traps kept? Two approaches:
  Transient monitor: traps kept in a library that is copied into the application at link-time
  Resident monitor: always keep OS in main memory; applications share the trap routines.
  OS routines monitor devices. Frequently used routines kept resident; others loaded as needed.
Diagram: with a transient monitor, Appl 1 through Appl 4 each carry their own copy of the OS rtns; with a resident monitor, one copy of the OS rtns is shared by Appl 1 through Appl 6
Traps and Exceptions 4
– (Assuming a resident monitor) How to find I/O routines?
  Store routines in memory, and make a call to a hard address. E.g., call 256
  – When new OS is released, need to recompile all application programs to use different addresses.
  Use a dispatcher
  – Dispatcher is a subroutine that takes a parameter (the trap number). Dispatcher knows where all routines actually are in memory, and makes the branch for you. Dispatcher subroutine must always exist in the same location.
Diagram: Application calls the Dispatcher, which branches to one of BIOS 1 ... BIOS n
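The dispatcher idea can be sketched with a table of functions indexed by trap number. Everything here is illustrative: the routine names, the two "OS releases", and the return strings are invented; only the use of trap 1 for output and trap 2 for keyboard input follows the examples in these notes.

```python
def make_dispatcher(trap_table):
    """Return a dispatcher that looks up a routine by trap number and calls it."""
    def dispatch(trap_number, *args):
        return trap_table[trap_number](*args)
    return dispatch

# Two hypothetical OS releases: the routines differ, the trap numbers do not
def read_char_v1():
    return "v1 read"

def read_char_v2():
    return "v2 read"

def write_char(ch):
    return "wrote " + ch

os_release_1 = make_dispatcher({1: write_char, 2: read_char_v1})
os_release_2 = make_dispatcher({1: write_char, 2: read_char_v2})

def application(dispatch):
    # The application knows only trap numbers, never routine addresses
    return dispatch(1, "A"), dispatch(2)

print(application(os_release_1))
print(application(os_release_2))
```

The application code is unchanged across releases, which is the point of routing every request through one fixed entry point.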
Traps and Exceptions 5
  Use vectored linking
  – Branch table exists at a well known location. The address of each trap subroutine is stored in the table, indexed by the trap number.
  – On RISC, usually about 4 words reserved in the table. If the trap routine is larger than 4 words, can branch to the actual routine.
Table of addresses (one word per entry):
  100       Addr of trap 0
  104       Addr of trap 1
  108       Addr of trap 2
  100+4n    Addr of trap n
Table with 4 words reserved per entry: entries begin at 100, 116, 132, ..., 100+16n
Traps and Exceptions 6
– Levels of privilege
  Supervisor mode - can access every resource
  User mode - limited access to resources
  OS routines operate in supervisor mode, access is determined by bit in PSW (processor status word).
  XOP (book’s notation) can always be executed, sets privilege to supervisor mode (ta)
  RTX (book’s notation) can only be executed by the OS, and returns privilege to user mode (rett)
– Exceptions
  Caused by invalid use of resource. E.g., divide by zero, invalid address, illegal operation, protection violation, etc.
Traps and Exceptions 7
  Control transferred automatically to exception handler routine. Similar to trap or XOP transfer.
  Exceptions vs. XOPs
  – XOPs explicit in code, exceptions are implicit
  – XOPs service request and return to application; exceptions print message and abort (unless masked).
– Trap example: non-blocking read
        ta 3
  If there is nothing in the keyboard buffer, return with a message that nothing is there. Otherwise, put the character into register 8.
Traps and Exceptions 8
  Status of the keyboard is kept in a memory location, as is the (one-character) keyboard buffer. Memory mapped devices.
  On SPARC, trap table has 256 entries. 0-127 are reserved for exceptions and external interrupts. 128-255 are used for XOPs. Trap table begins at address 0x0000. Each entry is 4 instructions (16 bytes) long.
! ta 3 returns character if one is there, otherwise
! it returns 0x8000000 into %r8
        set 0x8000000, %r8      ! set default return val
        set KbdStatus, %r1      ! KbdStatus is memory loc
        ld [%r1], %r1           ! read status (1 is ready)
        andcc %r1, 1, %r1       ! check status
        be rtn                  ! can’t read anything
        nop                     ! branch delay slot
        set KbdBuff, %r1        ! KbdBuff is memory loc
        ld [%r1], %r8           ! get character
rtn:    rett                    ! return to caller
Traps and Exceptions 9
Trap execution: ta 3
– Calculate trap address: 3 * 16 + 0x0800 = 16 * (3 + 0x080)
– Save nPC and PSW to memory
  • SPARC uses register windows
  • Assumes local registers are available
– Set privilege level to supervisor mode
– Update PC with trap address (and make nPC = PC + 4) (jumps to trap table)
– Trap table has instruction ba ta3_handler
– rett
  • Restores PC (from saved nPC value) and PSW (resets to user mode)
  • Returns to application program
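The trap address arithmetic follows from the table layout given earlier in these notes: 16-byte entries, a table starting at address 0x0000, and XOPs occupying entries 128 and up. A short sketch (the constant and function names are illustrative):

```python
ENTRY_SIZE = 16      # 4 instructions of 4 bytes each
XOP_BASE = 128       # entries 128-255 are used for XOPs
TABLE_BASE = 0x0000  # trap table start, as stated in the notes

def trap_address(n):
    """Address of the trap table entry reached by 'ta n'."""
    return TABLE_BASE + ENTRY_SIZE * (XOP_BASE + n)

print(hex(trap_address(3)))
```

The 0x0800 in the slide's formula is simply 128 * 16, the space taken by the 128 exception and interrupt entries that come first.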
Programmed I/O 1
Programmed I/O
– Early approach: Isolated I/O
  Special instructions to do input and output, using two operands: a register and an I/O address.
  CPU puts device address on address bus, and issues an I/O instruction to load from or store to the device.
Programmed I/O 2
Isolated I/O diagram: the CPU has one set of lines (addr bus, data bus, read/write) to Memory and a separate set (addr bus, data bus, read/write) to I/O
Memory Mapped I/O
  No special instructions. Treat the I/O device like a memory address. Hardware checks to see if the memory address is in the I/O device range, and makes the adjustment.
  Use high addresses (not “real” memory) for I/O memory maps. E.g., 0xFFFF0000 through 0xFFFFFFFF.
Diagram: CPU, Memory, and I/O share one addr bus, data bus, and read/write line. Address space map: memory, unused, I/O, unused
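The address check that the hardware performs can be pictured as a simple range test. The Bus class below is a toy model, not a description of any real bus; only the address range comes from the notes.

```python
IO_BASE = 0xFFFF0000      # example I/O range from the notes
IO_TOP = 0xFFFFFFFF

class Bus:
    """Toy model: ordinary loads and stores reach RAM or device registers by address."""
    def __init__(self):
        self.ram = {}
        self.io_registers = {}

    def _target(self, addr):
        in_io_range = IO_BASE <= addr <= IO_TOP
        return self.io_registers if in_io_range else self.ram

    def store(self, addr, value):
        self._target(addr)[addr] = value

    def load(self, addr):
        return self._target(addr).get(addr, 0)

bus = Bus()
bus.store(0x00002000, 7)            # ordinary memory
bus.store(0xFFFF0000, ord("A"))     # lands in a device register instead
print(bus.ram, bus.io_registers)
```

The program uses the same store operation in both cases; only the address decides which component responds.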
Programmed I/O 3
– Advantages of each
  Memory mapped: reduced instruction set, reduced redundancy in hardware.
  Isolated: don’t have to give up memory address space on machines with little memory
Programmed I/O - UARTs
UARTs
– Universal Asynchronous Receiver Transmitter
– Asynchronous = not on the same clock.
– Handshake coordinates communication between two devices.
– A kind of programmed I/O.
Diagram: the Keyboard sends bits to the UART one at a time (serial); the UART passes the assembled byte (01101010) to the CPU all at once (parallel)
UARTs 1
UART registers
– Control: set up at init, speed, parity, etc.
– Status: transmit empty, receive ready, etc.
– Transmit: output data
– Receive: input data
– All four needed for bi-directional communications.
– Status/control, transmit/receive often combined. Why?
Diagram: Control Reg, Status Reg, Transmit Reg (with Transmit Logic), and Receive Reg (with Receive Logic) attach to the Control bus, Address bus, and Data bus
UARTs 2
Memory mapped UARTs
– Both memory and I/O “listen” to the address bus. The appropriate device will act based on the addresses.
– Keyboards and Printers require three addresses (when addresses are not combined).
– Modems require four.
– (why?)
Address     Register
FFFF 0000   UART 1 data
FFFF 0004   UART 1 status
FFFF 0008   UART 1 control
FFFF 000C   UART 2 xmit
FFFF 0010   UART 2 recv
FFFF 0014   UART 2 status
FFFF 0018   UART 2 control
FFFF 001C   UART 3 xmit
and so on
Diagram: CPU, Memory, UART1, and UART2 share the Control bus, Address bus, and Data bus
Programmed I/O 4
Programmed I/O Characteristics:
– Used to determine if device is ready (can it be read or written).
– Each device has a status register in addition to the data register.
– Like previous trap example, must check status before getting data.
– Involves polling loops.
Programmed I/O – Polling
Ex.: ta 2 handler (blocking keyboard input)
ta_2_handler:
        set KbdBuff, %r1        ! get addr of kbd buffer
        set KbdStatus, %r9      ! get addr of kbd status
wait:   ld [%r9], %r10          ! get status
        andcc %r10, 1, %r10     ! check if ready
        be wait                 ! loop until ready
        nop                     ! branch delay
        ld [%r1], %r8           ! get data
        rett                    ! return from trap
CPU: “Are you ready?... Are you ready now?... How about NOW?...” Device: “Nope... Not yet... Hang on...”
  Can’t afford to wait like this. Computer is millions of times faster than a typist. Also, multi-tasking operating systems can’t wait.
  Special purpose computers can wait. E.g., microwave oven controllers.
  Must have a better way! Interrupts are the answer!
Interrupts and DMA transfers 1
Programmed (polled) I/O used busy waiting.
– Advantages: simpler hardware
– Disadvantages: wastes time
Interrupts (IRQs on PCs)
– I/O device “requests” service from CPU.
– CPU can execute program code until interrupted. Solves busy waiting problems.
– Interrupt handlers are run (like traps) whenever an interrupt occurs. Current application program is suspended.
Interrupts and DMA transfers 2
Servicing an interrupt
– I/O controller generates interrupt, sets request line “high”.
– CPU detects interrupt at beginning of fetch/execute cycle (for interrupts “between” instructions).
– CPU saves state of running program, invokes intrpt. handler.
– Handler services request; sets the request line “low”.
– Control is returned to the application program.
Diagram: the Application Program runs until *Interrupt Detected*; the Interrupt Handler runs (Service Request, Clear Interrupt); then the Application Program continues
Interrupts and DMA transfers 3
Changes to fetch/execute cycle
Flowchart: Interrupt Pending? Y: Save PC, Save PSW; PSW=new PSW, PC=handler_addr. N: continue with the normal fetch (PC -> bus, load MAR, INC to PC, load PC)
Problems
– Requires additional hardware in Timing & Control.
– Queuing of interrupts
– Interrupting an interrupt handler (solution: priorities and maskable interrupts)
– Interrupts that must be serviced within an instruction
– How to find address of interrupt handler
Interrupts and DMA transfers 4
Example: interrupt driven string output
– Want to print a string without busy waiting.
– Want to return to the application as fast as possible
Device: “I’m ready!”
Trap handler implementation
  Install trap handler into trap table
– Buffer is like circular queue
– only outputs, at most, one character
disp_buf:  .skip 256            ! buffers string to print
disp_frnt: .byte 0              ! offset to front of queue
disp_bck:  .byte 0              ! offset to back of queue
ta_6_handler:
        ! Copy str from mem[%r8] to mem[disp_buf+disp_bck]
        ! disp_bck = (disp_bck+len(str)) mod 256
        ! If display is ready
        !   If first char is not null, then output it
        !   disp_frnt = (disp_frnt+1) mod 256
        rett                    ! Return from trap
Diagram: disp_buf holds the undisplayed bytes, from the oldest byte (at disp_frnt) to the newest byte (at disp_bck)
Interrupt handler implementation
  This too outputs only one character at most, but when display becomes ready again, it generates another interrupt which invokes this routine!
display_IRQ_handler:
        ! Save any registers used
        ! If disp_frnt != disp_bck (queue is not empty)
        !   Get char at mem[disp_buf+disp_frnt]
        !   If char is not null, then output it
        !   disp_frnt = (disp_frnt+1) mod 256
        ! Restore registers and set the request line “low”
        rett                    ! Return from trap
  Uses the UART for transmission.
Diagram: the device signals “I’m ready!” to the CPU, which is connected to Memory
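The cooperation between the trap handler and the interrupt handler can be sketched with a 256-byte circular queue. This is a simplified model under stated assumptions: a Python list stands in for the display, null characters and queue overflow are not handled, and the class and method names are invented.

```python
SIZE = 256

class DisplayQueue:
    def __init__(self):
        self.buf = bytearray(SIZE)
        self.frnt = 0                  # offset of oldest undisplayed byte
        self.bck = 0                   # offset where the next byte is queued
        self.output = []               # stands in for the display hardware

    def _send_one(self):
        """Output at most one character from the front of the queue."""
        if self.frnt != self.bck:      # queue is not empty
            self.output.append(chr(self.buf[self.frnt]))
            self.frnt = (self.frnt + 1) % SIZE

    def trap_handler(self, text, display_ready=True):
        """Queue the whole string, start the first character, return at once."""
        for ch in text.encode("ascii"):
            self.buf[self.bck] = ch
            self.bck = (self.bck + 1) % SIZE
        if display_ready:
            self._send_one()

    def irq_handler(self):
        """Runs each time the display signals that it is ready again."""
        self._send_one()

queue = DisplayQueue()
queue.trap_handler("Hi!")              # application continues immediately
queue.irq_handler()                    # display ready: next character
queue.irq_handler()                    # display ready: last character
print("".join(queue.output))
```

The application pays only for copying the string into the queue; the remaining characters go out one per interrupt while other code runs.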
Interrupts and DMA transfers 5
Problems with interrupt driven I/O
  CPU is involved with each interrupt
  Each interrupt corresponds to transfer of a single byte
  Lots of overhead for large amounts of data (blocks of 512 bytes)
Diagram: the Device Controller interrupts the CPU and transfers one byte of data; the CPU executes 10s or 100s of instructions per byte and transfers one word of data to Memory
Interrupts and DMA transfers 6
DMA (Direct Memory Access)
  Want I/O without CPU intervention
  Want larger than one byte data transfers
  Solution: add a new device that can talk to both I/O devices and memory without the CPU; a “specialized” CPU strictly for data transfers.
Diagram: Memory, CPU, Device Controller, DMA Controller
Interrupts and DMA transfers 7
Steps to a DMA transfer
– CPU specifies a memory address, the operation (read/write), byte count, and disk block location to the DMA controller.
– DMA controller initiates the I/O, and transfers the data to/from memory directly
– DMA controller interrupts the CPU when the entire block transfer is completed.
Problem
– Conflicts accessing memory. Can either arbitrate access or get a more expensive dual ported memory system.