Date post: | 21-Dec-2015 |
William Sandqvist [email protected]
1.1 The C-function
We should document our code. You can find a flowchart tool in
Word or Powerpoint.
This could be useful for lab reports.
[Flowchart of int fac_c(int x): if x <= 0 (Y) then f = 0, else (N) f = 1; while x > 1 (Y): f = f * x, x = x - 1; on N: return f; End]
int fac_c(int x)
{
    int f;
    if (x <= 0)
        f = 0;
    else {
        f = 1;
        while (x > 1) {
            f = f * x;
            x--;
        }
    }
    return f;
}
fac_c(5) calculates 1*5*4*3*2*1 = 120
Flowchart
main in C
#include <stdio.h>

extern int fac_asm(int);
int fac_c(int);

int main(void)
{
    int c_result, asm_result;
    int x;
    while (1) {
        printf("Enter a number: ");
        scanf("%d", &x);
        c_result = fac_c(x);
        asm_result = fac_asm(x);
        printf("C-result: %d\n", c_result);
        printf("Asm-result: %d\n", asm_result);
    }
    return 0;
}
Message to the linker: fac_asm() is an external function (defined in another file).
Structure diagram?
To document the program structure, a structure diagram could be useful. It could be directly translated into structured programming (while, if, else …).
But in assembler, we are not interested in the program structure, but in the program flow.
The Flowchart
The flowchart could be directly translated to assembler code.
How to program the Nios processor?
The Nios II processor is Altera's soft-core RISC processor, with an instruction set similar to that of a MIPS processor.
It is designed to make efficient use of the resources in an FPGA.
It comes in three versions: Small – Medium – Large …
Nios II registers 0…15
r0 always reads as zero: use it as the constant "0"!
If you call a subroutine, save the contents of the registers you've used on the stack!
Program constants, I-type instructions
Some pseudoinstructions:
movi rB, IMMED is implemented as addi rB, r0, IMMED
movia rB, label is implemented as orhi rB, r0, %hiadj(label) followed by addi rB, rB, %lo(label)
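The arithmetic behind movia can be sketched in C. This is a minimal model, assuming the usual definitions %hiadj(x) = bits 31..16 of x plus bit 15 of x, and %lo(x) = bits 15..0 of x; the function names hiadj, lo and movia_model are made up for this illustration. The adjustment by bit 15 compensates for addi sign-extending its 16-bit immediate.

```c
#include <stdint.h>

/* %hiadj(x): upper 16 bits, plus 1 if bit 15 is set (assumed definition). */
static uint32_t hiadj(uint32_t x)
{
    return ((x >> 16) & 0xFFFFu) + ((x >> 15) & 1u);
}

/* %lo(x): lower 16 bits. */
static uint32_t lo(uint32_t x)
{
    return x & 0xFFFFu;
}

/* Model of: orhi rB, r0, %hiadj(label) ; addi rB, rB, %lo(label) */
uint32_t movia_model(uint32_t label)
{
    /* orhi places the 16-bit immediate in the upper half of rB. */
    uint32_t rB = (hiadj(label) & 0xFFFFu) << 16;

    /* addi sign-extends its 16-bit immediate before adding. */
    uint32_t lo16 = lo(label);
    int32_t imm = (int32_t)lo16 - ((lo16 & 0x8000u) ? 0x10000 : 0);

    rB += (uint32_t)imm;   /* unsigned addition wraps modulo 2^32 */
    return rB;
}
```

For a label such as 0x12348000, the lower half has bit 15 set, so the upper half is loaded as 0x1235 and the sign-extended addi brings the sum back to 0x12348000.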
I-type, Branch
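Pseudoinstruction:
ble: branch if less than or equal, signed
ble rA, rB, label is implemented as bge rB, rA, label (registers A and B swapped)!
The IMM16 offset is a signed byte offset relative to the following instruction. Its two least significant bits are always zero because instructions must be word-aligned.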
Conditional operators of C
All C-language conditional operators have assembly instructions (or pseudoinstructions).
Compare two registers and branch relative if the expression is true.
Assembler
.global fac_asm
.text
# Parameter in r4 (and if needed in r5, r6, r7)
# Return value in r2 (and r3 if long or double)
# we can use r2 and r3 for calculations until return
# r8 … r15 must be saved by caller of a sub

fac_asm:
# int r2 fac_asm(int r4 x), the function prototype
# r3 : for constant "1"
if:     ble r4, r0, else     # if(x <= 0)
        movi r3, 1           # constant "1"
        mov r2, r3           # f = 1
while:  ble r4, r3, endsub   # while(x>1){
        mul r2, r2, r4       # f = f*x
        sub r4, r4, r3       # x = x - 1
        br while             # }
else:   mov r2, r0           # f = 0
endsub: ret                  # return r2
.end
fac_asm has to be made known to other files
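To see how the flowchart maps onto the assembler listing, the same control flow can be written in C with one statement per instruction. This is only an illustrative sketch: the name fac_goto and the label names while_ and else_ are invented here (while and else are reserved words in C), and the variables stand in for the registers named in the comments.

```c
/* C rendering of fac_asm, one statement per assembler instruction. */
int fac_goto(int x)                /* x plays the role of r4 */
{
    int f;                         /* r2: result */
    int one;                       /* r3: constant 1 */

    if (x <= 0) goto else_;        /* ble r4, r0, else */
    one = 1;                       /* movi r3, 1 */
    f = one;                       /* mov r2, r3 */
while_:
    if (x <= one) goto endsub;     /* ble r4, r3, endsub */
    f = f * x;                     /* mul r2, r2, r4 */
    x = x - one;                   /* sub r4, r4, r3 */
    goto while_;                   /* br while */
else_:
    f = 0;                         /* mov r2, r0 */
endsub:
    return f;                      /* ret */
}
```

Each conditional branch tests the opposite of the C condition: while(x > 1) becomes "leave the loop if x <= 1".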
2.2 Input/Output
R/W reverses the direction of the data bus.
CS (Chip Select) enables the chip.
Connect an 8-register memory-mapped peripheral to the CPU. The CPU has 8-bit address and data buses.
The peripheral should have register addresses 0x10…0x17.
Decode - doorlock
How to open the doorlock?
Press 4 (d) and 8 (h) simultaneously but don’t press any other key!
Connections
0x10 = 00010.000
0x11 = 00010.001
0x12 = 00010.010
0x13 = 00010.011
0x14 = 00010.100
0x15 = 00010.101
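0x16 = 00010.110
0x17 = 00010.111
Decoder: $CS = \overline{A_7}\,\overline{A_6}\,\overline{A_5}\,A_4\,\overline{A_3}$
Peripheral inputs: CS and RS2 RS1 RS0 (register select, from A2 A1 A0)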
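The decoding can be checked with a small C model. This is a sketch under the assumption that the upper five address bits (A7..A3) must equal 00010 for the chip to be selected and that the lower three bits (A2..A0) pick the register; the function names chip_select and register_select are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CS is active when A7..A3 == 00010, i.e. for addresses 0x10..0x17. */
bool chip_select(uint8_t addr)
{
    return (addr >> 3) == 0x02;
}

/* RS2..RS0 are taken from A2..A0 and select one of the 8 registers. */
unsigned register_select(uint8_t addr)
{
    return addr & 0x07u;
}
```

Address 0x15, for example, selects the chip and addresses register 5, while 0x18 leaves the chip deselected.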
3.2 Hitrate and accesstime
h is the hit rate.
a) tAVG = 8 ns, h = ?
b) tAVG = 15 ns, h = ?
c) tAVG = 6 ns, h = ?
Hitrate calculations
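$$t_{AVG} = h\,t_C + (1-h)\,t_M = t_M - h\,(t_M - t_C) \quad\Rightarrow\quad h = \frac{t_M - t_{AVG}}{t_M - t_C}$$
With $t_C = 5$ ns and $t_M = 70$ ns (the values used in the calculations):
a) $t_{AVG} = 8$ ns: $h = \frac{70 - 8}{70 - 5} = 0.954$
b) $t_{AVG} = 15$ ns: $h = \frac{70 - 15}{70 - 5} = 0.846$
c) $t_{AVG} = 6$ ns: $h = \frac{70 - 6}{70 - 5} = 0.985$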
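The three cases can be checked numerically. The sketch below assumes the two-level model used in the calculation, with cache access time 5 ns and memory access time 70 ns; the function names hit_rate and average_time are chosen for this example only.

```c
/* Hit rate needed to reach a given average access time.
   t_avg, t_c and t_m are in the same unit (here ns); t_m must differ from t_c. */
double hit_rate(double t_avg, double t_c, double t_m)
{
    return (t_m - t_avg) / (t_m - t_c);
}

/* Average access time for a given hit rate h (0..1). */
double average_time(double h, double t_c, double t_m)
{
    return h * t_c + (1.0 - h) * t_m;
}
```

Calling hit_rate(8.0, 5.0, 70.0) gives 62/65, which rounds to 0.954, and feeding that result back into average_time returns the 8 ns that was asked for.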
3.1 Memory system
Direct address mapping:
Memory line: i    Cache line: j = i % K (K is the number of cache lines)
The memory is byte-organized, but we could draw it as if it were organized in memory lines with the same size as the cache line.
This will simplify all figures.
In this example, the block transfer is a cache line of 2 words.
Why Blocktransfer?
• To transfer 1 "random" word in memory takes three bus cycles, 3 TBus/word (2 TBus are wait states)
• To transfer a "burst" of 2 words takes 3+1 bus cycles, 4/2 = 2 TBus/word
• To transfer a "burst" of 4 words takes 3+1+1+1 bus cycles, 6/4 = 1.5 TBus/word
• To transfer a "burst" of 8 words takes 3+1+1+1+1+1+1+1 bus cycles, 10/8 = 1.25 TBus/word
Remember, to make these gains you must have use for most of the transferred words. Otherwise block transfer could be even slower than random transfer!
This is just an example. Other access patterns exist, e.g. 5+3+3+3 and so on. The bus clock is derived from the processor clock, perhaps TBus = 10*TCPU.
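The bus cycles per word for any burst length follow from the same pattern. A minimal sketch, assuming the first word costs a fixed number of bus cycles and every following word in the burst costs a smaller fixed number; the function name cycles_per_word and its parameter names are made up here.

```c
/* Average bus cycles per word for a burst of `words` words.
   first: cycles for the first word (3 in the example above)
   next:  cycles for each following word (1 in the example above)
   words: burst length, must be at least 1 */
double cycles_per_word(int first, int next, int words)
{
    return (first + next * (words - 1)) / (double)words;
}
```

With the 3+1+1+1 pattern, cycles_per_word(3, 1, 4) gives 1.5, and the 5+3+3+3 pattern gives cycles_per_word(5, 3, 4) = 3.5.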
Mapping of memory address
Memory: 4 kB, 4*2^10 = 2^12 bytes. Memory address: mmmmmmmmmmmm (12 bits)
Cache: 8 words, 8*4 = 32 bytes. Cache line: 2 words, 2*4 bytes. Cache address: ll.w.bb
Memory – cache mapping:
mmmmmmm.mm.m.mm
ttttttt.ll.w.bb
The address tag is ttttttt. The address in the cache is independent of the tag bits!
Our example: data addresses are accessed four times in this order:
0x010, 0x1FC, 0x168, 0x008, 0x014, 0x1F8, 0x00C
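The split ttttttt.ll.w.bb can be expressed with shifts and masks. The following is a sketch for this particular cache (7 tag bits, 2 line bits, 1 word bit, 2 byte bits); the struct cache_fields and the function split_address are names invented for the illustration.

```c
#include <stdint.h>

/* Fields of a 12-bit memory address, laid out as ttttttt.ll.w.bb */
typedef struct {
    unsigned tag;   /* 7 bits: which memory block occupies the line */
    unsigned line;  /* 2 bits: cache line number 0..3 */
    unsigned word;  /* 1 bit:  word within the 2-word line */
    unsigned byte;  /* 2 bits: byte within the 4-byte word */
} cache_fields;

cache_fields split_address(uint16_t addr)
{
    cache_fields f;
    f.byte = addr & 0x3u;
    f.word = (addr >> 2) & 0x1u;
    f.line = (addr >> 3) & 0x3u;
    f.tag  = (addr >> 5) & 0x7Fu;
    return f;
}
```

For address 0x1FC this yields tag 0001111 (15), line 3, word 1, byte 0.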
Memory and Cache
Data is accessed from three different locations (tags), but they will map to the same lines in this small cache!
Direct mapped Cache
Memory address   Memory location (ttttttt.ll.w.bb)   Tag (#)        Line#(Tag#)
0x010            0000000.10.0.00                     0000000 (0)    2(0)
0x1FC            0001111.11.1.00                     0001111 (1)    3(1)
0x168            0001011.01.0.00                     0001011 (2)    1(2)
0x008            0000000.01.0.00                     0000000 (0)    1(0)
0x014            0000000.10.1.00                     0000000 (0)    2(0)
0x1F8            0001111.11.0.00                     0001111 (1)    3(1)
0x00C            0001111.01.1.00                     0001111 (1)    1(1)
Note: the bit pattern in the last row, 0001111.01.1.00, is the address 0x1EC; 0x00C itself is 0000000.01.1.00 (tag 0, line 1). The access sequence that follows uses the table entry 1(1).
Program execution
Data addresses are accessed four times in this order:
0x010, 0x1FC, 0x168, 0x008, 0x014, 0x1F8, 0x00C
C, ColdMiss = line entry to a previously unused cache line (this counts as a miss)
M, Miss = the previous line entry was from another location (tag)
H, Hit = the previous line entry was from the same location (tag)
Cache access, line#(tag#):
Round 1: 2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(1)   C C C M H H M
Round 2: 2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(1)   H H M M H H M
Round 3: 2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(1)   H H M M H H M
Round 4: 2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(1)   H H M M H H M
$$h = \frac{2 + 3 \cdot 4}{4 \cdot 7} = \frac{14}{28} = 0.5$$
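The hit count can be reproduced with a small direct-mapped cache model. This is a sketch, not a full cache simulator: it tracks only one tag per line, for the cache described above (4 lines of 2 words, address split ttttttt.ll.w.bb). The function name count_hits and the array table_accesses are invented for this example. As an assumption, the last address in table_accesses is 0x1EC, the address whose bits match the table entry 0001111.01.1.00.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LINES 4

/* Assumption: last entry is 0x1EC (tag 0001111, line 1), matching the table. */
const uint16_t table_accesses[7] = {
    0x010, 0x1FC, 0x168, 0x008, 0x014, 0x1F8, 0x1EC
};

/* Count hits when the n addresses are accessed `rounds` times in order. */
int count_hits(const uint16_t *addr, size_t n, int rounds)
{
    bool valid[NUM_LINES] = { false };   /* false: line not yet used (cold) */
    unsigned tags[NUM_LINES] = { 0 };
    int hits = 0;

    for (int r = 0; r < rounds; r++) {
        for (size_t i = 0; i < n; i++) {
            unsigned line = (addr[i] >> 3) & 0x3u;   /* ll bits */
            unsigned tag  = addr[i] >> 5;            /* ttttttt bits */
            if (valid[line] && tags[line] == tag) {
                hits++;                              /* H */
            } else {
                valid[line] = true;                  /* C or M: load the line */
                tags[line] = tag;
            }
        }
    }
    return hits;
}
```

count_hits(table_accesses, 7, 4) returns 14, that is 14 hits in 28 accesses. If the last address is taken literally as 0x00C, it shares line 1 and tag 0 with 0x008, and the same function returns 18.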
2-way set associative cache
Memory address: mmmmmmmm.m.m.mm
Address mapping: tttttttt.l.w.bb
OBSERVE! The set number is not included in the address map. Logic circuits within the associative cache take care of the set number and connect the CPU with the correct set.
(Tags are stored in the associative cache for each line in every set. All sets are searched in parallel for the tag.)
Example of how an associative cache can boost performance
Memory: 0x010, Tag: 0x01, Cache: 0x0 = 0b0.0.00
Memory: 0x1FC, Tag: 0x1F, Cache: 0xC = 0b1.1.00
Memory: 0x168, Tag: 0x16, Cache: 0x8 = 0b1.0.00
Memory: 0x008, Tag: 0x00, Cache: 0x8 = 0b1.0.00
Memory: 0x014, Tag: 0x01, Cache: 0x4 = 0b0.1.00
Memory: 0x1F8, Tag: 0x1F, Cache: 0x8 = 0b1.0.00
Memory: 0x00C, Tag: 0x00, Cache: 0xC = 0b1.1.00
( Nice example. The Cache part is one full hex digit.)
Fewer conflict misses
Memory locations 0x010 and 0x014 are stored in cache line 0, but there are two sets! Both can be stored simultaneously.
0x1FC, 0x168, 0x008, 0x1F8 and 0x00C are stored in cache line 1. Two of them can be stored simultaneously.
You have to consider the exchange (replacement) policy in order to analyse this example in full detail. (It is not given.)
Exchange policy: FIFO, RANDOM, LRU …
If the exchange policy were known, we could follow the cache accesses for every step to calculate hitrate: line,set(tag) line,set(tag) …
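Once a policy is chosen, following the accesses step by step is mechanical. The sketch below assumes LRU, which is only one of the possible policies and is not stated in the example. It models 2 lines with 2 sets each and the address split tttttttt.l.w.bb. The name count_hits_lru is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINES 2
#define SETS  2   /* two entries per line, called sets in the text */

/* Count hits for n addresses accessed `rounds` times, assuming LRU exchange. */
int count_hits_lru(const uint16_t *addr, size_t n, int rounds)
{
    bool valid[LINES][SETS] = { { false } };
    unsigned tags[LINES][SETS] = { { 0 } };
    int lru[LINES] = { 0 };   /* per line: the least recently used set */
    int hits = 0;

    for (int r = 0; r < rounds; r++) {
        for (size_t i = 0; i < n; i++) {
            unsigned line = (addr[i] >> 3) & 0x1u;   /* l bit */
            unsigned tag  = addr[i] >> 4;            /* tttttttt bits */
            int set = -1;

            /* Search both sets of the line for the tag. */
            for (int s = 0; s < SETS; s++)
                if (valid[line][s] && tags[line][s] == tag)
                    set = s;

            if (set >= 0) {
                hits++;
            } else {
                set = lru[line];                     /* replace the LRU entry */
                valid[line][set] = true;
                tags[line][set] = tag;
            }
            lru[line] = 1 - set;   /* the other set is now least recently used */
        }
    }
    return hits;
}
```

With the seven example addresses accessed four times, this model counts 14 hits in 28 accesses under the LRU assumption; the result depends on the policy chosen.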