William Sandqvist william@kth.se Exercises Embedded Systems.

Date post: 21-Dec-2015


1.1 The C-function

We should document our code. Word and PowerPoint both include flowchart tools, which can also be useful for lab reports.

[Flowchart of int fac_c(int x): if x <= 0 (Y) then f = 0, else (N) f = 1; while x > 1 (Y): f = f * x, x = x-1; when the loop ends (N): return f, End.]

int fac_c(int x)
{
    int f;
    if(x <= 0)
        f = 0;
    else
    {
        f = 1;
        while(x > 1)
        {
            f = f * x;
            x--;
        }
    }
    return f;
}

fac_c(5) calculates 1*5*4*3*2 = 120 (the loop stops before multiplying by 1).



main in C

#include <stdio.h>

extern int fac_asm(int);
int fac_c(int);

int main(void)
{
    int c_result, asm_result;
    int x;
    while(1)
    {
        printf("Enter a number: ");
        if(scanf("%d", &x) != 1)   /* stop on end of input or non-numeric input */
            break;
        c_result = fac_c(x);
        asm_result = fac_asm(x);
        printf("C-result: %d\n", c_result);
        printf("Asm-result: %d\n", asm_result);
    }
    return 0;
}

Message to the linker: fac_asm() is an external function (defined in another file).


Structure diagram? To document the program structure, a structure diagram could be useful. It translates directly into structured programming (while, if, else …).

But in assembler we are not interested in the program structure, only in the program flow.


The Flowchart

The flowchart can be translated directly into assembler code.
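As a rough illustration of that translation, fac_c can be rewritten in C using only single-condition tests and goto, so that each statement corresponds to one box or arrow of the flowchart (and to roughly one branch or arithmetic instruction in the assembler version later on). This is a sketch, not the course's code: the name fac_flow, the label names and the assert guard (which assumes a 32-bit int, where 13! no longer fits) are hypothetical additions.

```c
#include <assert.h>

/* fac_c with the structured if/while replaced by conditional jumps,
   in the same order as the flowchart boxes. */
int fac_flow(int x)
{
    int f;
    assert(x <= 12);                /* assumption: 32-bit int, 13! overflows */
    if (x <= 0) goto else_branch;   /* decision: x <= 0 ?                     */
    f = 1;                          /* N: f = 1                               */
loop:
    if (x <= 1) goto endsub;        /* loop test: leave when x > 1 is false   */
    f = f * x;                      /* f = f * x                              */
    x = x - 1;                      /* x = x - 1                              */
    goto loop;                      /* back to the loop test                  */
else_branch:
    f = 0;                          /* Y: f = 0                               */
endsub:
    return f;                       /* End: return f                          */
}
```

Every goto here becomes a branch instruction (ble, br) in assembler, and every label becomes an assembler label.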


How to program the Nios processor?

The Nios processor is Altera's soft-core processor, a RISC design closely modeled on MIPS.

It is designed to make efficient use of the resources in an FPGA.

It comes in three versions: Small – Medium – Large …


Nios II registers 0…15

r0 is always zero: use it as the constant "0"!

If you call a subroutine, save the contents of the registers you use on the stack!


Nios II registers 16…31

sp (r27) points to the stack!


Register operations, R-type instructions


Program constants, I-type instructions

Some pseudoinstructions:

movi rB, IMMED     is    addi rB, r0, IMMED
movia rB, label    is    orhi rB, r0, %hiadj(label)
                         addi rB, rB, %lo(label)
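Why does movia need %hiadj rather than just the upper 16 bits? addi sign-extends its 16-bit immediate, so when bit 15 of the address is set, the addi effectively subtracts 0x10000, and the upper half must be one larger to compensate. The C sketch below models this two-instruction expansion; the function names are hypothetical, and it assumes 32-bit registers with wrap-around addition.

```c
#include <assert.h>
#include <stdint.h>

/* %lo(x): the low 16 bits of the address. */
uint16_t lo16(uint32_t x)
{
    return (uint16_t)(x & 0xFFFFu);
}

/* %hiadj(x): the high 16 bits, plus 1 when bit 15 is set, to cancel
   the sign extension that addi applies to %lo(x). */
uint16_t hiadj16(uint32_t x)
{
    return (uint16_t)((x >> 16) + ((x >> 15) & 1u));
}

/* Model of:  orhi rB, r0, %hiadj(x)
              addi rB, rB, %lo(x)      */
uint32_t movia_model(uint32_t x)
{
    uint32_t rB = (uint32_t)hiadj16(x) << 16;  /* orhi: r0 | (IMM16 << 16) */
    int32_t imm = (int16_t)lo16(x);            /* addi sign-extends IMM16  */
    return rB + (uint32_t)imm;                 /* 32-bit wrap-around add   */
}
```

For example, 0x00018000 has bit 15 set: %lo gives 0x8000 (which addi treats as -32768), so %hiadj must be 0x0002 rather than 0x0001.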


I-type, Branch

Pseudoinstruction:

ble: branch if less than or equal, signed.

ble is assembled as bge with registers A and B swapped!

The IMM16 field is a signed byte offset relative to the next instruction. Because instructions must be word-aligned, its two least-significant bits are always zero.


Conditional operators of C

All C-language conditional operators have corresponding assembly instructions (or pseudoinstructions).

Compare two registers and branch, relative to the PC, if the expression is true.
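The mapping from C operators to branches can be written out as a table. The sketch below, with hypothetical names (branch_map, expand_branch), records which Nios II instruction is actually emitted for each signed comparison; bgt and ble are pseudoinstructions that reuse blt and bge with rA and rB swapped. The unsigned variants (bltu, bgeu, bgtu, bleu) follow the same pattern and are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row per C relational operator (signed operands). */
struct branch_map {
    const char *c_op;   /* C operator                                  */
    const char *real;   /* instruction the assembler actually emits    */
    int swap;           /* 1: pseudoinstruction, rA and rB are swapped */
};

static const struct branch_map table[] = {
    { "==", "beq", 0 },
    { "!=", "bne", 0 },
    { "<",  "blt", 0 },
    { ">=", "bge", 0 },
    { ">",  "blt", 1 },   /* bgt rA, rB, L  ->  blt rB, rA, L */
    { "<=", "bge", 1 },   /* ble rA, rB, L  ->  bge rB, rA, L */
};

/* Writes the branch for "if (ra OP rb) goto label" into out.
   Returns 0 on success, -1 for an unknown operator. */
int expand_branch(const char *op, const char *ra, const char *rb,
                  const char *label, char *out, size_t n)
{
    assert(out != NULL && n > 0);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].c_op, op) == 0) {
            snprintf(out, n, "%s %s, %s, %s", table[i].real,
                     table[i].swap ? rb : ra,
                     table[i].swap ? ra : rb, label);
            return 0;
        }
    }
    return -1;
}
```

With the fac_asm test, "if (x <= 0)" with x in r4 expands to "bge r0, r4, else", which is what "ble r4, r0, else" assembles to.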


Memory content, Load and Store

Store in memory …

stw r6, 100(rA)    # store the word in r6 at address rA + 100
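A byte-addressed memory model in C shows what stw does: the effective address is rA plus the sign-extended 16-bit offset, and the word is written least-significant byte first because Nios II is little-endian. The 256-byte array, the function names and the rejection of misaligned addresses are assumptions of this sketch, not a description of how the hardware handles errors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_SIZE 256u            /* hypothetical: a tiny byte-addressed memory */
static uint8_t mem[MEM_SIZE];

/* Effective address: rA + sign-extended 16-bit offset. */
static uint32_t ea(uint32_t rA, int16_t imm16)
{
    return rA + (uint32_t)(int32_t)imm16;
}

/* Model of "stw rB, imm16(rA)". Returns -1 if the address is misaligned
   or outside the modeled memory (a choice of this sketch). */
int stw(uint32_t rB, uint32_t rA, int16_t imm16)
{
    uint32_t addr = ea(rA, imm16);
    if (addr % 4u != 0u || addr > MEM_SIZE - 4u)
        return -1;
    for (uint32_t i = 0; i < 4u; i++)
        mem[addr + i] = (uint8_t)(rB >> (8u * i));   /* little-endian */
    return 0;
}

/* Model of "ldw rB, imm16(rA)", the matching load. */
int ldw(uint32_t *rB, uint32_t rA, int16_t imm16)
{
    uint32_t addr = ea(rA, imm16);
    assert(rB != NULL);
    if (addr % 4u != 0u || addr > MEM_SIZE - 4u)
        return -1;
    *rB = 0;
    for (uint32_t i = 0; i < 4u; i++)
        *rB |= (uint32_t)mem[addr + i] << (8u * i);
    return 0;
}
```

Here stw(r6, rA, 100) plays the role of "stw r6, 100(rA)".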


The call and ret instructions


From Flowchart to assembler


Assembler

.global fac_asm
.text
# Parameter in r4 (and if needed in r5, r6, r7)
# Return value in r2 (and r3 if long or double)
# we can use r2 and r3 for calculations until return
# r8 … r15 must be saved by caller of a sub

fac_asm:                        # int r2 fac_asm(int r4 x), the function prototype
                                # r3 : for constant "1"
if:     ble   r4, r0, else      # if(x <= 0)
        movi  r3, 1             # constant "1"
        mov   r2, r3            # f = 1
while:  ble   r4, r3, endsub    # while(x>1){
        mul   r2, r2, r4        #   f = f*x
        sub   r4, r4, r3        #   x = x - 1
        br    while             # }
else:   mov   r2, r0            # f = 0
endsub: ret                     # return r2
.end

The .global directive makes fac_asm known to other files.


2.1 Prioritized interrupts


2.2 Input/Output

R/W reverses the direction of the data bus.

CS (Chip Select) enables the chip.

Connect an 8-register memory-mapped peripheral to the CPU. The CPU has 8-bit address and data buses.

The peripheral should have register addresses 0x10…0x17.


Decoding: the door lock

How do you open the door lock?

Press 4 (d) and 8 (h) simultaneously, but don't press any other key!


Connections

0x10 = 00010.000
0x11 = 00010.001
0x12 = 00010.010
0x13 = 00010.011
0x14 = 00010.100
0x15 = 00010.101
0x16 = 00010.110
0x17 = 00010.111

Decoder: address lines A7 A6 A5 A4 A3 = 00010 drive CS; A2 A1 A0 connect to the register-select inputs RS2 RS1 RS0.
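The decoder logic can be written as two small C functions: chip select compares the five upper address bits with 00010, and the three lower bits pass straight through as the register number. The names are hypothetical, and CS is modeled as active-high (1 = selected); a real chip may well use an active-low CS input.

```c
#include <assert.h>
#include <stdint.h>

/* CS: the decoder compares A7..A3 with 00010, i.e. addresses 0x10..0x17. */
int chip_select(uint8_t addr)
{
    return (addr >> 3) == 0x02;
}

/* RS2..RS0 are wired straight to A2..A0: the register number 0..7. */
unsigned register_select(uint8_t addr)
{
    assert(chip_select(addr));     /* only meaningful when the chip is selected */
    return addr & 0x07u;
}
```

Like the door lock, CS opens only for the one exact pattern on A7..A3, whatever A2..A0 are.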


Why memory cache?


3.2 Hit rate and access time

h is the hit rate. Find h when:

a) tAVG = 8 ns
b) tAVG = 15 ns
c) tAVG = 6 ns


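Hit rate calculations

$t_{AVG} = h\,t_C + (1-h)\,t_M = h\,(t_C - t_M) + t_M \;\Rightarrow\; h = \dfrac{t_M - t_{AVG}}{t_M - t_C}$

With $t_C = 5$ ns and $t_M = 70$ ns:

a) $h = \dfrac{70 - 8}{70 - 5} \approx 0.954$

b) $h = \dfrac{70 - 15}{70 - 5} \approx 0.846$

c) $h = \dfrac{70 - 6}{70 - 5} \approx 0.985$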
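The same relation transcribed into C is handy for checking answers. With t_C = 5 ns and t_M = 70 ns (the values that give the answers 0.954, 0.846 and 0.985) it reproduces all three; the function names are hypothetical.

```c
#include <assert.h>

/* Average access time for hit rate h: t_avg = h*t_c + (1 - h)*t_m. */
double t_avg(double h, double t_c, double t_m)
{
    return h * t_c + (1.0 - h) * t_m;
}

/* Inverse: solve t_avg = h*t_c + (1 - h)*t_m for the hit rate h. */
double hit_rate(double t_avg_ns, double t_c, double t_m)
{
    assert(t_m != t_c);
    return (t_m - t_avg_ns) / (t_m - t_c);
}
```

Note how sensitive tAVG is to h: going from h = 0.985 to h = 0.846 more than doubles the average access time (6 ns to 15 ns).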


3.1 Memory system

Direct address mapping:

Memory line i is stored in cache line j = i % K (K = number of cache lines).

The memory is byte-organized, but we can draw it as if it were organized in memory lines of the same size as a cache line.

This simplifies all the figures.

In this example, a block transfer moves one cache line of 2 words.


Why block transfer?

• Transferring 1 "random" word from memory takes three bus cycles: 3 TBus/word (2 of the TBus are wait states).

• Transferring a "burst" of 2 words takes 3+1 bus cycles: 4/2 = 2 TBus/word.

• Transferring a "burst" of 4 words takes 3+1+1+1 bus cycles: 6/4 = 1.5 TBus/word.

• Transferring a "burst" of 8 words takes 3+1+1+1+1+1+1+1 bus cycles: 10/8 = 1.25 TBus/word.

Remember: to make these gains you must actually use most of the transferred words; otherwise block transfer can be even slower than random transfer! This is just an example; other access patterns exist, e.g. 5+3+3+3 and so on. The bus clock is derived from the processor clock, perhaps TBus = 10·TCPU.

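The per-word costs listed above all follow one formula: the first word costs the full access time (including wait states) and each further word in the burst costs the shorter burst cycle. A sketch in C with a hypothetical function name; the arguments 3 and 1 match the example, and other patterns such as 5+3+3+3 are just different arguments.

```c
#include <assert.h>

/* Average bus cycles per word for a burst of n words: the first word
   costs `first` cycles (including wait states), each further word `next`. */
double cycles_per_word(unsigned n, unsigned first, unsigned next)
{
    assert(n > 0);
    return (double)(first + (n - 1u) * next) / n;
}
```

cycles_per_word(8, 3, 1) gives 1.25, and the 5+3+3+3 pattern is cycles_per_word(4, 5, 3), which gives 3.5 TBus/word.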


Mapping of memory address

Memory 4 kB = 4·2^10 = 2^12 bytes. Memory address: mmmmmmmmmmmm

Cache 8 words = 8·4 = 32 bytes. Cache line 2 words = 2·4 bytes. Cache address: ll.w.bb

Memory-to-cache mapping: mmmmmmm.mm.m.mm → ttttttt.ll.w.bb (ttttttt is the address tag)

The address in the cache does not depend on the tag bits!

Our example: the data addresses are accessed four times in this order: 0x010, 0x1FC, 0x168, 0x008, 0x014, 0x1F8, 0x00C
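The field split ttttttt.ll.w.bb is just shifts and masks. The sketch below (hypothetical names, 12-bit addresses as in the example) also computes the cache line as j = i % K, with memory line i = address / 8 and K = 4 lines, which gives the same result as extracting the ll bits. Worked by hand, for instance, 0x00C = 0000 0000 1100 splits into tag 0000000, line 01, word 1, byte 00; the last dm_split check in the tests covers that case.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 12-bit address in the example: ttttttt.ll.w.bb */
struct dm_fields {
    unsigned tag;    /* ttttttt: 7 bits           */
    unsigned line;   /* ll: cache line 0..3       */
    unsigned word;   /* w: word within the line   */
    unsigned byte;   /* bb: byte within the word  */
};

struct dm_fields dm_split(uint16_t addr)
{
    struct dm_fields f;
    assert(addr < 0x1000u);          /* 4 kB memory: 12-bit addresses */
    f.byte = addr & 0x3u;
    f.word = (addr >> 2) & 0x1u;
    f.line = (addr >> 3) & 0x3u;
    f.tag  = (addr >> 5) & 0x7Fu;
    return f;
}

/* The same cache line via j = i % K: memory line i = addr / 8, K = 4. */
unsigned dm_line_mod(uint16_t addr)
{
    return (addr / 8u) % 4u;
}
```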


Memory and Cache

Data is accessed from three different locations (tags), but they all map to the same lines in this small cache!


Direct mapped cache

Memory address   ttttttt.ll.w.bb   Tag #   Line#(Tag#)
0x010            0000000.10.0.00   0       2(0)
0x1FC            0001111.11.1.00   1       3(1)
0x168            0001011.01.0.00   2       1(2)
0x008            0000000.01.0.00   0       1(0)
0x014            0000000.10.1.00   0       2(0)
0x1F8            0001111.11.0.00   1       3(1)
0x00C            0001111.01.1.00   1       1(1)


Program execution

The data addresses are accessed four times in this order: 0x010, 0x1FC, 0x168, 0x008, 0x014, 0x1F8, 0x00C

C, Cold miss = line entry into a previously unused cache line (this counts as a miss)

M, Miss = the previous line entry was from another location (tag)

H, Hit = the previous line entry was from the same location (tag)

Cache access, line#(tag#), with the result below each access:

2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(1)
C    C    C    M    H    H    M
2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(1)
H    H    M    M    H    H    M
2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(1)
H    H    M    M    H    H    M
2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(1)
H    H    M    M    H    H    M

$h = \dfrac{2 + 3 \cdot 4}{4 \cdot 7} = \dfrac{14}{28} = 0.5$
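Following the accesses by hand is mechanical, so it can also be simulated. The C sketch below takes the line#(tag#) sequence exactly as listed and labels each access C, M or H using the same three definitions; the function name and array layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DM_LINES 4   /* ll = 2 bits: four cache lines */

/* Labels each access in out[] as 'C' (cold miss), 'M' (miss) or 'H' (hit)
   for a direct-mapped cache. Access k is line[k] with tag number tag[k],
   i.e. the line#(tag#) notation. out must hold n + 1 chars.
   Returns the number of hits. */
int simulate_direct(const int *line, const int *tag, size_t n, char *out)
{
    int valid[DM_LINES] = {0};
    int stored[DM_LINES] = {0};
    int hits = 0;
    for (size_t k = 0; k < n; k++) {
        int l = line[k];
        assert(l >= 0 && l < DM_LINES);
        if (!valid[l]) {                 /* first use of this line */
            out[k] = 'C';
            valid[l] = 1;
        } else if (stored[l] == tag[k]) {
            out[k] = 'H';
            hits++;
        } else {
            out[k] = 'M';                /* conflict: the line is replaced */
        }
        stored[l] = tag[k];
    }
    out[n] = '\0';
    return hits;
}
```

Fed the seven accesses 2(0) 3(1) 1(2) 1(0) 2(0) 3(1) 1(1) four times over, it produces the same C/M/H string as above and 14 hits out of 28.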


2-way set associative cache

Memory address: mmmmmmmm.m.m.mm

Address mapping: tttttttt.l.w.bb

OBSERVE! The set number is not included in the address map. Logic circuits within the associative cache take care of the set number and connect the CPU to the correct set.

(Tags are stored in the associative cache for each line in every set. All sets are searched in parallel for the tag.)


Example of how an associative cache can boost performance

Memory: 0x010, Tag: 0x01, Cache: 0x0 = 0b0.0.00
Memory: 0x1FC, Tag: 0x1F, Cache: 0xC = 0b1.1.00
Memory: 0x168, Tag: 0x16, Cache: 0x8 = 0b1.0.00
Memory: 0x008, Tag: 0x00, Cache: 0x8 = 0b1.0.00
Memory: 0x014, Tag: 0x01, Cache: 0x4 = 0b0.1.00
Memory: 0x1F8, Tag: 0x1F, Cache: 0x8 = 0b1.0.00
Memory: 0x00C, Tag: 0x00, Cache: 0xC = 0b1.1.00

(A nice example: the cache part is exactly one hex digit.)


Fewer conflict misses

Memory locations 0x010 and 0x014 are stored in cache line 0, but there are two sets! Both can be stored simultaneously.

0x1FC, 0x168, 0x008, 0x1F8 and 0x00C are stored in cache line 1; two of them can be stored simultaneously.

To analyse this example in full detail you have to consider the exchange (replacement) policy, which is not given.

Exchange policy: FIFO, RANDOM, LRU …

If the exchange policy were known, we could follow the cache accesses step by step to calculate the hit rate: line,set(tag) line,set(tag) …
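As an example of what a known policy makes possible, the sketch below simulates the 2-way cache with LRU replacement. LRU is an assumption chosen for illustration (the exercise does not give the policy), and the names are hypothetical. Each address is split as tttttttt.l.w.bb, the tag is compared in both sets, and on a miss an empty set is filled first; otherwise the least recently used one is replaced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WAYS 2       /* the two "sets" of the 2-way cache     */
#define LINES 2      /* one l bit: lines 0 and 1 in each set  */

/* 2-way set-associative cache, address tttttttt.l.w.bb, with LRU
   replacement (assumed). out gets 'C', 'M' or 'H' per access
   (n + 1 chars); returns the number of hits. */
int simulate_2way_lru(const uint16_t *addr, size_t n, char *out)
{
    int valid[LINES][WAYS] = {{0}};
    unsigned tag[LINES][WAYS] = {{0}};
    size_t last_use[LINES][WAYS] = {{0}};
    int hits = 0;
    assert(addr != NULL && out != NULL);
    for (size_t k = 0; k < n; k++) {
        unsigned l = (addr[k] >> 3) & 1u;   /* l bit                     */
        unsigned t = addr[k] >> 4;          /* tttttttt                  */
        int way = -1;
        for (int s = 0; s < WAYS; s++)      /* compare tag in all sets   */
            if (valid[l][s] && tag[l][s] == t)
                way = s;
        if (way >= 0) {
            out[k] = 'H';
            hits++;
        } else {
            for (int s = 0; s < WAYS && way < 0; s++)
                if (!valid[l][s])
                    way = s;                /* free entry: cold miss     */
            if (way >= 0) {
                out[k] = 'C';
            } else {
                way = 0;                    /* evict least recently used */
                for (int s = 1; s < WAYS; s++)
                    if (last_use[l][s] < last_use[l][way])
                        way = s;
                out[k] = 'M';
            }
            valid[l][way] = 1;
            tag[l][way] = t;
        }
        last_use[l][way] = k + 1;           /* timestamp of latest use   */
    }
    out[n] = '\0';
    return hits;
}
```

For the seven addresses accessed four times, this LRU sketch reports 14 hits out of 28 accesses: line 0 hits every time after its cold miss, while line 1 cycles three blocks through two sets. FIFO or RANDOM replacement can give a different count.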

