The Memory Hierarchy

21/05/2009 Lecture 32_CA&O_Engr Umbreen Sabir
Page 1:

The Memory Hierarchy

Page 2:

Translation-Lookaside Buffer (TLB)

The TLB is a cache that holds recently used page table mappings, to speed up the translation process and reduce memory access time. Each TLB tag holds a virtual page number, and the corresponding data field holds the physical page number; each entry also holds reference, valid, and dirty bits.

A TLB miss has two causes: the page is in the page table but not yet in the TLB (much more frequent), in which case the CPU loads the entry from the page table; or the page is not in the page table at all, which raises a page fault exception. On a miss, the CPU selects which TLB entry to replace; the victim's reference and dirty bits are then written back into the page table. Typical TLB miss rates are 0.01-1%, with a penalty of 10-100 clock cycles, much smaller than a page fault.
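The lookup described above can be sketched in software (a minimal sketch with hypothetical names; a real TLB is hardware and searches all entries in parallel):

```python
# Minimal software sketch of a TLB sitting in front of a page table.
PAGE_OFFSET_BITS = 14                  # 16 KB pages, as in the later example

page_table = {0x12345: 0x00ABC}        # VPN -> PPN, maintained by the OS
tlb = {}                               # VPN -> {ppn, valid, ref, dirty}

def translate(vaddr):
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    entry = tlb.get(vpn)
    if entry is None or not entry["valid"]:
        # TLB miss: load the mapping from the page table; a missing
        # page table entry would be a page fault handled by the OS.
        if vpn not in page_table:
            raise RuntimeError("page fault")
        entry = {"ppn": page_table[vpn], "valid": True,
                 "ref": False, "dirty": False}
        tlb[vpn] = entry               # insert/replace the TLB entry
    entry["ref"] = True                # reference bit set on each use
    return (entry["ppn"] << PAGE_OFFSET_BITS) | offset
```

The first access to a page misses and fills the TLB; later accesses to the same page hit.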

Page 3:

Example: Consider a virtual memory system with a 40-bit virtual byte address, 16 KB pages, and a 36-bit physical byte address. What is the total size of the page table for each process on this machine, assuming that the valid, protection, dirty, and use bits take a total of 4 bits and that all the virtual pages are in use? Assume that disk addresses are not stored in the page table.

Page table size = #entries × entry size.
#entries = #pages in the virtual address space = 2^40 bytes / (16 × 2^10 bytes/page) = 2^40 / 2^14 = 2^26 entries.
The width of each entry is 4 + 36 = 40 bits = 5 bytes.
Thus the size of the page table is 2^26 × 40 bits = 5 × 2^26 bytes ≈ 335 MB.
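The arithmetic above can be checked directly (same numbers as the example, with the full 36-bit physical address counted per entry):

```python
# Check of the page-table-size example: 40-bit virtual address,
# 16 KB pages, 36-bit physical address, 4 status bits per entry.
virtual_addr_bits = 40
page_size_bytes = 16 * 1024            # 16 KB = 2**14 bytes
phys_addr_bits = 36
status_bits = 4

entries = 2 ** virtual_addr_bits // page_size_bytes   # 2**26 entries
entry_bits = status_bits + phys_addr_bits             # 40 bits = 5 bytes
table_bytes = entries * entry_bits // 8               # 5 * 2**26 bytes

print(entries)       # 67108864 (2**26)
print(table_bytes)   # 335544320 (~335 MB)
```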

Page 4:

TLB and Cache Working Together (Intrinsity FastMATH Processor)

4 KB pages; the TLB has 16 entries and is fully associative, so all entries must be compared on every lookup. Each entry is 64 bits: 20 tag bits (the virtual page number), 20 data bits (the physical page number), plus valid, reference, dirty, and other bits. One of the extra bits is a write-access bit; it prevents programs from writing into pages for which they have only read access, as part of the protection mechanism.

Three kinds of misses can occur: a cache miss, a TLB miss, and a page fault. A TLB miss in this design takes 16 cycles on average. On a page fault, the CPU saves the process state, gives control of the CPU to another process, and then brings the page in from disk.
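The 20-bit field widths above follow directly from the page size; a quick sketch (the 32-bit example address is made up):

```python
# FastMATH-style field breakdown: 4 KB pages give a 12-bit page offset,
# leaving 20 bits of a 32-bit address for the page number, matching the
# 20-bit tag and 20-bit data fields of each 64-bit TLB entry.
OFFSET_BITS = 12                      # 4 KB = 2**12 bytes

vaddr = 0xDEADB123                    # arbitrary example address
vpn = vaddr >> OFFSET_BITS            # 20-bit virtual page number
offset = vaddr & ((1 << OFFSET_BITS) - 1)

print(hex(vpn), hex(offset))          # 0xdeadb 0x123
```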


Page 6:

How are TLB Misses and Page Faults Handled?

A TLB miss means no entry in the TLB matches the virtual address. If the page is in memory (as indicated by the page table), its translation is simply placed in the TLB; this case can be handled by the OS in software. Once the TLB holds the translation, the instruction that caused the TLB miss is re-executed. If the valid bit of the page table entry retrieved on the TLB miss is 0, the access is a page fault. When a page fault occurs, the OS takes control and saves the state of the faulting process; the address of the instruction that caused the page fault is saved in the EPC.

Page 7:

How are TLB Misses and Page Faults Handled? (cont.)

The OS then finds a frame for the incoming page by discarding an old one (if the victim is dirty, it first has to be written back to disk). After that, the OS starts the transfer of the needed page from disk and gives control of the CPU to another process, since the transfer takes millions of cycles. Once the page has been transferred, the OS reads the EPC and returns control to the faulting process so the instruction can complete. If the instruction that caused the page fault was a sw, the write-control line for data memory is de-asserted to prevent the sw from completing. When an exception occurs, the processor sets a bit that disables further exceptions, so that a subsequent exception cannot overwrite the EPC.
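The handling steps above can be sketched as follows (all names and the FIFO replacement policy are illustrative assumptions; a real handler lives in the OS kernel):

```python
# Sketch of OS page-fault handling: pick a frame (writing back a dirty
# victim), read the page from disk, install the mapping, resume at EPC.
class Disk:
    def __init__(self):
        self.blocks = {}
    def write(self, vpn, data):
        self.blocks[vpn] = data
    def read(self, vpn):
        return self.blocks.get(vpn, b"\x00" * 4)

def handle_page_fault(vpn, epc, page_table, frames, free_frames, disk, fifo):
    if free_frames:
        frame = free_frames.pop()
    else:
        victim = fifo.pop(0)                   # FIFO victim choice (assumption)
        frame = page_table[victim]["frame"]
        if page_table[victim]["dirty"]:
            disk.write(victim, frames[frame])  # dirty victim saved to disk first
        page_table[victim]["valid"] = False
    # The disk read takes millions of cycles; meanwhile the OS would
    # give the CPU to another process.
    frames[frame] = disk.read(vpn)
    page_table[vpn] = {"frame": frame, "valid": True, "dirty": False}
    fifo.append(vpn)
    return epc                                 # re-execute the faulting instruction
```

Returning the EPC models the last step on the slide: control goes back to the offending instruction so it can complete.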

Page 8:

The Influence of Block Size

In general, a larger block size takes advantage of spatial locality. BUT:
- A larger block size means a larger miss penalty: it takes longer to fill the block.
- If the block size is too big relative to the cache size, the miss rate goes up, because there are too few cache blocks.

In general, Average Access Time = Hit Time × (1 - Miss Rate) + Miss Penalty × Miss Rate.

(Figure: three plots against block size. Miss penalty grows with block size; miss rate first falls, exploiting spatial locality, then rises when fewer blocks compromise temporal locality; average access time therefore has a minimum, rising again at large blocks due to the increased miss penalty and miss rate.)

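The formula above can be evaluated with illustrative numbers (the hit time, miss rate, and penalty below are assumptions, not figures from the lecture):

```python
def amat(hit_time, miss_rate, miss_penalty):
    # Average Access Time = Hit Time * (1 - Miss Rate) + Miss Penalty * Miss Rate,
    # in the slide's form, where the miss penalty is the full miss time.
    return hit_time * (1 - miss_rate) + miss_penalty * miss_rate

# A 1-cycle hit, 5% miss rate, 20-cycle miss penalty:
print(amat(1, 0.05, 20))
```

Plugging in a candidate block size's miss rate and penalty shows directly whether a change that lowers one but raises the other pays off.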

Page 9:

The Influence of Associativity

Every change that improves the miss rate can also negatively affect overall performance. For example, we can reduce the miss rate by increasing associativity (roughly a 30% gain for small caches when going from direct-mapped to two-way set associative). But high associativity does not make sense for modern caches, which are large: the hardware costs more (more comparators) and the access time grows.

While full associativity does not pay off for caches, it is good for paged memory, because misses (page faults) are very expensive. A large page size also keeps the page table small.
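The conflict-miss effect of associativity can be seen by computing set indices (a hypothetical 4 KB cache with 16-byte blocks, i.e. 256 blocks; not a configuration from the lecture):

```python
# Set index for a given address in a set-associative cache.
def set_index(addr, num_blocks, ways, block_size=16):
    num_sets = num_blocks // ways
    return (addr // block_size) % num_sets

# Addresses 0x0000 and 0x1000 map to the same set either way, but a
# direct-mapped cache (1 way) can hold only one of them at a time,
# while a two-way set holds both, removing the conflict misses.
print(set_index(0x0000, 256, 1), set_index(0x1000, 256, 1))  # 0 0
print(set_index(0x0000, 256, 2), set_index(0x1000, 256, 2))  # 0 0
```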

Page 10:

The Influence of Associativity (SPEC2000)

(Figure: SPEC2000 miss rates vs. associativity, with separate panels for small caches and large caches.)

Page 11:

Memory Write Options

There are two options: write-through (used for caches) and write-back (used for paged memory). With write-back, pages are written to disk only if they were modified before being replaced.

The advantages of write-back are that multiple writes to a given page require only one write to disk, and that this write can use the disk's high bandwidth instead of going one word at a time. Individual words can be written into a page much faster (at cache rate) than if they were written through to disk.

The advantage of write-through is that misses are simpler to handle and it is easier to implement (using a write buffer).

In the future, more caches will use write-back because of the CPU-memory gap.
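The write-traffic difference can be sketched with counters (hypothetical structure; `backing_writes` counts writes that reach the next level of the hierarchy):

```python
# Write-back: stores only set a dirty bit; one bulk write on eviction.
class WriteBackPage:
    def __init__(self):
        self.data, self.dirty, self.backing_writes = {}, False, 0
    def store(self, addr, val):
        self.data[addr] = val
        self.dirty = True                # no traffic to the next level yet
    def evict(self):
        if self.dirty:
            self.backing_writes += 1     # whole page written once, at full bandwidth
            self.dirty = False

# Write-through: every store also writes the next level (via a write buffer).
class WriteThroughBlock:
    def __init__(self):
        self.data, self.backing_writes = {}, 0
    def store(self, addr, val):
        self.data[addr] = val
        self.backing_writes += 1

wb, wt = WriteBackPage(), WriteThroughBlock()
for a in range(100):                     # 100 stores to the same page/block
    wb.store(a, a)
    wt.store(a, a)
wb.evict()
print(wb.backing_writes, wt.backing_writes)   # 1 100
```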

Page 12:

Processor-DRAM Memory Gap (Latency)

(Figure: processor vs. DRAM performance over time; DRAM performance improves only about 7% per year.)

Solutions to reduce the gap: add an L3 cache; have the L2 and L3 caches do something useful while idle.

Page 13:

Sources of (Cache) Misses

- Compulsory (cold start or process migration; first reference): the first access to a block. A "cold" fact of life: not a whole lot you can do about it. Note: if you are going to run billions of instructions, compulsory misses are insignificant.
- Conflict (collision): multiple memory locations (blocks) map to the same cache location. Solution 1: increase cache size. Solution 2: increase associativity.
- Capacity: the cache cannot contain all the blocks accessed by the program. Solution: increase cache size.
- Invalidation: another process (e.g., I/O) updates memory.
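The compulsory/conflict/capacity split above is usually measured by comparing a set-associative cache against a fully associative cache of the same total size; a small sketch (LRU replacement and the block-address trace are assumptions):

```python
# Classify misses as compulsory (never-seen block), conflict (a fully
# associative cache of equal size would have hit), or capacity (it
# would also have missed). LRU replacement throughout.
from collections import OrderedDict

def classify_misses(trace, num_blocks, ways):
    seen = set()
    sets = {}                          # set index -> OrderedDict (LRU order)
    full = OrderedDict()               # fully associative cache, same capacity
    counts = {"compulsory": 0, "conflict": 0, "capacity": 0, "hit": 0}
    num_sets = num_blocks // ways
    for block in trace:                # trace is a sequence of block addresses
        idx = block % num_sets
        s = sets.setdefault(idx, OrderedDict())
        hit, full_hit = block in s, block in full
        if full_hit:                   # keep the fully associative LRU state
            full.move_to_end(block)
        else:
            if len(full) >= num_blocks:
                full.popitem(last=False)
            full[block] = True
        if hit:
            s.move_to_end(block)
            counts["hit"] += 1
        else:
            if block not in seen:
                counts["compulsory"] += 1
            elif full_hit:
                counts["conflict"] += 1
            else:
                counts["capacity"] += 1
            if len(s) >= ways:         # evict the LRU block of this set
                s.popitem(last=False)
            s[block] = True
        seen.add(block)
    return counts

# Blocks 0 and 4 collide in a 4-block direct-mapped cache, so the
# second access to block 0 is a conflict miss.
print(classify_misses([0, 4, 0], num_blocks=4, ways=1))
```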

Page 14:

Total Miss Rate vs. Cache Type and Size

(Figure: miss rate vs. cache size for one-way, two-way, and four-way caches. The gap between one-way and two-way shows the additional conflict misses of the direct-mapped cache; the gap between two-way and four-way shows the further conflict misses removed by higher associativity; capacity misses shrink as caches get larger.)

Page 15:

Design Alternatives

- Increase cache size: decreases capacity misses; may increase access time.
- Increase associativity: decreases the conflict miss rate; may increase access time.
- Increase block size: decreases the miss rate due to spatial locality, but increases the miss penalty; very large blocks may increase the miss rate for small caches.

So the design of memory hierarchies is interesting.

Page 16:

Processor-DRAM Memory Gap for Multi-cores

(Figure: performance vs. number of cores; memory-intensive applications suffer performance degradation as core count grows.)

