CS 162 Discussion Section
Week 5: 10/7 – 10/11

Today's Section
● Project discussion (5 min)
● Quiz (5 min)
● Lecture Review (20 min)
● Worksheet and Discussion (20 min)
Project 1
● Autograder is still up
    submit proj1-test
● Due 10/8 (Today!) at 11:59 PM
    submit proj1-code
● Due 10/9 (Tomorrow!) at 11:59 PM: final design doc & Project 1 group evals
  – Template posted on Piazza last week
  – Will post the group evals link on Piazza Wednesday afternoon
● Questions?
Quiz…
Short Answer [Use this one]
1. Name the 4 types of cache misses discussed in class, given their causes: [0.5 each]
   a) Program initialization, etc. (nothing you can do about them) [Compulsory Misses]
   b) Two addresses map to the same cache line [Conflict Misses]
   c) The cache size is too small [Capacity Misses]
   d) External processor or I/O interference [Coherence Misses]
[Choose 1]
2. Which is better when a small number of items are modified frequently: write-back caching or write-through caching? [Write-back]
3. Name one of the two types of locality discussed in lecture that can benefit from some type of caching. [Temporal or Spatial]

True/False [Choose 2]
4. Memory is typically allocated in finer-grained units with segmentation than with paging. [False]
5. TLB lookups can be performed in parallel with data cache lookups. [True]
6. The size of an inverted page table is proportional to the number of pages in virtual memory. [False]
7. Conflict misses are possible in a 3-way set-associative cache. [True]
Lecture Review
Slide 9.7 (10/2/2013, Anthony D. Joseph and John Canny, CS162 ©UCB Fall 2013)
Virtualizing Resources
• Physical reality: processes/threads share the same hardware
  – Need to multiplex the CPU (CPU scheduling)
  – Need to multiplex the use of memory (today)
• Why worry about memory multiplexing?
  – The complete working state of a process and/or the kernel is defined by its data in memory (and registers)
  – Consequently, we cannot just let different processes use the same memory
  – We probably don't want different processes to even have access to each other's memory (protection)
Important Aspects of Memory Multiplexing
• Controlled overlap:
  – Processes should not collide in physical memory
  – Conversely, we would like the ability to share memory when desired (for communication)
• Protection:
  – Prevent access to the private memory of other processes
    » Different pages of memory can be given special behavior (read-only, invisible to user programs, etc.)
    » Kernel data is protected from user programs
• Translation:
  – Ability to translate accesses from one address space (virtual) to a different one (physical)
  – When translation exists, the process uses virtual addresses and physical memory uses physical addresses
Two Views of Memory
• Address space: all the addresses and state a process can touch
  – Each process and the kernel has a different address space
• Consequently, two views of memory:
  – View from the CPU (what the program sees: virtual memory)
  – View from memory (physical memory)
  – The translation box (MMU) converts between the two views
• Translation helps to implement protection
  – If task A cannot even gain access to task B's data, there is no way for A to adversely affect B
• With translation, every program can be linked/loaded into the same region of the user address space

[Diagram: CPU issues virtual addresses to the MMU, which emits physical addresses to memory; untranslated reads/writes bypass the MMU]
Address Segmentation
Virtual address = seg # (top 2 bits) | offset (low 6 bits)

Virtual memory view                Physical memory view
1111 0000 (0xF0)  stack            1110 0000 (0xE0)  stack
1100 0000 (0xC0)  (seg 11 start)
1000 0000 (0x80)  heap             0111 0000 (0x70)  heap
0100 0000 (0x40)  data             0101 0000 (0x50)  data
0000 0000         code             0001 0000 (0x10)  code

Segment table:
Seg #   Base        Limit
11      1011 0000   01 0000
10      0111 0000   01 1000
01      0101 0000   10 0000
00      0001 0000   10 0000

Worked example (stack): virtual 1111 0000 → seg # 11, offset 11 0000
  1011 0000 (base) + 11 0000 (offset) = 1110 0000 (physical)
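The base-plus-offset translation above can be sketched in a few lines of Python. The table values are copied from the slide; the function name and the 8-bit address width are illustrative assumptions, and the limit check is elided since the slide's stack segment is valid only near its top (stacks grow down):

```python
# Sketch of the slide's segment translation.
# 8-bit virtual address: top 2 bits = seg #, low 6 bits = offset.

SEG_TABLE = {  # seg #: (base, limit)
    0b11: (0b1011_0000, 0b01_0000),   # stack (valid region at segment top)
    0b10: (0b0111_0000, 0b01_1000),   # heap
    0b01: (0b0101_0000, 0b10_0000),   # data
    0b00: (0b0001_0000, 0b10_0000),   # code
}

def translate(vaddr):
    seg = vaddr >> 6              # top 2 bits select the segment
    offset = vaddr & 0b11_1111    # low 6 bits pass through unchanged
    base, _limit = SEG_TABLE[seg]
    return base + offset          # real hardware also checks offset vs. limit

# Slide's worked example: 1111 0000 -> 1011 0000 + 11 0000 = 1110 0000
assert translate(0b1111_0000) == 0b1110_0000
```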
Address Segmentation (cont.)
Same segment table and layout as above.
What happens if the stack grows down to 1110 0000?
Address Segmentation (cont.)
No room to grow!! Either a buffer-overflow error, or resize the segment and move segments around to make room.
Paging
Virtual address = page # (5 bits) | offset (3 bits); pages are 8 bytes in this example.

Virtual memory view: code from 0000 0000, data from 0100 0000, heap from 1000 0000, stack from 1111 0000 up to 1111 1111.
Physical memory view: code at 0001 0000, data at 0101 0000, heap at 0111 0000, stack at 1110 0000 (top at 1110 1111).

Page table (virtual page # → physical page #; all other entries null):
  00000–00011 → 00010, 00011, 00100, 00101
  01000–01011 → 01010, 01011, 01100, 01101
  10000–10010 → 01110, 01111, 10000
  11110–11111 → 11100, 11101
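The flat page table above can be sketched in Python. The mappings are copied from the slide; the dictionary representation and function name are illustrative (a real page table is a dense array indexed by page #):

```python
# Sketch of the slide's flat page table: 8-bit virtual address,
# 5-bit page #, 3-bit offset (8-byte pages).

PAGE_TABLE = {
    0b00000: 0b00010, 0b00001: 0b00011, 0b00010: 0b00100, 0b00011: 0b00101,
    0b01000: 0b01010, 0b01001: 0b01011, 0b01010: 0b01100, 0b01011: 0b01101,
    0b10000: 0b01110, 0b10001: 0b01111, 0b10010: 0b10000,
    0b11110: 0b11100, 0b11111: 0b11101,
}  # every other virtual page is null (unmapped)

def translate(vaddr):
    vpn, offset = vaddr >> 3, vaddr & 0b111   # split page # from offset
    ppn = PAGE_TABLE.get(vpn)
    if ppn is None:
        raise MemoryError("page fault")       # null PTE
    return (ppn << 3) | offset                # offset passes through
```

For example, virtual 0000 0000 (code) lands at physical 0001 0000, and the top of the stack, 1111 1111, lands at 1110 1111, matching the slide.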
Paging (cont.)
Same page table as above.
What happens if the stack grows down to 1110 0000? (Virtual pages 11101 and 11100 are currently null.)
Paging (cont.)
Allocate new pages wherever there is room! The stack's new pages need not be contiguous in physical memory:
  11101 → 10111
  11100 → 10110
Challenge: the table size is equal to the number of pages in virtual memory!
Two-Level Paging
Virtual address = page1 # (3 bits) | page2 # (2 bits) | offset (3 bits)

Page table (level 1): entries 111, 100, 010, and 000 point to level-2 tables; all others are null.

Page tables (level 2), page2 # → physical page #:
  via 111 (stack): 11 → 11101, 10 → 11100, 01 → 10111, 00 → 10110
  via 100 (heap):  11 → null,  10 → 10000, 01 → 01111, 00 → 01110
  via 010 (data):  11 → 01101, 10 → 01100, 01 → 01011, 00 → 01010
  via 000 (code):  11 → 00101, 10 → 00100, 01 → 00011, 00 → 00010
Two-Level Paging (cont.)
Worked example: translate virtual address 1001 0000 (0x90).
  page1 # = 100 → the heap's level-2 table
  page2 # = 10 → physical page 10000
  offset = 000
  Physical address = 1000 0000 (0x80)
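The two-level walk can be sketched in Python with the tables from the slide. The nested-dictionary layout and names are illustrative assumptions:

```python
# Sketch of the slide's two-level translation: 8-bit virtual address =
# page1 # (3 bits) | page2 # (2 bits) | offset (3 bits).

LEVEL2_STACK = {0b11: 0b11101, 0b10: 0b11100, 0b01: 0b10111, 0b00: 0b10110}
LEVEL2_HEAP  = {0b10: 0b10000, 0b01: 0b01111, 0b00: 0b01110}  # entry 11 is null
LEVEL2_DATA  = {0b11: 0b01101, 0b10: 0b01100, 0b01: 0b01011, 0b00: 0b01010}
LEVEL2_CODE  = {0b11: 0b00101, 0b10: 0b00100, 0b01: 0b00011, 0b00: 0b00010}

LEVEL1 = {0b111: LEVEL2_STACK, 0b100: LEVEL2_HEAP,
          0b010: LEVEL2_DATA, 0b000: LEVEL2_CODE}  # other entries null

def translate(vaddr):
    p1 = vaddr >> 5               # top 3 bits index the level-1 table
    p2 = (vaddr >> 3) & 0b11      # next 2 bits index a level-2 table
    offset = vaddr & 0b111
    level2 = LEVEL1.get(p1)
    if level2 is None or p2 not in level2:
        raise MemoryError("page fault")   # null entry at either level
    return (level2[p2] << 3) | offset

# Slide's example: virtual 1001 0000 (0x90) -> physical 1000 0000 (0x80)
assert translate(0x90) == 0x80
```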
Inverted Table
hash(virtual page #) = physical page #
• One table entry per physical page, found by hashing the virtual page #
  (the slide hashes h(11111), h(11110), … for the stack, heap, data, and code pages)
• Total size of the page table ≈ number of pages used by the program in physical memory
• But the hash function is more complex
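A minimal sketch of the idea in Python. The hash function, table size, and collision handling below are toy assumptions for illustration (real designs also tag entries with a process ID and use hardware-friendly hashes):

```python
# Sketch of an inverted page table: one entry per *physical* page,
# looked up by hashing the virtual page #.

NUM_PHYS_PAGES = 32

class InvertedPageTable:
    def __init__(self):
        # Each slot holds the virtual page # currently mapped there (or None).
        self.slots = [None] * NUM_PHYS_PAGES

    def _hash(self, vpn):
        return (vpn * 7) % NUM_PHYS_PAGES   # toy hash function

    def map(self, vpn):
        i = self._hash(vpn)
        while self.slots[i] is not None:    # linear probing on collision
            i = (i + 1) % NUM_PHYS_PAGES
        self.slots[i] = vpn
        return i                            # physical page #

    def lookup(self, vpn):
        i = self._hash(vpn)
        while self.slots[i] != vpn:
            if self.slots[i] is None:
                raise MemoryError("page fault")
            i = (i + 1) % NUM_PHYS_PAGES
        return i
```

The table has one slot per physical page, so its size tracks physical memory rather than the (much larger) virtual address space.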
Address Translation Comparison

Scheme                       Advantages                           Disadvantages
Segmentation                 Fast context switching: segment      External fragmentation
                             mapping maintained by CPU
Paging (single-level page)   No external fragmentation;           Large table size ~ virtual memory
                             fast, easy allocation
Paged segmentation /         Table size ~ # of pages in virtual   Multiple memory references
two-level pages              memory; fast, easy allocation        per page access
Inverted table               Table size ~ # of pages in           Hash function more complex
                             physical memory
Caching Concept
• Cache: a repository for copies that can be accessed more quickly than the original
  – Make the frequent case fast and the infrequent case less dominant
• Caching works at many levels
  – Can cache: memory locations, address translations, pages, file blocks, file names, network routes, etc.
• Only good if:
  – The frequent case is frequent enough, and
  – The infrequent case is not too expensive
• Important measure:
  Average Access Time = (Hit Rate × Hit Time) + (Miss Rate × Miss Time)
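The average-access-time formula works out to a one-line calculation. The numbers in the example below are made up for illustration, not measurements:

```python
# The slide's average-access-time formula:
# AAT = (hit rate x hit time) + (miss rate x miss time)

def avg_access_time(hit_rate, hit_time, miss_time):
    miss_rate = 1.0 - hit_rate
    return hit_rate * hit_time + miss_rate * miss_time

# e.g. 99% hits at 1 ns, misses costing 100 ns:
# 0.99 * 1 + 0.01 * 100 = 1.99 ns average
assert abs(avg_access_time(0.99, 1.0, 100.0) - 1.99) < 1e-9
```

Note how even a 1% miss rate nearly doubles the average time when misses are 100x slower, which is why hit rate dominates cache design.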
Why Does Caching Help? Locality!
• Temporal locality (locality in time):
  – Keep recently accessed data items closer to the processor
• Spatial locality (locality in space):
  – Move contiguous blocks to the upper levels

[Figure: probability of reference across the address space 0 … 2^n − 1; blocks X and Y move between upper-level and lower-level memory to and from the processor]
Sources of Cache Misses
• Compulsory (cold start): first reference to a block
  – "Cold" fact of life: not a whole lot you can do about it
  – Note: when running "billions" of instructions, compulsory misses are insignificant
• Capacity:
  – The cache cannot contain all the blocks accessed by the program
  – Solution: increase cache size
• Conflict (collision):
  – Multiple memory locations map to the same cache location
  – Solutions: increase cache size, or increase associativity
• Two others:
  – Coherence (invalidation): another processor or I/O device updates memory
  – Policy: due to a non-optimal replacement policy
Where Does a Block Get Placed in a Cache?
Example: block 12 (01100) from a 32-block address space, placed in an 8-block cache.
• Direct mapped: block 12 can go only into block 4 (12 mod 8)
  – Address split: tag 01, index 100
• Set associative (2-way, 4 sets): block 12 can go anywhere in set 0 (12 mod 4)
  – Address split: tag 011, index 00
• Fully associative: block 12 can go anywhere
  – Address split: tag 01100 (no index)
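The placement rules above reduce to modular arithmetic, sketched here in Python (function names are illustrative):

```python
# Where block 12 (01100) lands in an 8-block cache, per the slide:
# direct mapped -> exactly one block; N-way -> one set; fully assoc -> anywhere.

NUM_BLOCKS = 8

def direct_mapped_index(block_addr):
    return block_addr % NUM_BLOCKS   # one possible location

def set_index(block_addr, num_sets):
    return block_addr % num_sets     # any way within this set

assert direct_mapped_index(12) == 4   # 12 mod 8 = 4
assert set_index(12, 4) == 0          # 2-way (4 sets): set 0
assert set_index(12, 1) == 0          # fully associative: one big set
```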
Which Block Should Be Replaced on a Miss?
• Easy for direct mapped: only one possibility
• Set associative or fully associative:
  – Random
  – LRU (Least Recently Used)

Miss rates by cache size and associativity:
           2-way            4-way            8-way
Size       LRU    Random    LRU    Random    LRU    Random
16 KB      5.2%   5.7%      4.7%   5.3%      4.4%   5.0%
64 KB      1.9%   2.0%      1.5%   1.7%      1.4%   1.5%
256 KB     1.15%  1.17%     1.13%  1.13%     1.12%  1.12%
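LRU for one cache set can be sketched with an ordered dictionary; the class name and hit/miss interface are illustrative assumptions:

```python
# Minimal sketch of LRU replacement for a single cache set.
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()   # least recently used entry first

    def access(self, tag):
        """Return True on a hit, False on a miss (inserting/evicting as needed)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # mark as most recently used
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)  # evict the least recently used
        self.blocks[tag] = True
        return False
```

For example, in a 2-way set, accessing A, B, A, C evicts B (A was refreshed by the second access), so a later access to B misses.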
What Happens on a Write?
• Write through: the information is written both to the block in the cache and to the block in the lower-level memory
• Write back: the information is written only to the block in the cache
  – The modified cache block is written to main memory only when it is replaced
  – Question: is the block clean or dirty?
• Pros and cons of each?
  – WT:
    » PRO: read misses cannot result in writes
    » CON: the processor is held up on writes unless writes are buffered
  – WB:
    » PRO: repeated writes are not sent to DRAM; the processor is not held up on writes
    » CON: more complex; a read miss may require writeback of dirty data
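The two policies can be contrasted with a sketch for a single cached block. The class names and the dictionary standing in for DRAM are illustrative assumptions:

```python
# Write-through vs. write-back for one cached block.
# 'memory' stands in for the slower lower level (DRAM).

class WriteThroughBlock:
    def __init__(self, memory, addr):
        self.memory, self.addr = memory, addr
        self.value = memory[addr]

    def write(self, value):
        self.value = value
        self.memory[self.addr] = value   # every write also goes to memory

class WriteBackBlock:
    def __init__(self, memory, addr):
        self.memory, self.addr = memory, addr
        self.value = memory[addr]
        self.dirty = False               # clean until first write

    def write(self, value):
        self.value = value
        self.dirty = True                # memory is updated only on eviction

    def evict(self):
        if self.dirty:                   # write back dirty data
            self.memory[self.addr] = self.value
            self.dirty = False
```

Note that repeated writes to the write-back block touch memory once (at eviction), which is exactly the "repeated writes not sent to DRAM" advantage above.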
Caching Applied to Address Translation
• The question is one of page locality: does it exist?
  – Instruction accesses spend a lot of time on the same page (since accesses are sequential)
  – Stack accesses have definite locality of reference
  – Data accesses have less page locality, but still some…
• Can we have a TLB hierarchy?
  – Sure: multiple levels at different sizes/speeds

[Diagram: the CPU sends a virtual address to the TLB; if cached, the physical address goes straight to physical memory; if not, the MMU translates it and the result is saved in the TLB. Data reads/writes proceed untranslated.]
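The check-then-fill flow in the diagram can be sketched in Python; the class name, the example page table, and the (result, hit) return convention are illustrative assumptions:

```python
# Sketch of the TLB flow: check the TLB first; on a miss,
# walk the page table via the MMU and cache the result.

PAGE_TABLE = {0b00000: 0b00010, 0b11111: 0b11101}  # example mappings

class TLB:
    def __init__(self):
        self.entries = {}   # vpn -> ppn

    def translate(self, vpn):
        if vpn in self.entries:          # "Cached? Yes" path
            return self.entries[vpn], True
        ppn = PAGE_TABLE[vpn]            # "No" path: MMU walks the page table
        self.entries[vpn] = ppn          # save the result for next time
        return ppn, False
```

The first access to a page misses and pays for the table walk; repeated accesses to the same page (the locality argued for above) hit in the TLB.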
Overlapping TLB & Cache Access (1/2)
• Main idea:
  – The offset in the virtual address exactly covers the "cache index" and "byte select" bits
  – Thus we can select the cached byte(s) in parallel with performing the address translation

[Diagram: virtual address = virtual page # | offset; physical address = tag / page # | index | byte]
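The bit-width arithmetic behind the overlap can be checked directly. The page, cache, and line sizes below are example assumptions chosen so the condition holds:

```python
# Why the overlap works: if (cache size / associativity) <= page size,
# the cache index + byte-select bits fall entirely inside the page offset,
# so they are unchanged by translation and can be used immediately.

PAGE_SIZE = 4096    # 12 offset bits
CACHE_SIZE = 4096   # direct-mapped
LINE_SIZE = 64      # 64 sets of 64-byte lines

index_bits = (CACHE_SIZE // LINE_SIZE - 1).bit_length()   # 6 bits of index
byte_bits = (LINE_SIZE - 1).bit_length()                  # 6 bits of byte select
offset_bits = (PAGE_SIZE - 1).bit_length()                # 12 bits of page offset

# 6 index bits + 6 byte-select bits = 12 bits, exactly the page offset:
assert index_bits + byte_bits == offset_bits == 12
```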
Putting Everything Together: Address Translation

[Diagram: virtual address = virtual P1 index | virtual P2 index | offset. The PageTablePtr locates the 1st-level page table; the P1 index selects a 2nd-level table; the P2 index yields the physical page #, which is combined with the offset to form the physical address into physical memory.]
Putting Everything Together: TLB

[Diagram: the same two-level walk as above, but the TLB caches (virtual page #, physical page #) pairs; on a hit, the physical address is formed without touching the page tables.]
Putting Everything Together: Cache

[Diagram: the physical address is split into tag | index | byte. The index selects a cache line; the stored tag is compared against the address tag; on a match, the byte offset selects the data within the block.]
Worksheet…