Date post: | 02-Jul-2015 |
Energy-Efficient Virtual Memory System Design for SSD
Chia-Lin Yang
Embedded Computing Lab
Department of Computer Science and Information Engineering
National Taiwan University
Embedded Computing Lab
Low-Power 3D graphics processing unit (GPU) design
Power gating and DVS strategies at the architectural level
PPT: Joint Power/Performance/Thermal management of DRAMs in multi-core systems
Orchestrating thread scheduling and page allocations
RARE: Resource-Aware Runtime Environment for thread scheduling in multi-core systems
Energy-efficient flash memory system
Embedded System Complexity Is Increasing
GPS
Personal information management
Video conference
Operating System for Supporting Multi-tasking & Virtual Memory System
[Figure: tasks run on an operating system that manages main memory, a page table, and secondary storage]
Disk as Storage
A traditional virtual memory system assumes a disk as the secondary storage
[Figure: main memory and a page table backed by a hard disk with a disk buffer]
Flash Memory as Storage
Flash memory has become a popular storage medium in mobile devices
• low power
• light weight
• shock resistant
Flash memory has very different characteristics from disk
• Write once
• Out-place update
• Garbage collection
Flash Memory as the Secondary Storage
[Figure: main memory and a page table backed by flash memory]
1. The need to revisit virtual memory system design
2. Energy efficiency is the main design concern
Energy-Aware Flash Memory Management in Virtual Memory System, IEEE Transactions on VLSI
An Energy-Efficient Virtual Memory System with Flash Memory as the Secondary Storage, ISLPED ’07
Outline
• Motivation
• Background on Flash Memory
• Interplay between VM and FM
• Proposed Energy-Efficient VM Design
  • Subpaging
  • HotCache
  • Duplication-Aware Garbage Collection
• Experimental Results
• Conclusions
Flash Storage System Architecture
[Figure: layered flash storage system. The FTL layer translates a logical block address (LBA) into a physical address through an address translation table and performs garbage collection. The MTD layer performs command translation. The physical device is flash memory organized into blocks (Block 0, Block 1, Block 2, Block 3, ...). Reads and writes operate on one page; an erase operates on one block.]
Address translation table:
LBA   Physical address (bank, block, page)
0     (0, 0, 3)
1     (0, 1, 2)
2     (1, 2, 1)
...   ...
Organization of a Typical NAND Flash Memory
Samsung K9F1208R0B
1 Block = 32 pages
1 Page = 512B
Flash Memory Characteristics
• Write once: a written page cannot be overwritten
• Out-place update: new data (A’) is written to a free page, and the page holding the old data (A) becomes a dead page
[Figure: a flash block holding live, dead, and free pages]
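The write-once and out-place update rules can be illustrated with a small sketch. Everything named here (the TinyFTL class, its single-bank layout, its page-level mapping, and its method names) is a hypothetical toy for illustration, not the FTL described in the presentation.

```python
from typing import Dict, List

FREE, LIVE, DEAD = "free", "live", "dead"


class TinyFTL:
    """Toy page-mapped FTL with a single bank (illustrative assumption)."""

    def __init__(self, num_blocks: int = 4, pages_per_block: int = 32) -> None:
        self.pages_per_block = pages_per_block
        # state[i] is the state of physical page i.
        self.state: List[str] = [FREE] * (num_blocks * pages_per_block)
        # Address translation table: LBA -> physical page index.
        self.att: Dict[int, int] = {}
        self.next_free = 0

    def write(self, lba: int) -> int:
        """Write `lba` out of place and return the physical page used."""
        if self.next_free >= len(self.state):
            raise RuntimeError("no free page left; garbage collection needed")
        old = self.att.get(lba)
        if old is not None:
            # Write once: the old copy cannot be overwritten, so it dies.
            self.state[old] = DEAD
        new = self.next_free
        self.state[new] = LIVE
        self.att[lba] = new
        self.next_free += 1
        return new
```

Writing the same LBA twice leaves one dead page behind and moves the mapping to the newly written page, which is why frequent updates create work for garbage collection.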
Flash Memory Characteristics (cont’d)
• When # of free pages <= GCt (garbage collection threshold), garbage collection is triggered to reclaim dead pages
• Dead pages are reclaimed via erase operations; the basic unit of an erase operation is a block
• Garbage collection overheads: live data copying and block erase
[Figure: a flash block holding live, dead, and free pages]
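A minimal sketch of the trigger and the two overheads, assuming flash is modeled as a list of blocks whose pages are the strings "free", "live", or "dead". The function names and this representation are assumptions made for illustration.

```python
from typing import List

FREE, LIVE, DEAD = "free", "live", "dead"


def free_pages(blocks: List[List[str]]) -> int:
    """Count free pages across all blocks."""
    return sum(page == FREE for block in blocks for page in block)


def needs_gc(blocks: List[List[str]], gct: int) -> bool:
    """Garbage collection is triggered when # of free pages <= GCt."""
    return free_pages(blocks) <= gct


def collect(blocks: List[List[str]], victim: int) -> int:
    """Copy the victim's live pages elsewhere, erase it, return pages copied."""
    live = blocks[victim].count(LIVE)
    targets = [
        (b, p)
        for b, block in enumerate(blocks)
        if b != victim
        for p, page in enumerate(block)
        if page == FREE
    ]
    if len(targets) < live:
        raise RuntimeError("not enough free pages to copy live data")
    for b, p in targets[:live]:
        blocks[b][p] = LIVE  # live data copying
    blocks[victim] = [FREE] * len(blocks[victim])  # block erase
    return live
```

The net number of free pages gained by a collection equals the number of dead pages in the victim block, while the number of copies equals its live pages.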
Writes are Problematic
• Writes consume more energy than reads
• Frequent writes result in dead pages on flash memory, which trigger frequent garbage collections
Operation       Latency   Energy
Read (page)     47.2 ns   679 nJ
Write (page)    533 us    7.66 mJ
Erase (block)   3 ms      43.21 mJ
Key Design Principles for Energy-Efficient Flash Memory
• Reduce writes to flash memory
• Efficient garbage collection
[Figure: Block X and Block Y, each holding live and invalidated pages]
Recycle Block X: 11 writes, gain 5 free pages
Recycle Block Y: 2 writes, gain 14 free pages
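The Block X versus Block Y comparison can be worked through in a few lines. This sketch assumes 16 pages per block (inferred from 11 + 5 = 2 + 14 = 16 in the figure), assumes each live-page copy costs one page read plus one page write, and uses the per-operation energy values exactly as printed in the "Writes are Problematic" table. The function names are hypothetical.

```python
from typing import Dict, Tuple

# Per-operation energy in joules, as printed on the earlier slide.
READ_J = 679e-9
WRITE_J = 7.66e-3
ERASE_J = 43.21e-3


def recycle_cost(live: int, pages_per_block: int = 16) -> Tuple[int, int, float]:
    """Return (writes, free pages gained, energy in joules) for one block."""
    writes = live                      # every live page must be copied
    gained = pages_per_block - live    # erased pages minus pages reused by copies
    energy = live * (READ_J + WRITE_J) + ERASE_J
    return writes, gained, energy


def pick_victim(live_counts: Dict[str, int]) -> str:
    """Greedy choice (an assumption): recycle the block with the fewest live pages."""
    return min(live_counts, key=live_counts.get)
```

With 11 live pages in Block X and 2 in Block Y, recycle_cost reproduces the slide's figures, and pick_victim selects Block Y because it costs fewer writes and frees more pages.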
Interplay between VM and FM
• A memory page contains n flash pages
• At a page fault, the n flash pages of the victim virtual page are written back to back to flash memory
Two important observations
• Unnecessary writes from replacing a virtual page
• Intra-page locality
[Figure: Swap_out() writes a memory page to flash memory as n flash-page-sized writes (1, 2, 3, ..., n)]
Unnecessary Writes from Replacing a Virtual Page
In a conventional virtual memory system, the full victim page is written to the secondary storage.
[Figure: a memory page made of four flash-page-sized pieces, one dirty and three clean; all four are written to flash memory (four writes)]
Dirty Ratio
A victim page often contains a significant amount of unmodified data.
Application       Dirty Ratio
kword             89.73%
mozilla           48.49%
kspread           88.61%
openoffice        66.59%
gqview            98.86%
kword+juk         69.41%
mozilla+juk       40.90%
kspread+juk       72.40%
openoffice+juk    59.31%
gqview+juk        97.62%
Dirty ratio = (the number of dirty 512B blocks in a dirty memory page) / (the number of 512B blocks in a main memory page)
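The dirty ratio definition can be computed directly from per-block dirty flags. This is a minimal sketch; the function name and the list-of-booleans representation are assumptions, and the example page is made up rather than taken from the measured workloads.

```python
from typing import List


def dirty_ratio(dirty_blocks: List[bool]) -> float:
    """Fraction of 512B blocks that are dirty within one dirty memory page."""
    if not dirty_blocks:
        raise ValueError("a memory page must contain at least one block")
    return sum(dirty_blocks) / len(dirty_blocks)


# A 4K memory page holds eight 512B blocks; here only one of them is dirty.
example_page = [False, True, False, False, False, False, False, False]
example_ratio = dirty_ratio(example_page)
```

A ratio below 100% means that writing the full victim page also writes blocks whose contents did not change.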
What is Intra-page Locality?
Flash pages in one main memory page are written to flash memory back to back
[Figure: main memory holds virtual pages A, B, C, and D, each made of eight flash pages (A0 to A7, B0 to B7, and so on); Swap_out(A) writes A0, A1, A2, A3, A4, A5, A6, A7 back to back into flash memory (Block X and Block Y shown)]
Why is Preserving Intra-Page Locality Important?
It affects the efficiency of garbage collection
[Figure: two layouts of Block X and Block Y holding the flash pages of A, B, C, and D, compared after pages A and B are swapped out]
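One way to see the effect on garbage collection is to count how many live pages each block would still hold once the old copies of A and B have died. The two layouts below are assumed for illustration (they are not read off the figure), with four flash pages per memory page to keep the example short.

```python
from typing import List, Set


def live_copies_needed(layout: List[List[str]], dead_owners: Set[str]) -> List[int]:
    """Per block, count pages whose owning memory page is still live."""
    return [sum(owner not in dead_owners for owner in block) for block in layout]


# Locality preserved: each memory page's flash pages sit next to each other.
clustered = [["A"] * 4 + ["B"] * 4, ["C"] * 4 + ["D"] * 4]
# Locality broken: flash pages of different memory pages are interleaved.
scattered = [["A", "C"] * 4, ["B", "D"] * 4]

dead = {"A", "B"}  # old copies invalidated after A and B are swapped out again
clustered_cost = live_copies_needed(clustered, dead)
scattered_cost = live_copies_needed(scattered, dead)
```

In the clustered layout one block contains only dead pages and can be erased with no copying, whereas in the scattered layout every block still holds live pages that must be copied first.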
Garbage Collection Threshold vs. Intra-Page Locality
[Figure: normalized energy consumption (y-axis, 0.95 to 1.55) versus garbage collection threshold (x-axis, 255 to 287)]
GCt: garbage collection threshold
m: # of flash pages in one memory page
n: # of flash pages in one flash block
• GCt mod m = 0
• GCt mod n ≥ n − m
• GCt mod m ≠ 0 and GCt mod n < n − m
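The three conditions in the legend can be evaluated for any threshold. In this sketch the labels are hypothetical names, and because the first two conditions can hold at the same time (for example GCt = 280 with m = 8 and n = 32), checking GCt mod m = 0 first is an assumption about precedence. The default values m = 8 and n = 32 follow from a 4K memory page, a 512B flash page, and a 16K flash block.

```python
def classify_gct(gct: int, m: int = 8, n: int = 32) -> str:
    """Label a garbage collection threshold by the slide's three conditions."""
    if m <= 0 or n <= 0:
        raise ValueError("m and n must be positive")
    if gct % m == 0:
        return "gct_mod_m_is_zero"
    if gct % n >= n - m:
        return "gct_mod_n_at_least_n_minus_m"
    return "neither"


# Classify every threshold shown on the x-axis of the chart.
labels = {gct: classify_gct(gct) for gct in range(255, 288)}
```

Grouping the thresholds this way makes it possible to compare the measured energy of each group against how GCt aligns with memory page and flash block boundaries.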
Proposed Energy Efficient VM Design
• Reduce # of writes to flash memory
  • Subpaging
  • HotCache
• Efficient garbage collection
  • Duplication-aware garbage collection
Subpaging
• Divide a virtual memory page into a set of subpages at the granularity of the flash page size
• Each subpage is associated with a dirty bit
[Figure: a memory page of four flash-page-sized subpages with dirty bits 0, 1, 0, 0; only the dirty subpage is written (one write to flash)]
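A minimal sketch of subpaging, assuming stores are reported to the page as a byte offset and length. The class name, method names, and default sizes are hypothetical.

```python
from typing import List


class SubpagedPage:
    """One virtual memory page split into flash-page-sized subpages."""

    def __init__(self, page_size: int = 4096, flash_page_size: int = 512) -> None:
        if page_size % flash_page_size != 0:
            raise ValueError("page size must be a multiple of the flash page size")
        self.page_size = page_size
        self.flash_page_size = flash_page_size
        # One dirty bit per subpage.
        self.dirty: List[bool] = [False] * (page_size // flash_page_size)

    def store(self, offset: int, length: int = 1) -> None:
        """Mark every subpage touched by a store as dirty."""
        if length <= 0 or offset < 0 or offset + length > self.page_size:
            raise ValueError("store falls outside the page")
        first = offset // self.flash_page_size
        last = (offset + length - 1) // self.flash_page_size
        for i in range(first, last + 1):
            self.dirty[i] = True

    def swap_out(self) -> List[int]:
        """Return the subpage indices to write to flash and clear the bits."""
        to_write = [i for i, d in enumerate(self.dirty) if d]
        self.dirty = [False] * len(self.dirty)
        return to_write
```

For the four-subpage page in the figure, a store that touches only the second subpage results in one flash write at swap-out instead of four.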
HotCache
Management policy
• Caching writes only
• Preserving intra-page locality
• Capturing hot data
[Figure: flash storage system extended with a HotCache next to flash memory; the FTL layer holds the address translation table, HotCache management, and garbage collection, above the MTD layer (command translation) and the physical device]
Address translation table:
LBA   Physical address (f/s, bank, block, page)
0     (f, 0, 0, 3)
1     (f, 0, 1, 2)
2     (s, -, 2, -)
...   ...
How to Capture Hot Data?
Three management policies
• Two-level LRU (2L)
• Time frequency (TF): replace the HotCache block with the smallest timestamp * write_counts
• Time frequency locality (TFL): TF policy with intra-page locality preserved
[Figure: two-level LRU, with a 1st level list and a 2nd level list, each running from head to tail]
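The TF replacement rule can be sketched as a single selection over the cached blocks. The dictionary layout and function name are assumptions, and the timestamp is assumed to be a counter that grows with recency (for example the time of the last write), which the slide does not spell out.

```python
from typing import Dict, Tuple


def tf_victim(cache: Dict[int, Tuple[int, int]]) -> int:
    """Pick the HotCache block with the smallest timestamp * write_counts.

    `cache` maps a block id to (timestamp, write_counts).
    """
    if not cache:
        raise ValueError("HotCache is empty")
    return min(cache, key=lambda block: cache[block][0] * cache[block][1])


# Block 2 is old and only moderately written, so it has the smallest product.
example_cache = {1: (100, 1), 2: (10, 5), 3: (90, 3)}
example_victim = tf_victim(example_cache)
```

Multiplying the two terms keeps a block cached if it is either recently written or frequently written, and evicts blocks that are neither.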
Duplication-Aware Garbage Collection
Exploit data redundancy between the main memory and flash memory to eliminate unnecessary live page copying during garbage collection
[Figure: main memory pages with their dirty bits, and the corresponding flash memory blocks, before and after duplication-aware garbage collection; legend: an invalidated page, a free page]
Duplication-Aware Garbage Collection (cont’d.)
[Figure: the SWAP system calls the FTL through Read(LBA), Write(LBA, PID, VPN), Swap_clean(LBA), and Swap_free(PID, VPN); garbage collection runs inside the FTL]
Block Allocation Map (BAM): for each physical address (bank, block, page), records State (Free, Valid, or Invalid), PID, VPN, and In_memory
Address Translation Table (ATT):
LBA   Physical address (bank, block, page)
0     (0, 0, 1)
1     (0, 1, 2)
2     (1, 2, 5)
...   ...
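A minimal sketch of how the In_memory flag in the BAM could be used when a block is recycled. The data class and function are hypothetical, and how a skipped page is later written back (for example by setting the dirty bit of the in-memory copy, as the changing dirty bits in the earlier figure suggest) is an assumption that is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BamEntry:
    """One Block Allocation Map row (fields follow the slide's column names)."""
    state: str            # "Free", "Valid", or "Invalid"
    pid: int = 0
    vpn: int = 0
    in_memory: bool = False


def plan_collection(block: List[BamEntry]) -> Tuple[List[int], List[int]]:
    """Return (page indices to copy, page indices whose copy can be skipped)."""
    to_copy: List[int] = []
    skipped: List[int] = []
    for index, entry in enumerate(block):
        if entry.state != "Valid":
            continue  # free and invalid pages never need copying
        if entry.in_memory:
            skipped.append(index)  # a duplicate already exists in main memory
        else:
            to_copy.append(index)
    return to_copy, skipped
```

Every skipped page saves one live-page copy during garbage collection, which is the overhead the duplication-aware scheme targets.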
Experimental Setup
Trace-driven simulation
• Valgrind: captures the memory access trace while the application is executing
Applications
• kword, kspread, mozilla, openoffice, gqview
• Multi-programming workloads: kword+juk, kspread+juk, mozilla+juk, openoffice+juk, gqview+juk
Configuration
• Main memory: 16MB, 4K page size
• Flash memory: 16K block with 512B page; 128K block with 2KB page
Application   Description
kword         word processor
kspread       spreadsheet application
mozilla       web browser
openoffice    popular office suite similar to Microsoft Office
gqview        image viewer
juk           MP3 jukebox program
Subpaging
• Smaller flash page size leads to more energy reduction
  • 512B page size: 20% energy reduction on average
  • 2KB page size: 8% energy reduction on average
• More energy is saved for multiprogramming workloads
  • Single-program workload: openoffice (14% energy reduction)
  • Multi-program workload: openoffice+juk (31% energy reduction)
[Figure: normalized energy consumption (0 to 1) with subpaging for kword, mozilla, kspread, openoffice, gqview, kword+juk, mozilla+juk, kspread+juk, openoffice+juk, and gqview+juk, for flash page sizes of 2KB and 512B]
HotCache: Hit Rates & Energy Savings
Average hit rate by replacement policy:
Cache size   FIFO    LRU     2L      TF      TFL
512KB        0.41%   0.49%   4.72%   5.02%   5%
1MB          3.34%   3.91%   8.62%   10%     9.98%
[Figure: normalized energy consumption (0 to 1) of the 2L, TF, and TFL policies for kword, mozilla, kspread, openoffice, gqview, kword+juk, mozilla+juk, kspread+juk, openoffice+juk, and gqview+juk, with legend entries for SRAM, read, write, and garbage collection energy]
HotCache: Energy Breakdown
TF causes higher overhead per GC because it breaks intra-page locality
Duplication-Aware Garbage Collection
• Up to 50% energy reduction
• Average energy reduction is 24%
[Figure: normalized energy consumption (0 to 1) with duplication-aware garbage collection for kword, mozilla, kspread, openoffice, gqview, kword+juk, mozilla+juk, kspread+juk, openoffice+juk, gqview+juk, and the average]
HotCache + Subpaging + DA-GC
Energy reduction of HotCache + Subpaging + DA-GC ranges from 9.3% to 75%
[Figure: normalized energy consumption (0 to 1) with HotCache + Subpaging + DA-GC for kword, mozilla, kspread, openoffice, gqview, kword+juk, mozilla+juk, kspread+juk, openoffice+juk, and gqview+juk]
1MB HotCache & 512B flash pages
Conclusion
We revisit virtual memory system design with flash memory as the secondary storage
Three energy-efficient VM design techniques
• Subpaging
• HotCache management
• Duplication-aware garbage collection
Joint use of Subpaging, the TFL policy, and DA-GC
• Reduces flash memory energy by up to 75%
[Figure: flash memory controller with a CPU core, SRAM, a host interface, and a flash interface, connected over a flash memory bus to the flash memory chips]
On-Going Works
• SSD in server platform
• High-throughput multi-bank flash system
• Data placement in SLC/MLC
• Reliability issue
[Figure: SSD with an ATA or SATA host interface, a micro controller, SRAM, one SLC chip, and multiple MLC chips]