Sougata Bhattacharjee
Caching for Flash-Based Databases
Summer Semester 2013
OUTLINE
• MOTIVATION
• FLASH MEMORY: flash characteristics, flash SSD architecture, flash translation layer
• PAGE REPLACEMENT ALGORITHMS: adaptive replacement policy
• FLASH-AWARE ALGORITHMS: clean-first LRU (CFLRU), clean-first dirty-clustered (CFDC), AD-LRU, CASA
• CONCLUSION
• REFERENCES
Data Explosion
The worldwide data volume is growing at an astonishing speed: in 2007 it was 281 EB; by 2011 it had reached 1,800 EB.
Motivation
Flash Memory
Page Replacement
Algorithm
Flash-Aware Algorithms
Conclusion
Data storage technology: HDDs and DRAM. HDDs suffer from HIGH LATENCY; DRAM comes at a HIGHER PRICE.
Energy consumption: in 2005, the total power used by servers in the USA was 0.6% of the country's total annual electricity consumption.
We need to find a memory technology which may overcome these
limitations.
http://faculty.cse.tamu.edu/ajiang/Server.pdf
BACKGROUND
In 1980, Dr. Fujio Masuoka invented Flash memory.
In 1988, Intel Corporation introduced Flash chips.
In 1995, M-Systems introduced flash-based solid-state drives.
What is flash? Flash memory is an electronic non-volatile semiconductor storage device that can be electrically erased and reprogrammed.
Three operations: Program (write), Erase, and Read.
Two major forms: NAND flash and NOR flash.
NAND is newer and much more popular.
FLASH AND MEMORY HIERARCHY
[Memory hierarchy diagram: Registers → Cache → RAM → HDD, with speed and cost increasing toward the top and size toward the bottom; flash sits between RAM and HDD.]
Flash is faster, has lower latency, and is more reliable than hard disks, but is more expensive.
NAND flash: READ ≈ 50 μs, WRITE ≈ 200 μs, ERASE very slow (about 2 ms).
Why is flash popular?
Benefits over magnetic hard drives
Offers lower access latencies.
Semi-conductor technology, no mechanical parts.
High data transfer rate.
Higher reliability (no moving parts).
Lower power consumption.
Small in size and light in weight. Longer life span.
Benefits over RAM
Lower power consumption.
Lower price.
Flash SSD is widening its range of applications
Embedded devices
Desktop PCs and Laptops
Servers and Supercomputers
USE OF FLASH
http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2011/20110811_S308_Cooke.pdf , Page 2
FLASH OPERATIONS
• Three operations: Read, Write, Erase.
• Reads and writes are done at the granularity of a page (2 KB or 4 KB).
• A flash block is much larger than a disk block: it contains p (typically 32–128) fixed-size flash pages of 512 B – 2 KB.
• Erasures are done at the granularity of a block; a block endures only 10,000–100,000 erasures.
• Block erase is the slowest operation, requiring about 2 ms.
• In-place update of flash pages is not possible; only overwrite of an entire block, which must be erased first.
[Diagram: blocks 1 … n, each consisting of data pages.]
FLASH OPERATIONS
• In-place update of flash pages is not possible; only overwrite of an entire block, which must be erased first.
• Modified DB pages are therefore written to free pages (in a new block), and the old page versions are invalidated.
• A full block whose pages are no longer valid can be erased as a whole and becomes free again.
[Diagram: steps of writing modified DB pages — updates go to new pages in a new block; the full old block is erased.]
FLASH CONSTRAINTS
• Write/Erase granularity asymmetry (Cons1)
• Erase-before-write rule (Cons2)
• Limited cell lifetime (Cons3)
Cons1 + Cons2 lead to out-of-place updates with page invalidation, a logical-to-physical mapping, and garbage collection.
Cons3 additionally requires wear leveling.
FLASH MEMORY STRUCTURE
[Diagram: File System → FTL (mapping, garbage collection, wear leveling, other) → Flash Device.]
Various operations need to be carried out to ensure correct operation of a flash device: mapping, garbage collection, and wear leveling.
The Flash Translation Layer (FTL) controls flash management:
• Hides the complexities of device management (garbage collection and wear leveling) from the application.
• Enables mobility – flash becomes plug and play.
MAPPING TECHNIQUES (1/2)
3 types of basic mapping: Page-Level Mapping, Block-Level Mapping, Hybrid Mapping.
Page-Level Mapping
• Each page is mapped independently.
• Mapping table (LPN → PPN): 0→8, 1→4, 2→3, 3→11, 4→9, 5→3, 6→7, 7→0, 8→1, 9→2, 10→6, 11→5; the lookup @7 returns PPN 0.
• Highest performance potential.
• Highest resource use: large mapping table.
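To make the lookup concrete, here is a minimal sketch of a page-level mapping table. The entry 7 → 0 reproduces the "@7" lookup above; the remaining entries are illustrative only.

```python
# Illustrative page-level mapping table (LPN -> PPN).
# Only the entry 7 -> 0 reproduces the "@7" lookup above; the rest is made up.
page_map = {0: 8, 1: 4, 2: 3, 3: 11, 4: 9, 6: 7, 7: 0}

def translate(lpn: int) -> int:
    # O(1) lookup, but the table needs one entry per logical page --
    # this is why page-level mapping has the highest resource use.
    return page_map[lpn]

print(translate(7))  # -> 0
```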
MAPPING TECHNIQUES (2/2)
3 types of basic mapping: Page-Level Mapping, Block-Level Mapping, Hybrid Mapping.
Block-Level Mapping
• Only block numbers are kept in the mapping table (LBN → PBN): 0→3, 1→0, 2→1, 3→2.
• Page offsets remain unchanged: for the lookup @7 with 4 pages per block, LBN = 7 / 4 = 1 and offset = 7 mod 4 = 3.
• Small mapping table.
• Bad performance for write updates.
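The address arithmetic above can be sketched directly, using the slide's table and a block size of 4 pages:

```python
PAGES_PER_BLOCK = 4
# Block-level mapping table from the example above (LBN -> PBN)
block_map = {0: 3, 1: 0, 2: 1, 3: 2}

def translate(lpn: int) -> int:
    # Only the block number is translated; the page offset is unchanged.
    lbn, offset = divmod(lpn, PAGES_PER_BLOCK)   # @7: 7 // 4 = 1, 7 mod 4 = 3
    return block_map[lbn] * PAGES_PER_BLOCK + offset

print(translate(7))  # LBN 1 -> PBN 0, offset 3 -> physical page 3
```

The table stays small (one entry per block instead of one per page), which is exactly the trade-off against update performance named above.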
FTL BLOCK-LEVEL MAPPING (BEST CASE)
Setup: k flash blocks B1 … Bk, g log blocks L1 … Lg, and free blocks F.
Switch: L1 becomes B1, and the old B1 is erased → 1 erase operation.
FTL BLOCK-LEVEL MAPPING (GENERAL CASE)
Setup: k flash blocks B1 … Bk, g log blocks L1 … Lg, and free blocks F.
Merge: B1 and L1 are merged into a free block Fi; afterwards B1 and L1 are erased → 2 erase operations.
In general, a merge of n flash blocks and one log block into Fi costs n + 1 erasures.
GARBAGE COLLECTION
Garbage collection moves the valid pages out of blocks containing invalid data and then erases those blocks: it removes invalid pages and increases the number of free pages.
[Diagram: full blocks with valid and invalid pages are erased and become FREE.]
Wear leveling decides where to write new data: it picks the most frequently erased blocks and the least worn-out blocks and swaps their content to equalize overall usage; this enhances the lifespan of the flash device.
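A minimal sketch of the garbage-collection step described above; the page representation as (state, data) pairs is illustrative:

```python
def garbage_collect(block, free_block):
    """Relocate the valid pages of `block` into `free_block`,
    then erase `block` (erase is a whole-block operation on flash)."""
    for state, data in block:
        if state == 'valid':
            free_block.append(('valid', data))   # copy live data out
    block.clear()                                 # ERASE: block becomes free
    return free_block

victim = [('valid', 'a'), ('invalid', 'b'), ('valid', 'c'), ('invalid', 'd')]
target = garbage_collect(victim, [])
print(target)   # -> [('valid', 'a'), ('valid', 'c')]
print(victim)   # -> []  (erased)
```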
BASICS OF PAGE REPLACEMENT
1. Find the location of the desired page on disk.
2. Find a free frame:
   - If a free frame exists, use it.
   - Otherwise, use a page replacement algorithm to select a victim page.
3. Load the desired page into the (freed) frame.
4. Update the page allocation table (page mapping in the buffer).
5. Upon the next page replacement, repeat the whole process in the same way.
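The steps above can be sketched as follows; `select_victim` is a placeholder for any replacement policy, and `read_from_disk` stands in for the actual I/O:

```python
def fetch_page(page_id, buffer, capacity, select_victim, read_from_disk):
    """Page-fault handling: hit check, free-frame check, victim selection,
    page load, and mapping update (the steps listed above)."""
    if page_id in buffer:
        return buffer[page_id]                    # buffer hit
    if len(buffer) >= capacity:                   # no free frame
        victim = select_victim(buffer)            # replacement policy decides
        del buffer[victim]                        # (a real DB flushes it if dirty)
    buffer[page_id] = read_from_disk(page_id)     # load page, update mapping
    return buffer[page_id]

buf = {}
oldest_first = lambda b: next(iter(b))            # toy FIFO policy for the demo
for p in [1, 2, 3]:
    fetch_page(p, buf, 2, oldest_first, lambda p: f"data{p}")
print(sorted(buf))   # -> [2, 3]  (page 1 was the victim)
```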
Cache is FAST but EXPENSIVE; HDDs are SLOW but CHEAP.
THE REPLACEMENT CACHE PROBLEM
How to manage the cache? Which page should be replaced? How can the hit ratio be maximized?
How does LRU work?
PAGE REPLACEMENT ALGORITHMS (1/2)
Least Recently Used (LRU)
- Removes the least recently used items first.
- Constant time and space complexity; simple to implement.
- Expensive to maintain statistically significant usage statistics.
- Does not exploit "frequency".
- Not scan-resistant.
[Example: buffer of 4 frames initially holding A, B, C, D; reference string C A B D E F D G E at times 1–9. The references to C, A, B, and D are hits; E, F, and G each cause a page fault, evicting the least recently used pages C, A, and B in turn.]
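The trace above can be reproduced with a minimal LRU buffer built on an ordered dictionary:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU buffer: evicts the least recently used page on a fault."""
    def __init__(self, frames):
        self.frames = frames
        self.pages = OrderedDict()   # oldest (LRU) first, newest (MRU) last

    def access(self, page):
        """Returns the evicted page on a fault, else None."""
        if page in self.pages:
            self.pages.move_to_end(page)                 # hit: page becomes MRU
            return None
        victim = None
        if len(self.pages) >= self.frames:
            victim, _ = self.pages.popitem(last=False)   # evict the LRU page
        self.pages[page] = None
        return victim

# The example trace: frames preloaded with A, B, C, D, then C A B D E F D G E
cache = LRUCache(4)
for p in "ABCD":
    cache.access(p)
evictions = [cache.access(p) for p in "CABDEFDGE"]
print([v for v in evictions if v])   # -> ['C', 'A', 'B']
```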
PAGE REPLACEMENT ALGORITHMS (2/2)
Least Frequently Used (LFU)
- Removes the least frequently used items first.
- Scan-resistant.
- Logarithmic time complexity (per request).
- Stale pages can remain in the buffer for a long time.
LRU + LFU = LRFU (Least Recently/Frequently Used)
- Exploits both recency and frequency.
- Better performance than LRU and LFU.
- Logarithmic time complexity; space and time overhead.
Adaptive Replacement Cache (ARC) is a solution.
ARC (ADAPTIVE REPLACEMENT CACHE) CONCEPT
General double-cache structure (cache size is 2C):
• The cache is partitioned into two lists, L1 and L2, each managed from MRU to LRU end.
• L1 contains recently seen pages: the recency list.
• L2 contains pages seen at least twice recently: the frequency list.
• If L1 contains exactly C pages, replace the LRU page of L1; otherwise, replace the LRU page of L2.
ARC CONCEPT
ARC structure (cache size is C):
• Divide L1 into T1 (MRU end) and B1 (LRU end); divide L2 into T2 (MRU end) and B2 (LRU end).
• The combined size of T1 and T2 is C; the combined size of T1 and B1 is C, and the same holds for T2 and B2 (total directory size 2C).
• Upon a page request: if the page is found in T1 or T2, move it to the MRU position of T2.
• On a cache miss, the new page is added at the MRU position of T1; if T1 is full, the LRU page of T1 is moved to the MRU position of B1.
ARC PAGE EVICTION RULE
ARC structure (cache size is C):
• ARC adapts the parameter P according to the observed workload; P determines the target size of T1.
• If the requested page is found in B1, P is increased and the page is moved to the MRU position of T2.
• If the requested page is found in B2, P is decreased and the page is moved to the MRU position of T2.
HOW DOES ARC WORK? (1/2)
[Example: reference string A B C A D E E F G D at times 1–10, shown as list states B1 | T1 | T2 | B2 (recency side and frequency side, each of size C). Pages enter T1 on their first reference and move to T2 when re-referenced: after time 10, the re-referenced pages A, E, and D sit on the frequency side.]
HOW DOES ARC WORK? (2/2)
[Example continued with reference string H I J G K H L D at times 11–18. A hit in the history list B1 increases the target size of T1 and shrinks B1 (self-tuning); page B eventually drops off the list entirely; one-time references pass through T1 without displacing T2 (scan-resistance); a hit in B2 increases the target size of T2 and shrinks B2 (self-tuning).]
ARC ADVANTAGE
ARC is scan-resistant.
ARC is self-tuning and empirically universal.
Stale pages do not remain in memory; better than LFU.
ARC consumes about 10–15% more time than LRU, but its hit ratio is almost twice that of LRU.
Low space overhead for the 'B' lists.
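The behavior described above can be condensed into a compact sketch of ARC (after Megiddo & Modha); T1/T2 hold cached pages, B1/B2 keep only page history, and p is the adaptive target size of T1. This is a simplified illustration, not a production implementation:

```python
from collections import OrderedDict

class ARC:
    def __init__(self, c):
        self.c, self.p = c, 0                            # cache size, target of T1
        self.t1, self.t2 = OrderedDict(), OrderedDict()  # cached pages
        self.b1, self.b2 = OrderedDict(), OrderedDict()  # history ("ghost") lists

    def _replace(self, page):
        # Evict from T1 if it exceeds its target p, otherwise from T2;
        # the victim's id is remembered in the matching history list.
        if self.t1 and (len(self.t1) > self.p or
                        (page in self.b2 and len(self.t1) == self.p)):
            victim, _ = self.t1.popitem(last=False)
            self.b1[victim] = None
        else:
            victim, _ = self.t2.popitem(last=False)
            self.b2[victim] = None

    def request(self, page):
        """Returns True on a cache hit."""
        if page in self.t1 or page in self.t2:           # hit: move to MRU of T2
            self.t1.pop(page, None); self.t2.pop(page, None)
            self.t2[page] = None
            return True
        if page in self.b1:                              # history hit: favor recency
            self.p = min(self.c, self.p + max(len(self.b2) // len(self.b1), 1))
            self._replace(page); del self.b1[page]
            self.t2[page] = None
            return False
        if page in self.b2:                              # history hit: favor frequency
            self.p = max(0, self.p - max(len(self.b1) // len(self.b2), 1))
            self._replace(page); del self.b2[page]
            self.t2[page] = None
            return False
        # complete miss: make room, then insert at MRU of T1
        if len(self.t1) + len(self.b1) == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False); self._replace(page)
            else:
                self.t1.popitem(last=False)
        else:
            total = len(self.t1) + len(self.b1) + len(self.t2) + len(self.b2)
            if total >= self.c:
                if total == 2 * self.c:
                    self.b2.popitem(last=False)
                self._replace(page)
        self.t1[page] = None
        return False

cache = ARC(2)
print([cache.request(p) for p in [1, 2, 1, 3, 1]])
# -> [False, False, True, False, True]: both re-references of page 1 hit,
#    because page 1 was promoted to the frequency side T2.
```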
FLASH-AWARE BUFFER TECHNIQUES
Minimize the number of physical write operations: the cost of a page write is much higher than that of a page read.
The buffer manager decides HOW and WHEN to write.
• CFLRU (Clean-First LRU)
• LRUWSR (LRU Write Sequence Reordering)
• CCFLRU (Cold-Clean-First LRU)
• AD-LRU (Adaptive Double LRU)
Techniques that read/write entire flash blocks (addressing the FRW problem):
• FAB (Flash-Aware Buffer)
• REF (Recently-Evicted-First)
CLEAN-FIRST LRU ALGORITHM (1/3)
One of the earliest proposals of flash-aware buffer techniques; CFLRU is based on the LRU replacement policy.
The LRU list is divided into two regions:
• Working region: recently accessed pages (MRU end).
• Clean-first region: pages for eviction (LRU end), with window size W.
[Example: pages P1 (MRU) … P8 (LRU); the clean-first region is the window W = 4 covering P5–P8; pages are marked clean or dirty.]
CFLRU always selects clean pages to evict from the clean-first region first, to save flash write costs. If there is no clean page in this region, a dirty page at the end of the LRU list is evicted.
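The eviction rule can be sketched in a few lines; the clean/dirty assignment below is assumed so that P7 is the first victim, matching the eviction example:

```python
def cflru_victim(lru_list, dirty, w):
    """CFLRU victim selection. lru_list: page ids ordered MRU -> LRU;
    dirty: set of dirty page ids; w: window size of the clean-first region."""
    clean_first = lru_list[-w:]              # the w pages at the LRU end
    for page in reversed(clean_first):       # search from the LRU end
        if page not in dirty:
            return page                      # evict a clean page: no flash write
    return lru_list[-1]                      # none clean: evict the LRU dirty page

pages = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]   # P1 = MRU
print(cflru_victim(pages, dirty={"P6", "P8"}, w=4))        # -> P7
```

Note the cost of this search: on every buffer fault, up to w pages must be scanned, which is one of CFLRU's drawbacks discussed below.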
CLEAN-FIRST LRU ALGORITHM (2/3)
P1 P2 P3 P4 P5 P6 P7 P8Working Region
Clean-First Region LR
U
MRU
CFLRU always selects clean pages to evict from the clean- first region first to save flash write costs. If there is no clean page in this region, a dirty page at the end of the LRU list is evicted.
CleanDirtyEvicted Pages :
P7
P5
P8
P6
CLEAN-FIRST LRU ALGORITHM (3/3)
Disadvantages:
• CFLRU has to search a long list in case of a buffer fault.
• Keeping dirty pages in the clean-first region can strain memory resources.
• The window size W of the clean-first region must be determined.
CFDC (Clean-First, Dirty-Clustered) addresses these issues.
CFDC (CLEAN-FIRST, DIRTY-CLUSTERED) ALGORITHM
CFDC implements a two-region scheme; the buffer is divided into two regions:
1. Working region: keeps hot pages.
2. Priority region: assigns priorities to pages.
The clean-first region of CFLRU is divided into two queues — a clean queue and a dirty queue — separating clean and dirty pages. Dirty pages are grouped into clusters according to spatial locality, and the clusters are ordered by priority.
Clean pages are always chosen first as victim pages. Otherwise, a dirty page is evicted from the LRU end of the cluster with the lowest priority.
[Example: clean queue holding pages 54, 1, 45, 33, 44; dirty queue holding the clusters formed from pages 39, 69, 48, 7, 11, 20, 6, 4, 13, 8, 15, 27, 28, 29; the victim is taken from the clean queue first.]
CFDC ALGORITHM – PRIORITY FUNCTION
For a cluster c with n pages, its priority P(c) is computed according to Formula 1:

P(c) = IPD(c) / (n² · (globaltime − timestamp(c)))

where IPD(c) (inter-page distance) = Σ |p(i+1) − p(i)| for i = 0 … n−2, and p0, …, pn−1 are the page numbers ordered by their time of entering the cluster; IPD is defined as 1 for a single-page cluster.

Example (globaltime = 10):
• Cluster {8, 13, 20}, timestamp 4 → P = 12 / (9 · 6) = 2/9
• Cluster {15}, timestamp 2 → P = 1 / (1 · 8) = 1/8
• Cluster {4, 6}, timestamp 3 → P = 2 / (4 · 7) = 1/14
• Cluster {29, 28, 27}, timestamp 6 → P = 2 / (9 · 4) = 1/18 → lowest priority: victim cluster

Large, sequentially clustered sets of dirty pages thus get low priority and are evicted first — flushing them is cheap on flash.
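The priority computation can be checked directly with exact fractions; the clustering of the example's pages into {8, 13, 20}, {15}, {4, 6}, and {29, 28, 27} is an assumption that reproduces the slide's priorities 2/9, 1/8, 1/14, and 1/18:

```python
from fractions import Fraction

def cluster_priority(pages, timestamp, globaltime):
    """CFDC cluster priority (Formula 1): the lowest-priority cluster
    is the victim. `pages` are ordered by their time of entering the cluster."""
    n = len(pages)
    # inter-page distance; a single-page cluster gets IPD 1
    ipd = sum(abs(pages[i + 1] - pages[i]) for i in range(n - 1)) or 1
    return Fraction(ipd, n * n * (globaltime - timestamp))

# Clusters from the example (globaltime = 10):
print(cluster_priority([8, 13, 20], 4, 10))   # -> 2/9
print(cluster_priority([15], 2, 10))          # -> 1/8
print(cluster_priority([4, 6], 3, 10))        # -> 1/14
print(cluster_priority([29, 28, 27], 6, 10))  # -> 1/18 : victim cluster
```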
CFDC ALGORITHM – EXPERIMENTS
• Cost of page flushes: clustered writes are efficient (CFDC improves over CFLRU by 41%; CFLRU over LRU by 6%).
• Number of page flushes: CFDC's write count is close to CFLRU's.
• Influence of increasing update ratios: CFDC is on par with LRU for update-intensive workloads.
CFDC ALGORITHM – CONCLUSION
• Reduces the number of physical writes.
• Improves the efficiency of page flushing.
• Keeps a high hit ratio.
The size of the priority window remains a concern for CFDC.
CASA: dynamically adjusts the buffer list sizes.
AD-LRU (ADAPTIVE DOUBLE LRU) ALGORITHM
AD-LRU integrates the properties of recency, frequency, and cleanness into the buffer replacement policy.
The buffer is split into two LRU queues:
• Cold LRU: keeps pages referenced once.
• Hot LRU: keeps pages referenced at least twice (frequency).
An FC (first-clean) pointer in each queue indicates the victim page; min_lc bounds the minimum size of the cold queue.
• If a page miss occurs, the size of the cold queue is increased.
• If the buffer is full, cold clean pages are evicted from the cold LRU queue.
• If no cold clean page is found, cold dirty pages are evicted using a second-chance algorithm.
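The cold-queue victim search can be sketched as follows; the queue contents and flag sets in the demo are chosen to match the eviction example (victim 6 if clean pages exist, otherwise 4):

```python
def adlru_victim(cold_queue, dirty, referenced):
    """AD-LRU victim search over the cold queue (ordered MRU -> LRU).
    dirty: set of dirty pages; referenced: pages whose second-chance
    reference bit is set."""
    for page in reversed(cold_queue):       # 1) least-recently-used clean page
        if page not in dirty:
            return page
    while True:                             # 2) second chance over dirty pages
        page = cold_queue[-1]
        if page in referenced:
            referenced.discard(page)              # clear the bit...
            cold_queue.insert(0, cold_queue.pop())  # ...and move to the MRU end
        else:
            return page

cold = [8, 9, 5, 6, 4]                                        # MRU -> LRU
print(adlru_victim(cold, dirty={4, 8, 9}, referenced=set()))  # -> 6 (clean cold)
print(adlru_victim(cold, dirty={4, 5, 6, 8, 9}, referenced=set()))  # -> 4 (dirty cold)
```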
AD-LRU ALGORITHM EVICTION POLICY
Example: buffer size 9 pages.
Hot queue: 3 (dirty), 7 (dirty), 2 (clean), 1 (dirty).
Cold queue: 4 (dirty), 6 (clean), 5 (clean), 9 (dirty), 8 (dirty).
When the new page 10 (dirty, cold) arrives, AD-LRU evicts the clean cold page 6 as victim. If no clean cold page were found, a dirty cold page (here 4) would be chosen as victim using the second-chance algorithm.
AD-LRU ALGORITHM EXPERIMENTS
Write count vs. buffer size for various workload patterns (random, read-most, write-most, Zipf): AD-LRU has the lowest write count.
AD-LRU ALGORITHM - CONCLUSION
AD-LRU considers reference frequency, an important property of reference patterns that is more or less ignored by CFLRU, and it frees the buffer from cold pages as soon as appropriate.
AD-LRU is self-tuning.
AD-LRU is scan-resistant.
CASA (COST-AWARE SELF-ADAPTIVE) ALGORITHM
CASA makes a trade-off between physical reads and physical writes and adapts automatically to varying workloads.
• The buffer pool is divided into two dynamic lists: a clean list Lc and a dirty list Ld, with b = |Lc| + |Ld|.
• Both lists are ordered by reference recency.
• CASA continuously adjusts a parameter τ, 0 ≤ τ ≤ b: τ is the dynamic target size of Lc, so the target size of Ld is b − τ.
• In case of a buffer fault, τ decides from which list the victim page will be chosen.
CASA ALGORITHM
CASA considers both read and write costs, as well as the status (read/write) of each requested page:
• Case 1: a logical read request hits Lc → τ is increased.
• Case 2: a logical write request hits Ld → τ is decreased.
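The two adaptation cases and the victim decision can be sketched as follows; the step size is simplified to 1 here, whereas the actual cost-aware policy derives the step from the device's read/write cost ratio:

```python
def adjust_tau(tau, b, hit_list, op):
    """Adapt the target size tau of the clean list Lc (0 <= tau <= b).
    Case 1: logical read hit in Lc  -> tau increased.
    Case 2: logical write hit in Ld -> tau decreased.
    Step size simplified to 1 for this sketch."""
    if hit_list == 'Lc' and op == 'read':
        return min(b, tau + 1)
    if hit_list == 'Ld' and op == 'write':
        return max(0, tau - 1)
    return tau

def victim_list(len_lc, tau):
    # On a buffer fault, tau decides which list yields the victim:
    # shrink Lc if it exceeds its target size, else shrink Ld.
    return 'Lc' if len_lc > tau else 'Ld'

b, tau = 13, 6
tau = adjust_tau(tau, b, 'Lc', 'read')    # Case 1, as in the example below
print(tau)                                 # -> 7
tau = adjust_tau(tau, b, 'Ld', 'write')   # Case 2
print(tau)                                 # -> 6
```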
CASA ALGORITHM – EXAMPLE (1/2)
Total buffer size b = 13, τ = 6: target size of Lc = 6, Ld = 7.
Lc: 24, 13, 19, 16, 21, 33 — Ld: 11, 22, 34, 4, 5, 7, 8.
Incoming page 14 (read) hits Lc (Case 1), so τ is increased to 7: target size of Lc = 7, Ld = 6; page 8 is evicted from the LRU end of Ld.
Lc: 24, 13, 19, 16, 21, 33, 14 — Ld: 11, 22, 34, 4, 5, 7.
CASA ALGORITHM – EXAMPLE (2/2)
Incoming page 15 (write) hits Ld (Case 2), so τ is decreased from 7 to 6: target size of Lc = 6, Ld = 7; page 24 is evicted from the LRU end of Lc.
Before — Lc: 24, 13, 19, 16, 21, 33, 14; Ld: 11, 22, 34, 4, 5, 7.
After — Lc: 13, 19, 16, 21, 33, 14; Ld: 15, 11, 22, 34, 4, 5, 7.
CASA ALGORITHM - CONCLUSION
CASA is implemented for two-tier storage systems based on homogeneous storage devices with asymmetric R/W costs. CASA can detect cost ratio dynamically.
CASA is self-tuning. It adapts itself to varying cost ratios and workloads
CONCLUSION
Flash memory is a widely used, reliable, and flexible non-volatile memory to store software code and data in a microcontroller.
However, the performance behavior of flash devices remains unpredictable due to the complexity of FTL implementations and their proprietary nature; to gain more efficient performance, we need to implement a flash device simulator.
We addressed issues of buffer management for two-tier storage systems (caching for a flash DB); ARC and CASA are two of the better approaches.
Phase-change memory (PCM) is a promising next-generation memory technology, which can be used for database storage systems.
REFERENCES
1. Yi Ou: Caching for flash-based databases and flash-based caching for databases, Ph.D. Thesis, University of Kaiserslautern, Verlag Dr. Hut, Online August 2012
2. Nimrod Megiddo, Dharmendra S. Modha: ARC: A Self-Tuning, Low Overhead Replacement Cache. FAST 2003: (115-130)
3. Nimrod Megiddo, Dharmendra S. Modha: Outperforming LRU with an Adaptive Replacement Cache Algorithm. IEEE Computer 37(4): 58-65 (2004)
4. Yi Ou, Theo Härder: Clean first or dirty first?: a cost-aware self-adaptive buffer replacement policy. IDEAS 2010: 7-14
5. Seon-Yeong Park, Dawoon Jung, Jeong-Uk Kang, Jinsoo Kim, Joonwon Lee: CFLRU: a replacement algorithm for flash memory. CASES 2006: 234-241
6. Yi Ou, Theo Härder, Peiquan Jin: CFDC: a flash-aware replacement policy for database buffer management. DaMoN 2009: 15-20
7. Peiquan Jin, Yi Ou, Theo Härder, Zhi Li: AD-LRU: An efficient buffer replacement algorithm for flash-based databases. Data Knowl. Eng. 72: 83-102 (2012)
8. Suman Nath, Aman Kansal: FlashDB: dynamic self-tuning database for NAND flash. IPSN 2007: 410-419
9. Kyoungmoon Sun, Seungjae Baek, Jongmoo Choi, Donghee Lee, Sam H. Noh, Sang Lyul Min: LTFTL: lightweight time-shift flash translation layer for flash memory based embedded storage. EMSOFT 2008: 51-58
10. Nimrod Megiddo, Dharmendra S. Modha: System and method for implementing an adaptive replacement cache policy, US 6996676 B2, 2006
11. Wikipedia: Flash memory
12. Wikipedia: Page replacement algorithm
13. N. Megiddo, D. S. Modha: Adaptive Replacement Cache, IBM Almaden Research Center, April 2003
14. Yang Hu, Hong Jiang, Dan Feng, Lei Tian, Shu Ping Zhang, Jingning Liu, Wei Tong, Yi Qin, Liuzheng Wang: Achieving page-mapping FTL performance at block-mapping FTL cost by hiding address translation. MSST 2010: 1-12
THANK YOU