SYSTOR 2010, Haifa, Israel
Optimization of LFS with Slack Space Recycling and Lazy Indirect Block Update
Yongseok Oh <[email protected]>
The 3rd Annual Haifa Experimental Systems Conference
May 24, 2010
Haifa, Israel
Yongseok Oh and Donghee Lee (University of Seoul), Jongmoo Choi (Dankook University)
Eunsam Kim and Sam H. Noh (Hongik University)
Introduction
Slack Space Recycling and Lazy Indirect Block Update
Implementation of SSR-LFS
Performance Evaluation
Conclusions
Outline
LFS collects small write requests and writes them sequentially to the storage device [Rosenblum et al., ACM TOCS '91]
Advantages
■ Superior write performance for random workloads
■ Fast recovery
Drawbacks
■ On-demand cleaning
■ Cascading meta-data update
Log-structured File System
[Figure: small writes A–D are collected in an in-memory segment buffer and written to storage as one sequential segment]
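A minimal C sketch of the idea above (the block and segment sizes, names, and on-disk layout are illustrative assumptions, not SSR-LFS code): small block writes are staged in an in-memory segment buffer and flushed with one large sequential write.

#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096
#define SEG_BLOCKS 512                          /* assumed 2MB segment */

struct segment_buffer {
    char  data[SEG_BLOCKS * BLOCK_SIZE];        /* staged blocks */
    int   used;                                 /* blocks filled so far */
    off_t next_seg_off;                         /* where the next segment goes */
};

/* Stage one 4KB block; flush the whole segment sequentially when it fills. */
static int lfs_write_block(int dev_fd, struct segment_buffer *sb, const void *blk)
{
    memcpy(sb->data + (size_t)sb->used * BLOCK_SIZE, blk, BLOCK_SIZE);
    if (++sb->used < SEG_BLOCKS)
        return 0;                               /* still buffering in memory */

    /* One large sequential write instead of many small random writes. */
    if (pwrite(dev_fd, sb->data, sizeof(sb->data), sb->next_seg_off) < 0)
        return -1;
    sb->next_seg_off += (off_t)sizeof(sb->data);
    sb->used = 0;
    return 0;
}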
Hard Disk Drives
■ Mechanical components – disk head, spindle motor, and platter
■ Poor random read/write performance
Solid State Drives
■ Consist of NAND flash memory, no mechanical parts
■ High performance, low power consumption, and shock resistance
■ Sequential writes are faster than random writes
Storage Devices
Introduction
Slack Space Recycling and Lazy Indirect Block Update
Implementation of SSR-LFS
Performance Evaluation
Conclusions
Outline
To make free segments
■ The LFS cleaner copies valid blocks to another free segment (sketched in C below)
On-demand cleaning
■ Overall performance decreases
Background cleaning
■ It does not affect performance
LFS cleaning
[Figure: the cleaner copies valid blocks A–D from Segments 1 and 2 into free Segment 3]
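A hedged sketch of the copy step above, reusing the definitions from the segment-buffer sketch earlier (the liveness callback is an assumption): the cleaner reads the victim segment, keeps only the blocks that are still valid, and appends them to the log so the victim segment becomes free.

/* Illustrative cleaner pass: copy the valid blocks of a victim segment into
 * the current segment buffer so the victim can be reused as a free segment. */
static void clean_segment(int dev_fd, struct segment_buffer *sb, off_t victim_off,
                          int (*block_is_valid)(off_t blk_off))
{
    char blk[BLOCK_SIZE];

    for (int i = 0; i < SEG_BLOCKS; i++) {
        off_t off = victim_off + (off_t)i * BLOCK_SIZE;
        if (!block_is_valid(off))
            continue;                          /* invalid block: nothing to copy */
        pread(dev_fd, blk, BLOCK_SIZE, off);   /* read the valid block */
        lfs_write_block(dev_fd, sb, blk);      /* rewrite it sequentially */
    }
    /* A real cleaner would now mark the victim segment as free. */
}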
Matthews et al. employed hole-plugging in LFS [Matthews et al., ACM OSR '97]
The cleaner copies valid blocks into the holes of other segments
Hole-plugging
[Figure: valid blocks from used segments are copied into the holes (invalid slots) of other used segments]
We propose the SSR scheme, which directly recycles slack space to avoid on-demand cleaning (sketched in C below)
Slack space is the invalid area in a used segment
Slack Space Recycling (SSR) Scheme
[Figure: new blocks E–H from the segment buffer are written directly into the slack space of used Segments 1 and 2 instead of into a free segment]
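A minimal sketch of the recycling step above, reusing the earlier definitions (the function and callback names are assumptions; the real SSR-LFS policy also decides when recycling is worthwhile): new blocks are written directly into the invalid slots of a used segment, so no cleaning is needed before the write.

/* Illustrative SSR step: place new blocks straight into the slack space
 * (invalid slots) of a used segment instead of cleaning it first. */
static int recycle_slack(int dev_fd, off_t seg_off,
                         const char *new_blocks, int nblocks,
                         int (*slot_is_invalid)(off_t blk_off))
{
    int written = 0;

    for (int i = 0; i < SEG_BLOCKS && written < nblocks; i++) {
        off_t off = seg_off + (off_t)i * BLOCK_SIZE;
        if (!slot_is_invalid(off))
            continue;                           /* slot still holds valid data */
        pwrite(dev_fd, new_blocks + (size_t)written * BLOCK_SIZE,
               BLOCK_SIZE, off);                /* fill the hole with new data */
        written++;
    }
    return written;                             /* number of blocks recycled */
}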
Updating a data block incurs cascading meta-data updates: modifying a single data block consumes 4 blocks (see the sketch below the figure)
Cascading meta-data update
[Figure: updating data block A to A' forces new versions of its indirect block (B'), double indirect block (C'), and i-node block (D'), all of which are written to Segment 2]
We propose the LIBU scheme to reduce cascading meta-data updates
■ LIBU uses an IBC (Indirect Block Cache) to absorb frequent updates of indirect blocks (sketched below)
■ An indirect map is necessary to terminate the cascading meta-data update
Lazy Indirect Block Update (LIBU) scheme
[Figure: with LIBU, updating data block A to A' writes only the new data block (A') and the updated i-node (D') to Segment 2; the affected indirect and double indirect blocks (B', C') are inserted into the IBC and tracked through the indirect map instead of being written immediately]
For crash recovery, LFS periodically stores checkpoint information. If a power failure occurs:
■ search for the last checkpoint
■ scan all segments written after the last checkpoint
■ rebuild the i-node map, segment usage table, indirect map, and the indirect blocks in the IBC (see the sketch below)
Crash Recovery
[Figure: after a power failure, recovery locates the last checkpoint on disk, scans the segments written after it, and rebuilds the i-node map, segment usage table, indirect map, and IBC contents in RAM to reach a consistent state]
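The steps above map to a simple roll-forward loop; a sketch under assumed helper names (read_last_checkpoint, next_segment, segment_is_newer, and replay_segment_summary are hypothetical, not SSR-LFS functions):

#include <sys/types.h>

struct checkpoint { off_t last_seg_off; /* plus on-disk locations of the maps */ };

/* Hypothetical helpers provided by the rest of the file system. */
extern void  read_last_checkpoint(int dev_fd, struct checkpoint *cp);
extern off_t next_segment(off_t seg_off);
extern int   segment_is_newer(int dev_fd, off_t seg_off, const struct checkpoint *cp);
extern void  replay_segment_summary(int dev_fd, off_t seg_off);

/* Roll-forward recovery after a power failure (illustrative only). */
static void lfs_recover(int dev_fd)
{
    struct checkpoint cp;

    read_last_checkpoint(dev_fd, &cp);          /* 1. find the last checkpoint */

    /* 2. scan every segment written after the checkpoint and
     * 3. rebuild the i-node map, segment usage table, indirect map,
     *    and the indirect blocks that were cached in the IBC. */
    for (off_t seg = next_segment(cp.last_seg_off);
         segment_is_newer(dev_fd, seg, &cp);
         seg = next_segment(seg))
        replay_segment_summary(dev_fd, seg);
}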
Introduction
Slack Space Recycling and Lazy Indirect Block Update
Implementation of SSR-LFS
Performance Evaluation
Conclusions
Outline
We implemented SSR-LFS (Slack Space Recycling LFS)
■ Using the FUSE (Filesystem in Userspace) framework in Linux
SSR-LFS selectively chooses either SSR or cleaning (see the sketch after the figure)
■ When the system is idle, it performs background cleaning
■ When the system is busy, it performs SSR or on-demand cleaning
If the average slack size is too small, it selects on-demand cleaning; otherwise, it selects SSR
Implementation of SSR-LFS
[Figure: Architecture of SSR-LFS — a write("/mnt/file") passes from the VFS through the FUSE kernel module and libfuse to the user-space SSR-LFS core, which contains the syncer, cleaner, recycler, IBC, i-node cache, and buffer cache]
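The selection logic above can be condensed into a short sketch (the threshold constant and the function are assumptions; the actual heuristic in SSR-LFS may differ):

enum fs_action { DO_BACKGROUND_CLEAN, DO_SSR, DO_ONDEMAND_CLEAN };

/* Assumed threshold for "average slack size is too small"; not from the paper. */
#define MIN_AVG_SLACK_BLOCKS 64

/* Decide what to do when free segments run low. */
static enum fs_action choose_action(int system_idle, double avg_slack_blocks)
{
    if (system_idle)
        return DO_BACKGROUND_CLEAN;      /* idle: clean in the background */
    if (avg_slack_blocks < MIN_AVG_SLACK_BLOCKS)
        return DO_ONDEMAND_CLEAN;        /* too little slack to recycle profitably */
    return DO_SSR;                       /* busy with enough slack: recycle it */
}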
Introduction
Slack Space Recycling and Lazy Indirect Block Update
Implementation of SSR-LFS
Performance Evaluation
Conclusions
Outline
For comparison, we used several file systems
■ Ext3-FUSE
■ Org-LFS (cleaning)
■ Org-LFS (plugging)
■ SSR-LFS
Benchmarks
■ IO TEST, which generates a random write workload
■ Postmark, which simulates the workload of a mail server
Storage Devices
■ SSD – INTEL SSDSA2SH032G1GN
■ HDD – SEAGATE ST3160815AS
Experimental Environment
SSR-LFS performs better than the others for a wide range of utilization
On HDD
■ SSR-LFS and Org-LFS outperform Ext3-FUSE below 85% utilization
On SSD
■ Ext3-FUSE outperforms Org-LFS due to the SSD's optimization for random writes
Random Update Performance
[Charts: execution time (sec) vs. utilization (10–90%) for Ext3-FUSE, Org-LFS(Plugging), Org-LFS(Cleaning), and SSR-LFS on HDD and on SSD]
Medium file size (16KB ~ 256KB)
■ 1,000 subdirectories, 100,000 files, 100,000 transactions
SSR-LFS outperforms the other file systems on both devices
Org-LFS shows better performance than Ext3-FUSE on HDD
Ext3-FUSE shows comparable performance to Org-LFS on SSD
Postmark Benchmark Result (1)
[Charts: Postmark execution time (sec) of Org-LFS, SSR-LFS, and Ext3-FUSE on SSD and on HDD]
Small file size (4KB ~ 16KB)
■ 1,000 subdirectories, 500,000 files, 200,000 transactions
Ext3-FUSE performs better than the other file systems on SSD
■ due to meta-data optimizations of Ext3 such as hash-based directories
Postmark Benchmark Result (2)
[Charts: Postmark execution time (sec) of Org-LFS, SSR-LFS, and Ext3-FUSE on SSD and on HDD]
Introduction
Slack Space Recycling and Lazy Indirect Block Update
Implementation of SSR-LFS
Performance Evaluation
Conclusions
Outline
SSR-LFS outperforms original-style LFS over a wide range of utilization
Future work
■ Optimization of meta-data structures
■ Cost-based selection between cleaning and SSR
We plan to release the source code of SSR-LFS this year
■ http://embedded.uos.ac.kr
Conclusions
Thank you
Q & A
Backup slides
Storage devices
[Charts: sequential vs. random throughput (MB/s) as a function of request size (4KB–16MB) for the Intel SSD (write, read) and the Seagate HDD (write, read)]
To identify the performance penalty of a user-space implementation
Ext3-FUSE underperforms kernel Ext3 for almost all access patterns
■ Due to FUSE overhead
Measurement of FUSE overhead
[Charts: throughput (MB/s) of Ext3 and Ext3-FUSE for seq-write, rand-write, seq-read, and rand-read on HDD and on SSD]
Medium file size (16KB ~ 256KB)
■ 1,000 subdirectories, 100,000 files, 100,000 transactions
SSR-LFS outperforms the other file systems on both devices
Org-LFS shows better performance than Ext3-FUSE on HDD
Ext3-FUSE shows comparable performance to Org-LFS on SSD
Postmark Benchmark Result (1)
[Charts: Postmark execution time (sec) of Org-LFS, SSR-LFS, and Ext3-FUSE on SSD and on HDD]
Chart annotation: 1302 cleanings, 4142 SSRs
Small file size (4KB ~ 16KB)
■ 1,000 subdirectories, 500,000 files, 200,000 transactions
Ext3-FUSE performs better than the other file systems on SSD
■ due to meta-data optimizations of Ext3 such as hash-based directories
Postmark Benchmark Result (2)
[Charts: Postmark execution time (sec) of Org-LFS, SSR-LFS, and Ext3-FUSE on SSD and on HDD]
Chart annotations: 3018 cleanings; 1692 cleanings, 1451 SSRs
Large file size (not included in the paper)
■ File size 256KB ~ 1MB
■ 1,000 subdirectories
■ 10,000 files
■ 10,000 transactions
Postmark Benchmark Result (3)
[Charts: Postmark execution time (sec) of Org-LFS, SSR-LFS, and Ext3-FUSE on SSD and on HDD]
Statistics of IO-TEST
[Charts: megabytes written and read at 85% utilization by Org-LFS(Cleaning), Org-LFS(Plugging), and SSR-LFS on HDD and on SSD]
IO TEST benchmark (sketched in C below)
■ 1st stage
Creates 64KB files until 90% utilization
Randomly deletes files until the target utilization is reached
■ 2nd stage
Randomly updates up to 4GB of file capacity
Measures the elapsed time of this stage
LFS has no free segments, so the measured time includes cleaning or SSR cost
Random Update Performance
[Figure: files are created until the disk reaches the 90% high threshold, randomly deleted down to the desired threshold (e.g., 20%), and then randomly updated]
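A compact sketch of the two stages (the helper functions, the 4KB update unit, and the timing code are assumptions; only the figures on the slide — 64KB files, the 90% high threshold, and 4GB of updates — come from the benchmark description):

#include <time.h>

/* Hypothetical helpers provided by the benchmark harness. */
extern void   create_64kb_file(long idx);
extern void   delete_random_file(void);
extern void   update_random_block(void);     /* rewrite an assumed 4KB unit */
extern double utilization(void);             /* fraction of the device in use */

static double run_io_test(double target_util)
{
    long nfiles = 0;

    /* 1st stage: fill with 64KB files to 90%, then delete down to the target. */
    while (utilization() < 0.90)
        create_64kb_file(nfiles++);
    while (utilization() > target_util)
        delete_random_file();

    /* 2nd stage: random updates up to 4GB of file capacity.  Only this stage
     * is timed, so the result includes any cleaning or SSR cost. */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long long done = 0; done < 4LL * 1024 * 1024 * 1024; done += 4096)
        update_random_block();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}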
Experimental Environment
Type Specifics
OS Ubuntu 9.10 (linux-2.6.31.4)
Processor AMD 2.1GHz Opteron 1352
RAM SAMSUNG 4GB DDR-2 800MHz ECC
SSD INTEL SSDSA2SH032G1GN
HDD SEAGATE ST3160815AS
Comparison of cleaning and SSR scheme
[Timing diagrams comparing the cleaning case and the SSR case: foreground write requests, background activity, and the number of free segments over time, annotated with T_seg write, T_on-demand cleaning, T_ssr, T_background, and T_idle]
To measure the performance impact of Lazy Indirect Block Update
1 worker – file size 1GB, request size 4KB, sync I/O mode
SSR-LFS outperforms Org-LFS and Ext3-FUSE for sequential and random writes on both devices
FIO Benchmark Result (1)
[Charts: throughput (MB/s) of Org-LFS, SSR-LFS, and Ext3-FUSE for seq-write, rand-write, seq-read, and rand-read on SSD and on HDD]
4 workers – file size 1GB, request size 4KB, sync I/O mode
The LIBU scheme has a greater impact on performance when four workers are running
FIO Benchmark Result (2)
[Charts: throughput (MB/s) of Org-LFS, SSR-LFS, and Ext3-FUSE for seq-write, rand-write, seq-read, and rand-read on SSD and on HDD]