Date post: | 02-Mar-2018 |
1
NOVA: A High-Performance, Hardened File System for Non-Volatile Main Memories
Jian Xu, Lu Zhang, Amirsaman Memaripour, Akshatha Gangadharaiah, Amit Borase, Tamires Brito Da Silva, Andy Rudoff (Intel), Steven Swanson
Non-Volatile Systems Laboratory
Department of Computer Science and Engineering
University of California, San Diego
2
NVDIMM Usage Models
• Legacy File IO Acceleration: fast and easy
  – Run existing IO-intensive apps on NVDIMMs
  – “Just works”
  – NOVA is 30% to 10x faster than Ext4 for write-intensive workloads
  – Need strong protections on data
• DAX mmap: maximum speed + programming challenges
  – Load-store access
  – You still need a strongly consistent file system
    • File system corruption can still destroy your data
    • NOVA is strongly consistent
  – Data protection is still critical
[Figure: Legacy IO Throughput. Y-axis: ops per second (x1000), 0 to 450. Series: Ext4-datajournal, NOVA.]
3
XFS
EXT4
F2FS
BTRFS
NILFS
4
Disk-based file systems are inadequate for NVMM
• Disk-based file systems cannot exploit NVMM performance
• Performance optimization compromises consistency on system failure [1]
[1] Pillai et al, All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications, OSDI '14.
Atomicity: first six columns. Data Protection: last three columns.
File system  1-Sector   1-Sector  1-Block    1-Block  N-Block    N-Block  Data  Metadata  Snapshots
             overwrite  append    overwrite  append   overwrite  append
Ext4wb       ✓          ✗         ✗          ✗        ✗          ✗        ✗     ✓         ✓
Ext4Order    ✓          ✓         ✗          ✓        ✗          ✓        ✗     ✓         ✓
Ext4Dataj    ✓          ✓         ✓          ✓        ✗          ✓        ✗     ✓         ✓
Btrfs        ✓          ✓         ✓          ✓        ✗          ✓        ✓     ✓         ✓
xfs          ✓          ✓         ✗          ✓        ✗          ✓        ✗     ✓         ✓
Reiserfs     ✓          ✓         ✗          ✓        ✗          ✓        ✗     ✓         ✓
5
XFS-DAX EXT4-DAX
BPFS SCMFS PMFS Aerie
M1FS
6
NVMM file systems don’t provide strong consistency or data protection
• DAX does not provide data atomicity guarantees
• Programming is more difficult
File system  Atomicity           Data Protection
             Metadata  Data      Data  Metadata  Snapshots
BPFS         ✓         ✓         ✗     ✗         ✗
PMFS         ✓         ✗         ✗     ✗         ✗
Ext4DAX      ✓         ✗         ✗     ✓         ✗
XFSDAX       ✓         ✗         ✗     ✓         ✗
SCMFS        ✗         ✗         ✗     ✗         ✗
Aerie        ✓         ✗         ✗     ✗         ✗
8
NOVA provides strong atomicity guarantee
NVMM file systems:
File system  Atomicity           Data Protection
             Metadata  Data      Data  Metadata  Snapshots
BPFS         ✓         ✓         ✗     ✗         ✗
PMFS         ✓         ✗         ✗     ✗         ✗
Ext4DAX      ✓         ✗         ✗     ✓         ✗
XFSDAX       ✓         ✗         ✗     ✓         ✗
SCMFS        ✗         ✗         ✗     ✗         ✗
Aerie        ✓         ✗         ✗     ✗         ✗
NOVA         ✓         ✓         ✓     ✓         ✓

Disk-based file systems (Atomicity: first six columns. Data Protection: last three columns.):
File system  1-Sector   1-Sector  1-Block    1-Block  N-Block    N-Block  Data  Metadata  Snapshots
             overwrite  append    overwrite  append   overwrite  append
Ext4wb       ✓          ✗         ✗          ✗        ✗          ✗        ✗     ✓         ✓
Ext4Order    ✓          ✓         ✗          ✓        ✗          ✓        ✗     ✓         ✓
Ext4Dataj    ✓          ✓         ✓          ✓        ✗          ✓        ✗     ✓         ✓
Btrfs        ✓          ✓         ✓          ✓        ✗          ✓        ✓     ✓         ✓
xfs          ✓          ✓         ✗          ✓        ✗          ✓        ✗     ✓         ✓
Reiserfs     ✓          ✓         ✗          ✓        ✗          ✓        ✗     ✓         ✓
NOVA         ✓          ✓         ✓          ✓        ✓          ✓        ✓     ✓         ✓
9
NOVA’s Key Features
• Features
  – High performance
  – Strong consistency
  – Snapshot support
  – Data protection
• Usage Models
  – open()/close(), read()/write()
  – DAX-mmap()
10
NOVA’s Architecture
11
Core NOVA Structures
Log Structure + copy-on-write + Journals
• One log per inode
• Non-contiguous
• Fast, simple atomic updates
• Metadata only
[Diagram: a file log with its tail pointer shown before and after an append]
12
Core NOVA Structures
Log Structure + copy-on-write + Journals
• Multi-page atomic update
• Fast allocation
• Instant data GC
[Diagram: a file log and its tail pointer; existing data pages Data 0 and Data 1, and new copy-on-write pages Data 1 and Data 2]
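The combination of a per-inode log and copy-on-write data pages can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `WriteEntry`, `Inode`, `PageStore`, `cow_write`, and `read_file` are hypothetical names invented for this example, not NOVA's kernel structures, and the "crash" is simulated with a flag.

```python
from dataclasses import dataclass, field


@dataclass
class WriteEntry:
    """One log entry describing a copy-on-write update (hypothetical layout)."""
    first_page: int   # index of the first file page the write covers
    page_ids: list    # newly allocated data pages holding the new contents


@dataclass
class Inode:
    log: list = field(default_factory=list)  # append-only, metadata only
    tail: int = 0                             # entries before tail are committed


class PageStore:
    """Toy stand-in for NVMM data pages."""

    def __init__(self):
        self.pages = {}

    def alloc(self, data):
        page_id = len(self.pages)
        self.pages[page_id] = data
        return page_id


def cow_write(store, inode, first_page, chunks, crash_before_commit=False):
    """Write whole pages out of place, then commit by advancing the tail."""
    del inode.log[inode.tail:]                            # drop uncommitted leftovers
    page_ids = [store.alloc(chunk) for chunk in chunks]   # 1. new data copies
    inode.log.append(WriteEntry(first_page, page_ids))    # 2. append log entry
    if crash_before_commit:
        return                                            # entry stays invisible
    inode.tail = len(inode.log)                           # 3. single-word commit


def read_file(store, inode):
    """Replay committed entries; later entries shadow earlier ones."""
    view = {}
    for entry in inode.log[:inode.tail]:
        for offset, page_id in enumerate(entry.page_ids):
            view[entry.first_page + offset] = store.pages[page_id]
    return view
```

In this model a multi-page write becomes visible all at once when the tail moves, and a write interrupted before that point leaves the old pages fully intact.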
13
Core NOVA Structures
Log Structure + copy-on-write + Journals
• Small, fixed-size journals
• For complex ops
[Diagram: a file log and a directory log, each with its own tail; a journal holding the directory tail and the file tail]
14
Supporting Backups with Snapshots
15
Snapshots for Normal File Access
[Diagram: a file log whose write entries carry epoch IDs (0, 1, 2) and point to data pages; entry addresses 0x1000, 0x2000, 0x3000, 0x4000; Snapshot 0 and Snapshot 1 are taken as the current epoch advances. Legend: snapshot entry, file write entry, epoch ID, data in snapshot, reclaimed data, current data.]
16
Corrupt Snapshots with DAX-mmap()
• Recovery invariant: if V == True, then D is valid
  – Incorrect: Naïvely mark pages read-only one at a time
[Diagram: timeline of the application executing D = 1; V = True; while the snapshot switches the page hosting D and the page hosting V from R/W to RO at different times. A store to an RO page causes a page fault and copy-on-write. The snapshot ends up with V = True and D = ?.]
17
Consistent Snapshots with DAX-mmap()
• Recovery invariant: if V == True, then D is valid
  – Correct: Block page faults until all pages are read-only
[Diagram: the same timeline, with page faults blocked (RO, blocking) until both the page hosting D and the page hosting V are read-only; copy-on-write then proceeds. The snapshot ends up with V = False and D = ?.]
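The difference between the two strategies can be replayed with a tiny deterministic simulation. This is a sketch under assumptions: the function names are invented for illustration, `None` stands in for the "?" value of D, and the interleaving (snapshot marks D's page, the application runs, snapshot marks V's page) is the one the slides describe as problematic.

```python
def invariant_holds(state):
    """Recovery invariant from the slides: if V is True, D must be valid."""
    return (not state["V"]) or state["D"] is not None


def run_snapshot(block_faults):
    live = {"D": None, "V": False}          # what the running application sees
    snap = {}                               # page contents frozen by the snapshot
    program = [("D", 1), ("V", True)]       # application stores, in program order
    pc = 0

    def try_step():
        nonlocal pc
        if pc >= len(program):
            return
        page, value = program[pc]
        faulted = page in snap              # page is already read-only
        if faulted and block_faults and len(snap) < len(live):
            return                          # fault blocked: the application stalls
        live[page] = value                  # copy-on-write keeps snap[page] intact
        pc += 1

    snap["D"] = live["D"]                   # snapshot marks D's page read-only
    try_step()                              # application attempts D = 1
    try_step()                              # application attempts V = True
    snap["V"] = live["V"]                   # snapshot marks V's page read-only
    try_step()                              # any stalled stores now complete
    try_step()
    return snap, live
```

With `block_faults=False` the snapshot captures `V = True` without a valid `D`; with `block_faults=True` the application stalls on its first fault, so the snapshot sees neither store.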
18
Performance impact of snapshots
• Normal execution vs. taking snapshots every 10s
  – Negligible performance loss through read()/write()
  – Average performance loss of 6.2% through mmap()
[Figure panels: conventional workloads; NVMM-aware workloads from WHISPER]
19
Data Protection: Metadata
20
NVMM Failure Modes: Media Failures
• Media errors
  – Detectable & correctable
    • Transparent to software
  – Detectable & uncorrectable
    • Affect a contiguous range of data
    • Raise machine check exception (MCE)
  – Undetectable
    • May consume corrupted data
• Software scribbles
  – Kernel bugs or own bugs
  – Transparent to hardware
[Diagram, detectable & correctable case: software reads NVMM data containing a media error; the NVMM controller detects and corrects the error; software consumes good data]
21
NVMM Failure Modes: Media Failures
[Diagram, detectable & uncorrectable case: software reads NVMM data containing a media error with a poison radius (PR) of e.g. 512 bytes; the NVMM controller detects the uncorrectable error and raises an exception; software receives an MCE]
22
Detecting NVMM Media Errors
• memcpy_mcsafe()
  – Copy data from NVMM
  – Catch MCEs and return failure
[Flowchart: MCE
  Unrecoverable: kernel panic
  Recoverable: whose access?
    User: SIGBUS
    Kernel: handler registered?
      Yes: process and return
      No: kernel panic]
23
NVMM Failure Modes: Media Failures
[Diagram, undetectable case: software reads NVMM data containing a media error; the NVMM controller sees no error; software consumes corrupted data]
24
NVMM Failure Modes: Scribbles
• Software “scribbles”
  – Kernel bugs or NOVA bugs
  – NVMM file systems are highly vulnerable
[Diagram: buggy code scribbles NVMM with a write; the NVMM controller updates ECC; the NVMM data now contains a scribble error]
25
NVMM Failure Modes: Scribbles
[Diagram: software reads NVMM data containing a scribble error; the NVMM controller sees no error; software consumes corrupted data]
26
NOVA Metadata Protection
• Replicate everything
  – Inodes
  – Logs
  – Superblock
  – …
• CRC32 checksums everywhere
[Diagram: an inode (Head, Tail, csum, H1, T1) and its replica inode’ (Head’, Tail’, csum’, H1’, T1’); log entries ent1 … entN with checksums c1 … cN, and replica entries ent1’ … entN’ with checksums c1’ … cN’; data pages Data 1 and Data 2]
27
Defense Against Scribbles
• Tolerating larger scribbles
  – Allocate replicas far from one another
  – Can tolerate arbitrarily large scribbles to metadata
• Preventing scribbles
  – Mark all NVMM as read-only
  – Disable CPU write protection only while NOVA writes to NVMM
28
Data Protection: Data
29
NOVA Data Protection
• Divide 4KB blocks into 512-byte stripes
• Compute a RAID 5-style parity stripe
• Compute and replicate checksums for each stripe
[Diagram: 1 block = eight 512-byte stripe segments S0 … S7, plus parity stripe P]
P = S0 ⊕ S1 ⊕ … ⊕ S7
Ci = CRC32C(Si), replicated
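The stripe, parity, and checksum scheme above can be sketched in a few lines. This is a minimal illustration, not NOVA's code: the function names are made up, and `zlib.crc32` (plain CRC32) is used as a stand-in because CRC32C is not in the Python standard library.

```python
import zlib

STRIPE = 512
BLOCK = 4096


def split(block):
    """Cut one 4KB block into eight 512-byte stripes S0..S7."""
    assert len(block) == BLOCK
    return [block[i:i + STRIPE] for i in range(0, BLOCK, STRIPE)]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def parity(stripes):
    """P = S0 xor S1 xor ... over the given stripes."""
    p = bytes(STRIPE)
    for s in stripes:
        p = xor(p, s)
    return p


def protect(block):
    """Return the parity stripe and one checksum per stripe."""
    stripes = split(block)
    return parity(stripes), [zlib.crc32(s) for s in stripes]  # CRC32, not CRC32C


def repair(stripes, p, csums):
    """Rebuild at most one bad stripe from the others plus parity."""
    bad = [i for i, s in enumerate(stripes) if zlib.crc32(s) != csums[i]]
    if len(bad) > 1:
        raise IOError("more than one bad stripe; single parity cannot repair")
    if bad:
        i = bad[0]
        others = [s for j, s in enumerate(stripes) if j != i]
        stripes[i] = xor(parity(others), p)
        if zlib.crc32(stripes[i]) != csums[i]:
            raise IOError("rebuilt stripe still fails its checksum")
    return b"".join(stripes)
```

The checksums locate the damaged stripe and the parity supplies its contents, which is why the two mechanisms are used together.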
30
File data protection with DAX-mmap
• With DAX-mmap(), file data changes are invisible to NOVA
• NOVA cannot protect mmap’ed file data
• NOVA logs mmap() and restores protection on munmap() or recovery
[Diagram: applications in user space reach file data on NVDIMMs either through NOVA’s read()/write() in kernel space (protected) or through mmap() and load/store (unprotected); the file log holds an mmap log entry]
31
File data protection with DAX-mmap
• NOVA cannot protect mmap’ed file data
  – User applications directly load/store the mmap’ed region
  – NOVA has to know what file pages are mmap’ed
[Diagram: same layout as the previous slide; after munmap(), protection is restored]
32
File data protection with DAX-mmap
[Diagram: same layout; mmap() followed by system failure + recovery]
33
Performance
34
Performance Cost of Data Integrity
[Figure: bar chart, y-axis 0 to 1.2. Workloads: Fileserver, Varmail, Webproxy, Webserver, RocksDB, MongoDB, Exim, TPCC, average. Series: xfs-DAX, ext4-DAX, ext4-dataj, Fortis baseline, w/ MP+WP, w/ MP+DP+WP.]
35
Conclusion
• Existing file systems do not meet the requirements of applications running on NVMM
• NOVA’s multi-log design achieves high performance and strong consistency
• NOVA’s data protection features ensure data integrity
• NOVA outperforms existing file systems while providing stronger consistency and data protection guarantees
36
Thank you!
Try NOVA! https://github.com/NVSL/NOVA
37
Backup Slides
38
Protecting Against Scribbles
• Metadata allocator separates metadata replicas
  – Allocate primary and replica pages in opposite directions
  – Use allocator ‘dead-zone’ to guarantee minimal distance
  – Protect against scribbles from other kernel bugs and own bugs
Simple allocation: log1 log1’ log2 log2’ … logN logN’
  A page-sized scribble can affect most pairs of replicated metadata pages
Two-way allocation: log1 log2 … logN … logN’ … log2’ log1’
  A page-sized scribble can affect limited pairs of replicated metadata pages
Dead-zone allocation: log1 log2 … logN [1 MB dead zone] logN’ … log2’ log1’
  A scribble less than 1 MB cannot corrupt any metadata
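A two-way allocator with a dead zone can be sketched as follows. The class and method names are hypothetical, page numbers replace real NVMM addresses, and the 256-page dead zone in the usage note assumes 4KB pages (256 pages = 1 MB).

```python
class TwoWayAllocator:
    """Primaries grow up from page 0, replicas grow down from the last page."""

    def __init__(self, total_pages, dead_zone_pages):
        self.low = 0                      # next primary metadata page
        self.high = total_pages - 1       # next replica metadata page
        self.dead_zone = dead_zone_pages  # pages that must stay between regions

    def alloc_pair(self):
        # Pages strictly between the two candidates: high - low - 1.
        if self.high - self.low - 1 < self.dead_zone:
            raise MemoryError("metadata regions would enter the dead zone")
        pair = (self.low, self.high)
        self.low += 1
        self.high -= 1
        return pair
```

With `TwoWayAllocator(1024, 256)`, every primary page ends up at least 256 pages away from every replica page, so a contiguous scribble no larger than the dead zone can reach at most one copy of any metadata page. The sketch reserves the dead zone only for metadata placement decisions; it says nothing about file data, which the slides note can still live there.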
39
Minimize the chance of corruptions: x86 write protection
• Leverage x86 CPU’s write protection
  – CR0.WP disables/enables writing to RO memories of each x86 core
  – Only enable writing when NOVA writes to NVMM
  – Protect against scribbles from other kernel bugs, not own bugs
40
Filebench throughput
• NOVA achieves high performance with strong data consistency
[Figure: Filebench throughput. Y-axis: ops per second (x1000), 0 to 450. Workloads: Fileserver, Varmail, Webproxy, Webserver. Series: Ext4-datajournal, Ext4-DAX, m1fs, NOVA.]
41
Tick-tock inode update
• Update tails of primary inode
• Update csum of primary inode
• Same procedure for inode’
[Diagram: primary inode (Head, Tail, csum, H1, T1) and secondary inode’ (Head’, Tail’, csum’, H1’, T1’), each moving through the states old, updating, new]
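The value of updating the two copies one after the other is that a crash at any point leaves at least one copy with a matching checksum. The sketch below models that; the class, the `crash_after` parameter, and the recovery policy (prefer a valid primary) are assumptions for illustration, and `zlib.crc32` stands in for whatever checksum the real inode uses.

```python
import struct
import zlib


def inode_csum(head, tail):
    return zlib.crc32(struct.pack("<QQ", head, tail))


class InodeCopy:
    def __init__(self, head, tail):
        self.head, self.tail = head, tail
        self.csum = inode_csum(head, tail)

    def valid(self):
        return self.csum == inode_csum(self.head, self.tail)


def tick_tock_update(primary, secondary, new_tail, crash_after=None):
    """Run the four steps in order; stop after `crash_after` steps if given."""
    steps = [
        lambda: setattr(primary, "tail", new_tail),
        lambda: setattr(primary, "csum", inode_csum(primary.head, primary.tail)),
        lambda: setattr(secondary, "tail", new_tail),
        lambda: setattr(secondary, "csum", inode_csum(secondary.head, secondary.tail)),
    ]
    for done, step in enumerate(steps):
        if crash_after is not None and done >= crash_after:
            return
        step()


def recover(primary, secondary):
    """Copy the valid inode over the other one (assumed policy: primary first)."""
    if primary.valid():
        src, dst = primary, secondary
    elif secondary.valid():
        src, dst = secondary, primary
    else:
        raise IOError("both inode copies are corrupt")
    dst.head, dst.tail, dst.csum = src.head, src.tail, src.csum
```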
42
Performance Cost of Data Integrity
[Figure: bar chart, y-axis 0 to 1.2. Workloads: Fileserver, Varmail, Webproxy, Webserver, RocksDB, MongoDB, Exim, TPCC, average. Series: xfs-DAX, ext4-DAX, ext4-dataj, Fortis baseline, w/ MP, w/ MP+WP, w/ MP+DP, w/ MP+DP+WP.]
43
Conclusions
• NVMM file systems need unique solutions for reliability
  – Error reporting mechanisms differ from those of disks
  – DAX-mmap complicates designs
• Performance and storage penalties vary
  – Storage cost is modest for the presented hardening techniques
  – Performance impact is significant for some applications
• More knowledge is necessary to determine the trade-offs
  – Uncorrectable media errors in emerging NVMM technologies
  – The frequency and size of scribbles in kernel space
• NOVA provides all hardening techniques as mount options
44
Performance impact of data integrity
• File operation latency
45
Reliability evaluation – metadata pages at risk
• Scan an aged NOVA file system image
  – Examine distances between the primary and replica pages
  – Count vulnerable page pairs for a given scribble size
[Figure note: points with Y == 0 do not show in the log-log plot]
46
PMFS shortcomings
• No data atomicity support
• High consistency overhead with persistent B-tree
• Not scalable
  – Directory operations (linear search)
  – NVMM allocation (single allocator)
  – Single journal shared by all transactions
• Poor performance on large directories
• Intel has deprecated PMFS
47
Ext4-DAX and xfs-DAX shortcomings
• No data atomicity support
• Single journal shared by all the transactions (JBD2-based)
• Poor performance
48
Non-volatile main memory is about to happen
NVMM needs a new file system: PMFS, Ext4-DAX, SCMFS, Aerie, NOVA, …
49
Why a new file system?
Source: Memory-Driven Computing, Kimberly Keeton, HP Labs
We need to reduce the software overhead.
50
What Should a File System Provide?
• Performance: current focus (of all known efforts)
• Consistency
  – Atomic metadata operations
  – Atomic file updates
• Data Protection
  – Snapshots
  – Media error protection
  – Software error protection
• Cost optimizations
  – Compression
  – Deduplication
We need to study the impact of adding more file system services in the context of NVMM.
51
Evaluation: Latency
• Intel PM Emulation Platform
  – Emulates different NVM characteristics
  – Emulates clwb/PCOMMIT latency
• NOVA provides low-latency atomicity
[Figure: Operation latency in microseconds, 0 to 25, for Create, Append (4KB), Delete. Series: Ext4-datajournal, Ext4-DAX, m1fs, NOVA.]
52
NOVA design and in-NVMM data layout
• High performance
  – No page cache
  – Memory semantics
  – Segregated data structures
  – Per-CPU free list
  – Per-inode logging
• Strong consistency
  – Copy-on-write file data
  – Using 8-byte atomic stores
[Diagram: DRAM and NVMM regions; a journal, inode table, and free list for each CPU (CPU 0, CPU 1, ...); the superblock and recovery inode; an inode with head and tail pointers into its inode log]
53
NOVA design and in-NVMM data layout
[Diagram: per-CPU inode table; an inode with head and tail pointers into its per-inode file log; a write to pages 1 and 2 adds new pages Data 1 and Data 2 next to the existing Data 0 and Data 1, and the 64-bit tail pointer is updated]
54
NVMM file systems should support snapshot
• Snapshot is essential for file system backup
• Available in file systems for block devices
  – ZFS, Btrfs, WAFL
• NOVA is the first NVMM file system providing snapshot
  – Efficient full-filesystem snapshot at minimal performance cost
  – Creating/deleting snapshots does not halt file system
  – Creating consistent snapshots with DAX-mmap enabled
55
Enable snapshot in NOVA
• Maintain a current ‘epoch_id’ for the file system
  – Stored in the superblock
  – Incremented after every snapshot taken
• Add the ‘epoch_id’ to each log entry
[Diagram: log entries, each tagged with an epoch_id]
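A toy model shows how tagging log entries with an epoch makes a snapshot a matter of filtering the log. The class and method names are hypothetical, and the choice to identify a snapshot by the epoch that was current when it was taken is an assumption of this sketch.

```python
class EpochFS:
    def __init__(self):
        self.epoch_id = 0   # kept in the superblock in the slides' design
        self.log = []       # entries: (epoch_id, page, data)

    def write(self, page, data):
        self.log.append((self.epoch_id, page, data))

    def take_snapshot(self):
        """Name the snapshot after the current epoch, then start a new epoch."""
        snapshot = self.epoch_id
        self.epoch_id += 1
        return snapshot

    def view(self, snapshot=None):
        """Replay the log; a snapshot only sees entries up to its epoch."""
        pages = {}
        for epoch, page, data in self.log:
            if snapshot is None or epoch <= snapshot:
                pages[page] = data
        return pages
```

Taking a snapshot here costs one counter increment; no data is copied until later writes supersede pages that an older epoch still needs.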
56
Taking snapshots
[Diagram: the file log with epoch-tagged write entries (addresses 0x1000 to 0x4000), plus a Snapshot 0 log and a Snapshot 1 log holding the entries (0x1000, 1) and (0x3000, 2); the current epoch advances from 0 to 2 as Snapshot 0 and Snapshot 1 are taken. Legend: snapshot entry, file write entry, epoch ID, data in snapshot, reclaimed data, current data.]
57
Deleting snapshots
[Diagram: the same file log, Snapshot 0 log, and Snapshot 1 log with entries (0x1000, 1) and (0x3000, 2); after a snapshot is deleted, background GC reclaims data pages that no remaining snapshot needs. Legend: snapshot entry, file write entry, epoch ID, data in snapshot, reclaimed data, current data.]
58
Mounting snapshots
[Diagram: the same file log with the Snapshot 1 log and its entry (0x3000, 2); mounting Snapshot 1 uses a log tail positioned at the snapshot. Legend: snapshot entry, file write entry, epoch ID, data in snapshot, reclaimed data, current data.]
59
Snapshots with DAX-mmap
• Goal: Applications take snapshots and keep running
  – Virtual addresses do not change
  – Consistency must be guaranteed
• How: Set each mmap’ed page as read-only
  – Then do copy-on-write for new stores (detected by page fault)
• Caveat: Can only atomically set one page as read-only
  – What if the order of becoming read-only conflicts with consistency?
60
NOVA (meta)data integrity features
• Detect (meta)data corruptions
  – Media errors: error code from memcpy_mcsafe(), and checksums
  – Software scribbles: checksums
• Repair (meta)data corruptions
  – Metadata: fully replicated
  – File data: stripe and parity-code each block
• Minimize scribbles
  – Leverage x86 CPU’s write protection (CR0.WP)
  – Metadata allocators separate replicas
61
Metadata error detection and correction
• Use inode access as an example:
[Flow: read inode and inode’ (memcpy_mcsafe) → verify inode csum and inode’ csum → memcmp(inode, inode’) → use inode]
• If any step raises an error:
  – Attempt to repair and retry
  – If recovery fails: return -EIO to user
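The read, verify, compare, and repair sequence can be sketched in user-space Python. Everything here is illustrative: `read_copy` merely imitates the role of memcpy_mcsafe() by raising a hypothetical `MediaError`, the store is a plain dict, `zlib.crc32` stands in for the checksum, and preferring the primary when two valid copies differ is an assumption rather than something the slides specify.

```python
import errno
import zlib


class MediaError(Exception):
    """Stand-in for a machine check caught while copying from NVMM."""


def read_copy(store, key):
    """Imitates memcpy_mcsafe(): returns the stored record or raises."""
    record = store[key]
    if record is None:          # None models a poisoned (unreadable) range
        raise MediaError(key)
    return record


def load(store, key):
    """Return the payload if it is readable and its checksum matches."""
    try:
        payload, csum = read_copy(store, key)
    except MediaError:
        return None
    return payload if zlib.crc32(payload) == csum else None


def read_inode(store):
    primary = load(store, "inode")
    replica = load(store, "inode'")
    if primary is None and replica is None:
        raise OSError(errno.EIO, "both inode copies are bad")
    good = primary if primary is not None else replica
    if primary is None or replica is None or primary != replica:
        # Repair: rewrite both copies from the one judged good.
        store["inode"] = store["inode'"] = (good, zlib.crc32(good))
    return good
```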
62
File data error detection and correction
[Flow: read a strip (memcpy_mcsafe) → verify strip’s checksum → copy data to user; also check for errors in the checksums]
• If any step raises an error:
  – Attempt to repair and retry
  – If recovery fails: return -EIO to user
63
File data protection with DAX-mmap
• With DAX-mmap(), file data changes are invisible to NOVA
Following DAX-mmap() semantics, NOVA doesn’t interfere with mmap’ed file data.
[Diagram: applications in user space reach file data on NVDIMMs through NOVA’s read()/write() in kernel space or through mmap() and load/store]
64
File data protection with DAX-mmap
NOVA skips protection routines for reads and writes to the regions found in the vm area list.
[Diagram: mmap() records regions in a vm area list; read() and load/store accesses to those regions get no protection]
65
Performance impact of data integrity
• Latency breakdown on NVDIMM-N
[Figure legend: metadata protection, file data protection, x86 write protection]
66
Performance impact of data integrity
• Random read/write bandwidth on NVDIMM-N
67
Storage utilization with data integrity
• Conceptual view of NOVA’s utilization of NVMM
• Actual usage of a practical workload: fileserver
The dead-zone (DZ) exists only virtually, to separate metadata replicas. File data can still live inside it.
68
Differences from disk FS implementation
• Low-latency storage media
  – Need to choose fast methods for any involved computation
• Fine-grained random access
  – Need fine-grained checksum ranges, not per block (as in Btrfs, ZFS)
• Small atomicity guarantees (64-bit)
  – Need metadata replication to assist consistent updates
• Media errors cause machine check exceptions (MCEs)
  – Need awareness and mitigations
• Demands for DAX-mmap (no copy-on-write, no FS control)
  – Need awareness and lowering the protection level on demand
69
Recovery for snapshot metadata
• Snapshot metadata list resides in DRAM to reduce consistency overhead
• Clean unmount:
  – Finish background snapshot delete
  – Save snapshot lists to NVMM
• Power failure:
  – Snapshot transaction ID is persistent
  – Rebuild snapshot metadata lists during power failure recovery
70
NVMM (Meta)data corruptions
[Diagram, detectable & correctable case: software reads NVMM data containing a media error; hardware ECC corrects it; software receives good data]
71
NVMM (Meta)data corruptions
[Diagram, detectable & uncorrectable case: a read hits a media error with a poison radius (PR) of e.g. 512 bytes; hardware ECC raises an MCE; in user mode the process receives SIGBUS; in kernel mode the kernel might panic]
72
NVMM (Meta)data corruptions
[Diagram, undetectable case: software reads NVMM data containing a media error; hardware ECC does not flag it; software receives corrupted data]
74
NVMM (Meta)data corruptions
[Diagram: software reads NVMM data through hardware ECC and receives corrupted data]
75
Metadata error detection and correction
• Use inode access as an example
[Flowchart:
  1. Read inode and inode’ with memcpy_mcsafe. An MCE on one copy leads to the PR(inode) or PR(inode’) step; an MCE on both copies returns -EIO to user.
  2. Verify inode csum and inode’ csum. Both fail: -EIO to user. One fails: good inode overwrites error inode, then go back. Both OK: go on.
  3. memcmp(inode, inode’). Branches: eq, neq. On eq: continue.]
76
File data error detection and correction
• Detect and repair both data and checksum errors
[Flowchart:
  1. Read a strip with memcpy_mcsafe.
     MCE: read other strips and the parity. Any MCE: -EIO to user. All OK: repair bad strip & verify csums.
     OK: calculate csum.
  2. csum == csum0 or csum == csum1? Yes: copy data to user (good csum overwrites error csum). No: csum0 == csum1? Yes: repair bad strip & verify csums. No: fail.
  3. Repair success: copy data to user. Repair fail: -EIO to user.
  Judged by the majority: csum, csum0, csum1.]
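The "judged by the majority" rule over the computed checksum and the two stored replicas can be written as a small classifier. This is a sketch only: the function name and the returned labels are invented, `zlib.crc32` stands in for CRC32C, and what happens when all three values disagree is left as "no_majority" because the slides do not spell it out.

```python
import zlib


def classify_strip(data, csum0, csum1):
    """Compare the computed checksum with the two stored replicas."""
    csum = zlib.crc32(data)          # CRC32 stand-in for CRC32C
    if csum == csum0 and csum == csum1:
        return "ok"                  # all three agree
    if csum == csum0 or csum == csum1:
        return "fix_checksum"        # data plus one replica outvote the other
    if csum0 == csum1:
        return "repair_data"         # both replicas outvote the data
    return "no_majority"             # three different values
```

A caller would copy the data to the user on "ok", rewrite the odd checksum replica on "fix_checksum", and rebuild the strip from the other strips and parity on "repair_data".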