

1

SQCK: A Declarative File System Checker

Haryadi S. Gunawi, Abhishek Rajimwale,

Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau

University of Wisconsin – Madison

OSDI ’08 – December 9th, 2008

2/25

Corrupt file systems

File systems
  Store massive amounts of data
  Must be reliable
Corrupted file system images
  Due to hardware errors, file system bugs, etc.
  Need to be repaired as soon as possible

3/25

Who should repair?

Does journaling (write-ahead log) help?
  No, only for crashes
Does the file system repair itself online?
  No, not enough machinery
Fsck: the last line of defense
  It’s a “must have” utility
    − XFS: “no need fsck ever”, but deployed fsck in the end
  Must be fully reliable

4/25

But … fsck is complex

Fsck has a big task
  Turn any corrupt image into a consistent image
  E.g. check whether a data block is shared by two inodes
How are they implemented?
  Written in C → hard to reason about
  Large and complex
    − Ext2 fsck: 150 checks in 16 KLOC
    − XFS fsck: 340 checks in 22 KLOC
  Hundreds of cluttered if-check statements
Bottom line: fsck code is “untouchable”

5/25

Two Questions

Are current checkers really reliable?

If not, how should we build robust checkers?

6/25

e2fsck is unreliable

Analyze e2fsck (ext2 file system checker)

Findings:
  Inconsistent repair
    − The file system becomes unreadable
  Consistent but not “correct”
    − Fsck deletes valid directory entries
    − Fsck loses a huge number of files

7/25

SQCK

Lesson: Complexity is the enemy of reliability
  Big task + bad design → complexity → unreliability
  Need a higher-level approach for simplicity
SQCK (SQL-based Fsck)
  Use a declarative query language to write checks
  Put simply: write fewer lines of code
Evaluation
  Simple and reliable: e2fsck in 150 queries (vs. 16 KLOC of C)
  More: great flexibility and reasonable performance

8/25

Outline

Introduction

Analysis of e2fsck

SQCK Design

SQCK Evaluation

Conclusion

9/25

Methodology

E2fsck task: cross-check all ext2 metadata
  An indirect pointer should not point to the superblock
  A subdirectory should only be accessible from one directory
Inject single corruption
  Observe how e2fsck repairs a single corruption
  Only corrupt on-disk pointers
    − Corrupt an indirect pointer to point to the superblock
    − Corrupt a directory entry to point to another directory
  Usually, a corrupt pointer is simply cleared to zero
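The single-corruption methodology above can be sketched as follows. This is only an illustration: the field names, block numbers, and inode numbers are hypothetical, and the toy "image" models nothing but two on-disk pointers.

```python
import copy

# Hypothetical toy image: only on-disk pointer fields are modelled.
TOY_IMAGE = {
    "inode_12.indirect_ptr": 850,  # block number of the indirect block
    "dir_20.entry_3.ino": 21,      # directory entry pointing to a subdirectory
}

SUPERBLOCK_BLK = 1   # assumed block number of the superblock
OTHER_DIR_INO = 11   # assumed inode number of an unrelated directory


def single_corruptions(image, targets):
    """Yield (field, corrupted_image) pairs; each corrupted image differs
    from the original in exactly one pointer field."""
    for field, bad_value in targets.items():
        corrupted = copy.deepcopy(image)
        corrupted[field] = bad_value
        yield field, corrupted


targets = {
    "inode_12.indirect_ptr": SUPERBLOCK_BLK,  # indirect pointer -> superblock
    "dir_20.entry_3.ino": OTHER_DIR_INO,      # dir entry -> another directory
}
cases = list(single_corruptions(TOY_IMAGE, targets))
```

Each element of `cases` would then be handed to the checker under test, one image at a time, so that any bad repair can be attributed to a single injected fault.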

10/25

Inconsistent (Out-of-order) Repair

[Figure: an inode's indirect pointer (*ind) has been corrupted to point to the superblock instead of its indirect block (entries shown: 850, 851, 998999, 853). Ideal fsck: 1. Check bad indirect pointer, then 2. Check indirect content. e2fsck: 2. Check indirect content, then 1. Check bad indirect pointer; the figure shows entries cleared to 0.]
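Why the order of the two checks matters can be shown with a toy model. This is a sketch under stated assumptions, not e2fsck's code: the block numbers, the validity range, and the stand-in superblock fields are all made up for illustration.

```python
SUPERBLOCK = 1          # assumed block number of the superblock
FIRST_DATA_BLOCK = 100  # assumed: blocks below this hold metadata
TOTAL_BLOCKS = 1000     # assumed size of the toy file system


def make_image():
    # Toy image: the inode's indirect pointer plus the content of two blocks.
    return {
        "inode_ind_ptr": SUPERBLOCK,          # corrupted: should be 850
        "blocks": {
            SUPERBLOCK: [1000, 32768, 8192],  # stand-in superblock fields
            850: [851, 852, 853],             # the real indirect block
        },
    }


def check_indirect_pointer(img):
    """Clear the inode's indirect pointer if it targets a metadata block."""
    if img["inode_ind_ptr"] and img["inode_ind_ptr"] < FIRST_DATA_BLOCK:
        img["inode_ind_ptr"] = 0


def check_indirect_content(img):
    """Treat the target block as an array of pointers; zero invalid ones."""
    blk = img["inode_ind_ptr"]
    if blk == 0:
        return
    content = img["blocks"][blk]
    for i, ptr in enumerate(content):
        if not (FIRST_DATA_BLOCK <= ptr < TOTAL_BLOCKS):
            content[i] = 0


# Ideal order: validate the pointer before interpreting what it points to.
ideal = make_image()
check_indirect_pointer(ideal)
check_indirect_content(ideal)

# Out-of-order: the superblock is "repaired" as if it were an indirect block.
out_of_order = make_image()
check_indirect_content(out_of_order)
check_indirect_pointer(out_of_order)
```

In the ideal order the superblock is untouched; in the reversed order its fields are overwritten with zeros before the bad pointer is noticed.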

11/25

Consistent but Incorrect Repair (1)

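[Figure: directory trees with root /, directories a1 and b1, and subdirectories a2 and b2, comparing the repair made by an ideal fsck with the repair made by e2fsck; in the e2fsck trees a link is marked X and one tree carries the label LF.]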

Kidnapping problem!

E2fsck does not use all available information

12/25

Result Summary

Four problems
  Inconsistent
  Information-incomplete
  Policy-inconsistent
  Insecure
E2fsck does not handle all corruptions
  “Warning: Programming bug in e2fsck! Or some bonehead (you) is checking a mounted (live) filesystem.”
Not simple implementation bugs
  Difficult to combine available information
  Difficult to ensure correct ordering

13/25

Outline

Introduction

Analysis

SQCK Design

SQCK Evaluation

Conclusion

14/25

Fsck Properties

Hundreds of checks
Complex cross-checks
  Taxonomy of checks in e2fsck:

                          Single instance   Multiple instances
    Same structure              63                 11
    Different structures        12                 35

Must be ordered correctly

[Figure: each cell of the taxonomy is illustrated with example structures A { x, y } and B { m, n }.]

15/25

A Declarative Approach

Lesson: Complexity is the enemy of reliability
SQCK
  Use a declarative query language (e.g. SQL). Why?
  It is declarative: high-level intent is clear
  Fit for cross-checking massive information
Goals achieved
  Simple: e2fsck in 150 queries (vs. 16 KLOC of C)
  Reliable: each check/query is easy to understand
  Flexible: plug in/out different queries

16/25

Using SQCK

Take a file system image
Load metadata into database tables
  Temporary tables
  Ex: InodeTable, GroupDescTable, DirEntryTable
Run checks and repairs (in the form of queries)
Flush any modifications, and delete the tables

[Figure: File system image → Scanner / Loader → Database tables → Checks + Repairs → Flush]
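The load, check/repair, flush, and delete steps above can be sketched end to end. This is a minimal illustration using Python's sqlite3 module (the talk's prototype used MySQL); the `image` dictionary, the column names `groupNum`, `groupStart`, `groupEnd`, and the `CHECKS` list are assumptions made for the example, not SQCK's actual schema.

```python
import sqlite3

# Hypothetical stand-in for group descriptors parsed from an image.
image = {
    0: {"start": 1, "end": 8192, "blockBitmap": 3},
    1: {"start": 8193, "end": 16384, "blockBitmap": 42},  # outside its group
}

# Each entry pairs a check query with its repair query, so checks can be
# plugged in or out by editing this list.
CHECKS = [
    (
        "block bitmap outside its group",
        "SELECT groupNum FROM GroupDescTable "
        "WHERE blockBitmap NOT BETWEEN groupStart AND groupEnd",
        "UPDATE GroupDescTable SET blockBitmap = 0, dirty = 1 "
        "WHERE blockBitmap NOT BETWEEN groupStart AND groupEnd",
    ),
]


def run_sqck(image):
    db = sqlite3.connect(":memory:")
    # 1. Scanner/loader: copy metadata into a temporary table.
    db.execute(
        "CREATE TEMP TABLE GroupDescTable ("
        "groupNum INTEGER PRIMARY KEY, groupStart INTEGER, "
        "groupEnd INTEGER, blockBitmap INTEGER, dirty INTEGER DEFAULT 0)"
    )
    db.executemany(
        "INSERT INTO GroupDescTable "
        "(groupNum, groupStart, groupEnd, blockBitmap) VALUES (?, ?, ?, ?)",
        [(n, g["start"], g["end"], g["blockBitmap"]) for n, g in image.items()],
    )
    # 2. Run each check, record what it found, then run its repair.
    report = {}
    for name, check_sql, repair_sql in CHECKS:
        report[name] = [row[0] for row in db.execute(check_sql)]
        db.execute(repair_sql)
    # 3. Flush: write back only the rows a repair marked dirty.
    dirty = db.execute(
        "SELECT groupNum, blockBitmap FROM GroupDescTable WHERE dirty = 1"
    ).fetchall()
    for group_num, bitmap in dirty:
        image[group_num]["blockBitmap"] = bitmap
    # 4. Delete the temporary table.
    db.execute("DROP TABLE GroupDescTable")
    db.close()
    return report


report = run_sqck(image)
```

The `dirty` column keeps the flush step cheap: only rows touched by a repair are written back.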

17/25

Declarative check (example 1)

Cross-checking a single instance of a structure
“Find block bitmap that is not located within its block group”

In C (e2fsck):

    first_block = sb->s_first_data_block;
    last_block = first_block + blocks_per_group;
    for (i = 0, gd = fs->group_desc; i < fs->group_desc_count; i++, gd++) {
        /* the last group ends at the end of the file system */
        if (i == fs->group_desc_count - 1)
            last_block = sb->s_blocks_count;
        if ((gd->bg_block_bitmap < first_block) ||
            (gd->bg_block_bitmap >= last_block)) {
            px.blk = gd->bg_block_bitmap;
            if (fix_problem(BB_NOT_GROUP, ...))
                gd->bg_block_bitmap = 0;
        }
        ...
    }

In SQL (SQCK):

    SELECT *
    FROM GroupDescTable G
    WHERE G.blockBitmap NOT BETWEEN G.start AND G.end
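The query above can be tried on toy data. The sketch below assumes the loader precomputes each group's first and last block, which is what lets the check shrink to one `WHERE` clause. The columns are renamed `groupStart` and `groupEnd` here to steer clear of SQL keywords, `groupEnd` is stored inclusively because `BETWEEN` is inclusive, and all sizes are made-up values.

```python
import sqlite3

FIRST_DATA_BLOCK = 1      # stands in for sb->s_first_data_block (assumed)
BLOCKS_PER_GROUP = 8192   # assumed
BLOCKS_COUNT = 20000      # stands in for sb->s_blocks_count (assumed)
bitmaps = [3, 8195, 500]  # block bitmap location per group; group 2 is corrupt

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE GroupDescTable ("
    "groupNum INTEGER, groupStart INTEGER, groupEnd INTEGER, "
    "blockBitmap INTEGER)"
)
for i, bmap in enumerate(bitmaps):
    start = FIRST_DATA_BLOCK + i * BLOCKS_PER_GROUP
    if i == len(bitmaps) - 1:
        end = BLOCKS_COUNT - 1              # last group ends with the file system
    else:
        end = start + BLOCKS_PER_GROUP - 1  # inclusive last block of the group
    db.execute(
        "INSERT INTO GroupDescTable VALUES (?, ?, ?, ?)", (i, start, end, bmap)
    )

rows = db.execute(
    "SELECT G.groupNum, G.blockBitmap "
    "FROM GroupDescTable G "
    "WHERE G.blockBitmap NOT BETWEEN G.groupStart AND G.groupEnd "
    "ORDER BY G.groupNum"
).fetchall()
```

Only group 2 is returned, since its bitmap at block 500 lies outside the group's range of 16385 to 19999.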

18/25

Declarative check (example 2)

Cross-checking multiple instances of the same structure
“Find false parents (i.e. directory entries that point to a subdirectory that already belongs to another directory)”
  Must read all directory entries in directory data blocks
  Wrong implementation in e2fsck (the kidnapping problem)

19/25

Declarative check (example 2)

In C (e2fsck):

    if ((dot_state > 1) &&
        (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode))) {
        // e2fsck_get_dir_info is 20 lines long
        subdir = e2fsck_get_dir_info(dirent->inode);
        ...
        if (subdir->parent) {
            // a parent was already recorded: clear this entry
            if (fix_problem(LINK_DIR, ..)) {
                dirent->inode = 0;
                goto next;
            }
        } else {
            // first directory seen pointing here becomes the parent
            subdir->parent = ino;
        }
    }
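The parent-assignment logic in the C snippet above depends on scan order: whichever directory is scanned first is recorded as the parent, and any later claimant has its entry cleared. A simplified Python model of just that logic (not e2fsck itself; the inode numbers are hypothetical) shows how the true parent can lose its child:

```python
def first_come_repair(scan_order):
    """scan_order: (parent_ino, child_dir_ino) directory entries in the
    order they are scanned. Returns the entries that would be cleared."""
    parent_of = {}
    cleared = []
    for parent, child in scan_order:
        if child in parent_of:
            cleared.append((parent, child))  # models: dirent->inode = 0
        else:
            parent_of[child] = parent        # models: subdir->parent = ino
    return cleared


TRUE_PARENT, FALSE_PARENT, CHILD = 10, 20, 11

# True parent scanned first: the false parent's entry is cleared.
true_first = first_come_repair([(TRUE_PARENT, CHILD), (FALSE_PARENT, CHILD)])

# False parent scanned first: the true parent's entry is cleared instead.
false_first = first_come_repair([(FALSE_PARENT, CHILD), (TRUE_PARENT, CHILD)])
```

Nothing in this logic consults the child's own ".." entry, which is the extra piece of information a correct check would need.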

20/25

Declarative check (example 2)

In SQL (SQCK):

    SELECT F.*    -- returns the false parent(s)
    FROM DirEntryTable P, DirEntryTable C, DirEntryTable F
    WHERE
        -- P says C is its child
        P.entry_num >= 3 AND P.entry_ino = C.ino AND
        -- and C says P is its parent
        C.entry_num = 2 AND C.entry_ino = P.ino AND
        -- F also says C is its child
        F.entry_num >= 3 AND F.entry_ino = C.ino AND
        F.ino <> P.ino

[Figure: directories F and P both point to subdirectory C.]
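The false-parent query can be exercised on a small table. In this sketch the schema is an assumption inferred from the query: `ino` is the directory that owns an entry, entry 1 is ".", entry 2 is "..", and entries from 3 onward are children. The inode numbers are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE DirEntryTable (ino INTEGER, entry_num INTEGER, entry_ino INTEGER)"
)
db.executemany(
    "INSERT INTO DirEntryTable VALUES (?, ?, ?)",
    [
        # root (ino 2) holds a1 (10) and b1 (20)
        (2, 1, 2), (2, 2, 2), (2, 3, 10), (2, 4, 20),
        # a1 holds a2 (11)
        (10, 1, 10), (10, 2, 2), (10, 3, 11),
        # b1: its entry was corrupted to point to a2 (11) instead of b2 (21)
        (20, 1, 20), (20, 2, 2), (20, 3, 11),
        # a2: its ".." still names a1 as the parent
        (11, 1, 11), (11, 2, 10),
        # b2: now unreachable; its ".." names b1
        (21, 1, 21), (21, 2, 20),
    ],
)

FALSE_PARENTS = """
SELECT F.ino, F.entry_num, F.entry_ino
FROM DirEntryTable P, DirEntryTable C, DirEntryTable F
WHERE P.entry_num >= 3 AND P.entry_ino = C.ino   -- P says C is its child
  AND C.entry_num = 2  AND C.entry_ino = P.ino   -- C says P is its parent
  AND F.entry_num >= 3 AND F.entry_ino = C.ino   -- F also claims C
  AND F.ino <> P.ino
"""
false_parents = db.execute(FALSE_PARENTS).fetchall()
```

Because the query joins on the child's ".." entry, the result does not depend on the order in which directories are scanned: the entry in b1 is flagged and the one in a1 is left alone.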

21/25

Declarative Repairs

Running declarative checks is only part of the problem
  Must also perform the declarative repairs
A repair = An update query
  Some repairs simply update a few fields
    ... SET T.field = newValue, T.dirty = 1
A repair = A series of queries
  Ex: Reconnect an orphan directory to the lost+found directory
  Combine a series of queries with C code
    − All repairs are written in SQL
    − C code is only used for connecting them
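A repair built from a series of queries might look like the sketch below, which reconnects an orphan directory to lost+found. Python stands in for the C glue code here, and the schema, the `dirty` column on DirEntryTable, and the inode numbers are assumptions for illustration, not SQCK's actual repair.

```python
import sqlite3

ROOT_INO, LOST_FOUND_INO = 2, 11  # assumed inode numbers for the toy data

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE DirEntryTable (ino INTEGER, entry_num INTEGER, "
    "entry_ino INTEGER, dirty INTEGER DEFAULT 0)"
)
db.executemany(
    "INSERT INTO DirEntryTable (ino, entry_num, entry_ino) VALUES (?, ?, ?)",
    [
        (2, 1, 2), (2, 2, 2), (2, 3, 11), (2, 4, 12),  # root
        (11, 1, 11), (11, 2, 2),                       # lost+found
        (12, 1, 12), (12, 2, 2),                       # a directory
        (13, 1, 13), (13, 2, 12),                      # orphan: no entry points to 13
    ],
)

# Query 1: directories (other than root) that no child entry points to.
FIND_ORPHANS = """
SELECT DISTINCT D.ino FROM DirEntryTable D
WHERE D.ino <> ?
  AND NOT EXISTS (SELECT 1 FROM DirEntryTable E
                  WHERE E.entry_num >= 3 AND E.entry_ino = D.ino)
"""


def reconnect_orphans(db):
    orphans = [r[0] for r in db.execute(FIND_ORPHANS, (ROOT_INO,)).fetchall()]
    for orphan in orphans:
        # Query 2: next free entry slot in lost+found.
        (slot,) = db.execute(
            "SELECT MAX(entry_num) + 1 FROM DirEntryTable WHERE ino = ?",
            (LOST_FOUND_INO,),
        ).fetchone()
        # Query 3: add the orphan as a child of lost+found.
        db.execute(
            "INSERT INTO DirEntryTable (ino, entry_num, entry_ino, dirty) "
            "VALUES (?, ?, ?, 1)",
            (LOST_FOUND_INO, slot, orphan),
        )
        # Query 4: point the orphan's '..' entry at lost+found.
        db.execute(
            "UPDATE DirEntryTable SET entry_ino = ?, dirty = 1 "
            "WHERE ino = ? AND entry_num = 2",
            (LOST_FOUND_INO, orphan),
        )
    return orphans


repaired = reconnect_orphans(db)
remaining = db.execute(FIND_ORPHANS, (ROOT_INO,)).fetchall()
```

The host code only sequences the queries and passes values between them; every read and every modification of metadata is expressed in SQL.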

22/25

Outline

Introduction

Analysis

SQCK Design

SQCK Evaluation

Conclusion

23/25

SQCK Evaluation

Complexity
  150 queries in 1100 lines of SQL statements (compared to 16,000 lines of C in e2fsck)
Reliability
  Passes hundreds of corruption scenarios
Flexibility
  Add new checks/repairs
  Enable different versions of e2fsck
Performance
  Introduce some optimizations

24/25

SQCK vs. e2fsck

Reasonable performance
  First generation of SQCK (with MySQL)
  Within 1.5x of e2fsck
Future optimizations
  Hierarchical checks
  Concurrent queries

25/25

Conclusion

Complexity is the enemy of reliability

Recovery code is complex

SQCK: Build recovery tools with a higher-level approach

26

Thank you! Questions?

ADvanced Systems Laboratory www.cs.wisc.edu/adsl