Quo vadis Linux File Systems: Ext4 or BTRFS?
Udo Seidel
OSDC 2011 2
Agenda
● Introduction/motivation
● ext4 – the new member of the extfs family
  ● Facts, specs
  ● Migration
● BTRFS – the newbie .. the hope
  ● Facts, specs
  ● Migration
● Summary
Linux file systems
● More than 50 file systems shipped with Linux kernel
  ● Local
  ● Remote
  ● Cluster
  ● ...
● A few as standard for root directory
  ● ext2, ext3
  ● XFS
Linux file systems – challenges
● ReiserFS has been sunset
● Limitations of ext3
● Changes in recent Enterprise distributions
Linux file systems – new players
● New version of the ext family -> ext4
  ● Marked as stable
  ● Shipped with Enterprise distributions
● New approach with BTRFS
  ● Still experimental
  ● Default in some projects, e.g. MeeGo
4th extended file system
● Shipped since 2.6.19
● Stable since 2.6.28
● To overcome limits of ext3
  ● Size
  ● Performance
Ext4 – history
● Successor of ext3
● Started as a set of patches for ext3
● Later forked
  ● First called ext3dev (sometimes ext4dev)
  ● No impact on ext3 stability
  ● Fewer dependencies on ext3 code
  ● Easier to maintain source code
Ext4 – facts
● Max volume size: 1 EByte = 1024 PByte
● Max file size: 16 TByte
● Max length of file name: 255 Bytes
● Support of extended attributes
● No encryption
● No real compression support
● Partially 64-bit
Ext4 – starting from known
● Known tools
  ● mkfs
  ● fsck
  ● tune2fs
  ● e2label
Ext4 – global structure I
● Entry point -> superblock
  ● Block size
  ● Number of blocks and inodes
  ● Number of free blocks and inodes
● Disk divided into block groups
  ● Backup of superblock
  ● Block group description (inode/block bitmaps)
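The division into block groups can be illustrated with a little arithmetic. The sketch below is not taken from the slides: it assumes the classic ext layout in which one block-sized bitmap tracks all blocks of a group (so a group holds 8 * block size blocks) and that block numbering starts at 0. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a single block-sized bitmap describes a whole group,
 * so one group can cover at most 8 * block_size blocks. */
uint64_t blocks_per_group(uint32_t block_size)
{
    assert(block_size > 0);
    return 8ULL * block_size;
}

/* Number of block groups needed to cover total_blocks (rounded up). */
uint64_t group_count(uint64_t total_blocks, uint32_t block_size)
{
    uint64_t per_group = blocks_per_group(block_size);
    return (total_blocks + per_group - 1) / per_group;
}

/* Index of the block group containing a block (first data block assumed 0). */
uint64_t group_of_block(uint64_t block, uint32_t block_size)
{
    return block / blocks_per_group(block_size);
}
```

Under these assumptions a 4 KByte block size gives 32768 blocks, i.e. 128 MByte, per group.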
Ext4 – global structure II
● Similar to ext3
● Inherits some ext3 limitations
  ● Number of inodes per block group
● 2nd type of block groups => flexible
  ● Flexible placement of bitmaps
● Bigger inodes to store additional information
  ● 256 Bytes
  ● Nanosecond time stamps
Ext4 – from blocks to extents
● Common addressing for modern file systems
● Contiguous area of blocks
  ● Less management information needed
  ● Fewer meta data operations
  ● Less “fragmentation”
● Requires change of on-disk format
Ext4 – extent I
● 15 bit for extent size
  ● Block size of 4 KByte => 128 MByte
● 1 bit for extent initialization information
struct ext4_extent {
    __le32 ee_block;     /* first logical block extent covers */
    __le16 ee_len;       /* number of blocks covered by extent */
    __le16 ee_start_hi;  /* high 16 bits of physical block */
    __le32 ee_start_lo;  /* low 32 bits of physical block */
};
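How the 16-bit length field carries both the extent size and the initialization flag can be sketched as follows. This is an illustration, not kernel code: it uses a host-endian stand-in struct with a hypothetical name (`extent_host`) instead of the little-endian on-disk types, and it assumes the convention that a length value above 2^15 marks an uninitialized (preallocated) extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-endian stand-in for the on-disk struct (hypothetical name). */
struct extent_host {
    uint32_t ee_block;     /* first logical block the extent covers */
    uint16_t ee_len;       /* length, with the initialization flag folded in */
    uint16_t ee_start_hi;  /* high 16 bits of physical block */
    uint32_t ee_start_lo;  /* low 32 bits of physical block */
};

#define EXT_INIT_MAX_LEN (1U << 15)  /* 32768 blocks */

/* Assumed convention: values above 2^15 mean "uninitialized". */
bool extent_is_uninitialized(const struct extent_host *e)
{
    return e->ee_len > EXT_INIT_MAX_LEN;
}

/* Number of blocks covered, with the flag stripped. */
uint32_t extent_length(const struct extent_host *e)
{
    return extent_is_uninitialized(e) ? e->ee_len - EXT_INIT_MAX_LEN
                                      : e->ee_len;
}

/* Combine the 16 high and 32 low bits into the 48-bit physical block. */
uint64_t extent_start(const struct extent_host *e)
{
    return ((uint64_t)e->ee_start_hi << 32) | e->ee_start_lo;
}

/* Bytes covered by the extent for a given block size. */
uint64_t extent_bytes(const struct extent_host *e, uint32_t block_size)
{
    assert(block_size > 0);
    return (uint64_t)extent_length(e) * block_size;
}
```

With 4 KByte blocks, a full initialized extent of 32768 blocks covers the 128 MByte quoted on the slide.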
Ext4 – extent II
● 32 bit for block addresses inside file
  ● Block size of 4 KByte => 16 TByte
● 48 (!) bit for block addresses of file system
  ● Block size of 4 KByte => 1 EByte
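Both limits follow directly from the address width and the block size. The small helper below (a hypothetical name, written only to check the arithmetic) computes 2^bits * block size:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable with `bits` address bits and a given block size.
 * The caller must pick values whose product fits into 64 bits. */
uint64_t addressable_bytes(unsigned bits, uint32_t block_size)
{
    assert(bits < 64 && block_size > 0);
    return ((uint64_t)1 << bits) * block_size;
}
```

For 4 KByte blocks, 32 bits yield 2^44 bytes (16 TByte) and 48 bits yield 2^60 bytes (1 EByte), matching the figures above.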
Ext4 – extent III
● 60 Byte for extent information
  ● 12 Byte for extent header
  ● 12 Byte for extent structure
    – Up to 4 extents per inode
    – Max. 512 MByte directly addressable (ext3: 48 KByte)
    – Different schema for bigger files
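The numbers on this slide can be re-derived from the structure sizes. The sketch below uses host-endian stand-in structs; the header field names follow the kernel's `ext4_extent_header` as far as I recall them and should be treated as an assumption, as should the 12 direct block pointers used for the ext3 comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the extent header (field names assumed). */
struct extent_header {
    uint16_t eh_magic;
    uint16_t eh_entries;
    uint16_t eh_max;
    uint16_t eh_depth;
    uint32_t eh_generation;
};

/* Stand-in for one extent entry. */
struct extent_entry {
    uint32_t ee_block;
    uint16_t ee_len;
    uint16_t ee_start_hi;
    uint32_t ee_start_lo;
};

_Static_assert(sizeof(struct extent_header) == 12, "header is 12 bytes");
_Static_assert(sizeof(struct extent_entry) == 12, "extent is 12 bytes");

#define INODE_EXTENT_AREA 60  /* bytes available inside the inode */

/* Extents that fit into the inode after the header: (60 - 12) / 12. */
unsigned extents_per_inode(void)
{
    return (unsigned)((INODE_EXTENT_AREA - sizeof(struct extent_header))
                      / sizeof(struct extent_entry));
}

/* Directly addressable bytes: each extent covers up to 2^15 blocks. */
uint64_t direct_bytes_ext4(uint32_t block_size)
{
    assert(block_size > 0);
    return extents_per_inode() * (1ULL << 15) * block_size;
}

/* ext3 comparison, assuming 12 direct block pointers per inode. */
uint64_t direct_bytes_ext3(uint32_t block_size)
{
    assert(block_size > 0);
    return 12ULL * block_size;
}
```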
Ext4 – extent tree I
● For files > 512 MByte
● B+ tree
● Extent structure only at leaf nodes
● New element: extent index
  ● Same header structure as data extent
  ● Points to data block
  ● Data block contains either extent index or extent structure
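At a leaf node, finding the extent for a logical block amounts to a search over entries sorted by their first logical block. The following is a simplified in-memory sketch under that assumption, with hypothetical type and function names; an index node could be searched the same way to pick the child block to descend into.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified leaf entry (hypothetical, host-endian, flag already stripped). */
struct leaf_extent {
    uint32_t first_block;  /* first logical block covered */
    uint32_t len;          /* number of blocks covered */
    uint64_t phys_start;   /* first physical block */
};

/* Binary search for the extent covering lblock; entries must be sorted
 * by first_block. Returns NULL if lblock falls into a hole. */
const struct leaf_extent *find_extent(const struct leaf_extent *ext,
                                      size_t n, uint32_t lblock)
{
    assert(ext != NULL || n == 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ext[mid].first_block <= lblock)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;  /* every extent starts after lblock */
    const struct leaf_extent *e = &ext[lo - 1];
    if (lblock - e->first_block >= e->len)
        return NULL;  /* lblock lies behind the end of this extent */
    return e;
}

/* Translate a logical block to a physical one; returns 1 on success, 0 for a hole. */
int map_block(const struct leaf_extent *ext, size_t n,
              uint32_t lblock, uint64_t *pblock)
{
    const struct leaf_extent *e = find_extent(ext, n, lblock);
    if (e == NULL)
        return 0;
    *pblock = e->phys_start + (lblock - e->first_block);
    return 1;
}
```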
Ext4 – extent tree II
Ext4 – from extents to blocks
● In the end, blocks still have to be allocated
● New features
  ● Multi-block allocation
  ● Delayed allocation
  ● Persistent allocation
Ext4 – multi-block allocation
● Ext3: only one block per call
  ● 12800 calls for 50 MByte file
● Ext4: multiple blocks per call
  ● Less overhead
  ● Contiguous physical location of data
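The figure of 12800 calls is simply the file size divided by the block size. The helper below makes the comparison explicit; `blocks_per_call` is a hypothetical parameter for illustration, since the real allocator decides the request size itself.

```c
#include <assert.h>
#include <stdint.h>

/* Allocator calls needed for a file when each call hands out
 * at most blocks_per_call blocks (both divisions round up). */
uint64_t allocation_calls(uint64_t file_bytes, uint32_t block_size,
                          uint32_t blocks_per_call)
{
    assert(block_size > 0 && blocks_per_call > 0);
    uint64_t blocks = (file_bytes + block_size - 1) / block_size;
    return (blocks + blocks_per_call - 1) / blocks_per_call;
}
```

With 4 KByte blocks, a 50 MByte file needs 12800 single-block calls, while a call that can return up to 32768 blocks would cover it in one.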
Ext4 – delayed allocation
● Ext3
  ● Instant block allocation
  ● Fragmentation due to buffers and caches
● Ext4
  ● Delayed block allocation
  ● Use cache information for placement
  ● Risk of data loss in early versions => improved since 2.6.30
Ext4 – “clever” allocation
● Support of system call fallocate()
  ● Application reserves blocks ahead
  ● File system ensures disk space availability
● Allocation information in extent structure
  ● Remember the 16th bit of the extent length
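From an application's point of view, reserving space ahead of time could look like the sketch below. It uses `posix_fallocate()`, the portable interface, rather than calling `fallocate()` directly; on Linux with glibc this is expected to use the `fallocate()` system call where the file system supports it, but that is an assumption about the C library, not something stated on the slide. The helper names and the `/tmp` path are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Reserve len bytes from offset 0; returns 0 or an errno-style code. */
int reserve_space(int fd, off_t len)
{
    assert(len > 0);
    return posix_fallocate(fd, 0, len);
}

/* Create a temporary file, reserve len bytes in it and return the
 * resulting file size (or -1 on any failure). The file is removed again. */
off_t reserve_in_temp_file(off_t len)
{
    char path[] = "/tmp/prealloc_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    off_t size = -1;
    struct stat st;
    if (reserve_space(fd, len) == 0 && fstat(fd, &st) == 0)
        size = st.st_size;

    close(fd);
    unlink(path);
    return size;
}
```

If the reservation succeeds, later writes into the reserved range should not fail for lack of disk space.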
Ext4 – consistent status
● New journaling => JBD2
  ● Transactions have checksums
  ● 64-bit ready
  ● Deactivation possible
Ext4 – repair
● Improved fsck()
  ● No check of unused blocks
    – Information stored in block group header
    – Information secured via checksums
    – (De)activation possible at any time
● First run as slow as in ext3
Ext4 – other news
● Nanosecond precision time stamps
  ● Unix millennium bug shifted to 2514
● More subdirectories
  ● Up to 65000
  ● More than 65000 ... with limitation
Ext4 – general migration paths
● mkfs() and backup/restore
  ● Clean new file system structure
  ● Only way for file systems other than ext2/3
  ● Extended outage
● Conversion via tune2fs
  ● Partial only
  ● Only possible for ext family
  ● Faster/easier
Ext4 – background for migration
● Two kinds of changes compared to ext3
● Change of on-disk format
  – Extents
  – Only enabled for new files via tune2fs
  – Additional tasks needed
● On-disk format not relevant
  – Block allocation
  – Immediately enabled via tune2fs
Ext4 – migration via tune2fs
● Results in mix of ext3 and ext4 structure
● Access via ext3 driver impossible
● fsck() needed
Parameter    Description
extent       Extent based block allocation
flex_bg      Flexible placement of meta data
uninit_bg    Flag uninitialized blocks for faster fsck
dir_nlink    Unlimited number of subdirectories
extra_isize  Timestamps with nanoseconds
Ext4 – migration hints
● fsck() recommended
● /boot – booting from ext4 possible?
● Rescue media enabled for ext4?
Ext4 – summary
● Good successor of ext3
● Manages larger amounts of data
● Faster
  ● Performance
  ● Recovery
● Safer
● Sufficient migration options from ext2/3
Better/b-tree file system
● Shipped since 2.6.29
● Still experimental
● Intended to replace ext3/4
● New storage management approach
BTRFS – history
● Basic idea
  ● Shown 2007
  ● Usage of B trees for standard structures
  ● Not new ... see XFS, ReiserFS
● Chris Mason
  ● Worked on ReiserFS for SUSE
  ● Moved to Oracle -> started BTRFS development
BTRFS – facts
● Max file/volume size: 16 EByte
● Max length of file name: 255 Bytes
● Support of
  ● Extended attributes
  ● Encryption
  ● Compression
  ● Snapshots
  ● Copy-on-Write
BTRFS – global structure
● Entry point -> superblock
● More than one file system per volume
● Extents
  ● Put together in block groups
  ● No mix of data and meta data
BTRFS – internals: the trees
● Consists of B+ trees
  ● Root tree
  ● File system tree
  ● Extent allocation tree
  ● Checksum tree
  ● Log tree
  ● Chunk & device tree
  ● Data relocation tree
BTRFS – internals: structures
● 3 structures
● Key
  – Index of the tree structure
● Block header
  – ID of file system
  – Reference of insert time
  – Level position
● Item
  – Different types: inodes, extents, directories
BTRFS – internals: the key
● Index of the tree structure
● Size: 136 bit
  ● First 64 bit: unique object ID
  ● Next 8 bit: type/item
  ● Last 64 bit: item dependent
    ● e.g. hash of directory name
    ● e.g. number of elements in directory
    ● e.g. object ID of upper layer directory
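The 136-bit key can be sketched as a packed struct with a three-level comparison. The struct below is a host-endian illustration with hypothetical names, and `__attribute__((packed))` is a GCC/Clang extension; the comparison order (object ID, then type, then the item-dependent field) is the ordering I would expect for the tree index and should be read as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* 64 + 8 + 64 = 136 bits; packed so no padding is inserted. */
struct fs_key {
    uint64_t objectid;  /* unique object ID */
    uint8_t  type;      /* item type */
    uint64_t offset;    /* item dependent, e.g. hash of a directory name */
} __attribute__((packed));

_Static_assert(sizeof(struct fs_key) * 8 == 136, "key is 136 bits");

/* Three-way comparison: object ID first, then type, then offset. */
int key_compare(const struct fs_key *a, const struct fs_key *b)
{
    assert(a != 0 && b != 0);
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}
```

Because the object ID comes first, all items belonging to one object sort next to each other in the tree.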
BTRFS – internals: the item
● More than one item per object ID possible
Item         Value
INODE_ITEM   1
XATTR_ITEM   24
DIR_ITEM     84
DIR_INDEX    96
EXTENT_DATA  108
EXTENT_CSUM  128
ROOT_ITEM    132
EXTENT_ITEM  168
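The type values from the table map naturally onto an enum that fits the 8-bit type field of the key. The lookup helpers below are hypothetical conveniences for illustration; only the names and numbers come from the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Item types and values as listed on the slide. */
enum item_type {
    INODE_ITEM  = 1,
    XATTR_ITEM  = 24,
    DIR_ITEM    = 84,
    DIR_INDEX   = 96,
    EXTENT_DATA = 108,
    EXTENT_CSUM = 128,
    ROOT_ITEM   = 132,
    EXTENT_ITEM = 168
};

static const struct { int value; const char *name; } item_types[] = {
    { INODE_ITEM,  "INODE_ITEM"  }, { XATTR_ITEM,  "XATTR_ITEM"  },
    { DIR_ITEM,    "DIR_ITEM"    }, { DIR_INDEX,   "DIR_INDEX"   },
    { EXTENT_DATA, "EXTENT_DATA" }, { EXTENT_CSUM, "EXTENT_CSUM" },
    { ROOT_ITEM,   "ROOT_ITEM"   }, { EXTENT_ITEM, "EXTENT_ITEM" },
};

#define ITEM_TYPE_COUNT (sizeof(item_types) / sizeof(item_types[0]))

/* Name for a type value, or "UNKNOWN" if it is not in the table. */
const char *item_type_name(int value)
{
    for (size_t i = 0; i < ITEM_TYPE_COUNT; i++)
        if (item_types[i].value == value)
            return item_types[i].name;
    return "UNKNOWN";
}

/* Type value for a name, or -1 if the name is not in the table. */
int item_type_from_name(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < ITEM_TYPE_COUNT; i++)
        if (strcmp(item_types[i].name, name) == 0)
            return item_types[i].value;
    return -1;
}
```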
BTRFS – more about trees
● Highest layer
  ● Root tree
  ● Referenced in superblock
  ● Other trees => object ID in root tree
● Some trees unique
  ● Extent allocation
  ● Data relocation
● Possibly multiple trees
  ● File system
BTRFS – file system tree
● Visible part
● Contains:
  ● Inode items
  ● Reference items
● No data of files
  ● See extents
  ● Exception: small files
BTRFS – extent allocation tree
● Space management
● Backward reference
  ● File system object
  ● Possibly multiple per extent
  ● Maybe move to extent data reference object
BTRFS – other trees
● Log tree
  ● Collects fsync() calls
  ● Journal of this kind of COW calls
● Checksum tree
  ● CRC32 checksums of data and meta data
● Chunk tree
  ● Manage devices: device item and chunk map item
● Device tree
  ● Counterpart of chunk tree
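To give an idea of the kind of checksum kept in the checksum tree, here is a bitwise CRC-32C (Castagnoli) routine. BTRFS is generally described as using the CRC-32C variant; the sketch implements the standard algorithm and makes no claim to reproduce byte for byte what BTRFS stores on disk. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C: reflected polynomial 0x82F63B78,
 * initial value and final XOR of 0xFFFFFFFF. */
uint32_t crc32c(const void *data, size_t len)
{
    assert(data != NULL || len == 0);
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Returns 1 if the stored checksum matches the data, else 0. */
int checksum_matches(const void *data, size_t len, uint32_t stored)
{
    return crc32c(data, len) == stored;
}
```

A reader would recompute the checksum on every read and compare it with the stored value, so silent corruption of data or meta data becomes detectable.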
BTRFS – device management
● Included volume manager
  ● Pool concept
  ● RAID-0 and RAID-1
    – For data and meta data
    – Not necessarily identical
● Chunk tree
  ● Abstracts from disk blocks
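What "abstracting from disk blocks" means for RAID-0 can be shown with a generic striping calculation. This is a textbook sketch, not the BTRFS chunk mapping code: the stripe length, the struct and the function name are all assumptions chosen for illustration. For RAID-1, the same offset would simply be used on every mirror device.

```c
#include <assert.h>
#include <stdint.h>

/* Where a logical byte ends up: which device, and at which offset on it. */
struct stripe_location {
    unsigned device;
    uint64_t offset;
};

/* Generic RAID-0 mapping: consecutive stripes rotate across the devices. */
struct stripe_location raid0_map(uint64_t logical, uint32_t stripe_len,
                                 unsigned num_devices)
{
    assert(stripe_len > 0 && num_devices > 0);
    uint64_t stripe_nr = logical / stripe_len;  /* global stripe number */
    uint64_t within = logical % stripe_len;     /* offset inside the stripe */

    struct stripe_location loc;
    loc.device = (unsigned)(stripe_nr % num_devices);
    loc.offset = (stripe_nr / num_devices) * stripe_len + within;
    return loc;
}
```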
BTRFS – extents, chunks, blocks
BTRFS – what else
● Transparent compression via zlib
● Support of POSIX ACLs
● Online grow/shrink
● Online add/removal of disks
● No fsck() tool (yet)
● Management tool evolution (btrfsctl -> btrfs)
BTRFS – migration I
● Via tool btrfs-convert
  ● du/df not fully BTRFS-aware
  ● In place from ext3/4
● Via libext2fs
  ● BTRFS meta data location flexible
  ● Old ext3/4 organized in snapshot
  ● Roll-back possible to date/time of conversion
BTRFS – migration II
BTRFS summary
● Still experimental
● Meets standard file systems requirements
● Bridges existing gaps
  ● e.g. snapshots
● Easy migration from ext3/4 possible
● New approach to storage management
  ● e.g. included volume manager
Summary
● Improvement moving to ext4
● Safe switching to ext4
● In-place migration from ext3 possible
● Future is BTRFS
● In-place migration from ext3/4 to BTRFS possible
References
● http://ext4.wiki.kernel.org
● http://btrfs.wiki.kernel.org
Thank you!