Date post: | 30-Aug-2018 |
Category: | Documents |
Upload: | dangnguyet |
Filesystems
• ReiserFS
– Hans Reiser led team at Namesys
– First journaling FS for Linux
– Out of favor since Hans Reiser was convicted of murdering his wife and Namesys folded
• JFS
– Journaling File System from IBM
• ext2, ext3, ext4
– Journaling introduced in version 3
– The “standard” FS for Linux
• btrfs is a sophisticated new FS for Linux
• FATxx, NTFS, XFS and others also supported
Ext file systems
• ext: Extended File System
– 1992
– Created for the Linux kernel
– Based on Unix File System (UFS)
– Max file system size 2 GB, file names up to 255 characters
• ext2
– 2TB
– POSIX ACLs
– Timestamps
Sidetrack: POSIX
• Portable Operating System Interface
• Family of IEEE standards
– Institute of Electrical and Electronics Engineers
• IEEE Std 1003.1-1988
• ISO/IEC 9945
• Richard Stallman suggested the name
• Designed to maintain compatibility between operating systems.
ext3
• Journaling
• Made JFS and ReiserFS unnecessary
– Although ext3 is not as fast
• Backward compatible with ext2
– Just add journal file
• Blocksize 1 – 8 KiB
• Filesize 16 GiB – 2 TiB
• FS size 2 TiB – 32 TiB
Journal
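• After-image log
• Options:
– Journal mode: data and metadata are both written to the journal before they are written to the main file system
– Ordered mode: metadata only; data is written before the journal is marked as committed
– Writeback mode: metadata only; data and journal are written in any order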
ext4
• Stable in Linux 2.6.28 (2008)
• Basically a batch of simultaneous updates to ext3
• Motivated partly by problems of large file systems
• FS size 1 EiB (2**60 bytes)
• File size 16 TiB
ext4
• Extents replace block mapping for allocation
– Extent = starting block and block count
• Pre-allocation available
• Delayed allocation
• 64000 subdirectories (up from 32000)
• Htree indexes
– Indexes on hash of filename
– Smaller and faster
• Nanosecond timestamps
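The extent idea above (a starting block plus a block count) can be illustrated with a minimal sketch. The function name `blocks_to_extents` and the sample block numbers are made up for illustration; this is not ext4's on-disk extent format, only the bookkeeping idea behind it.

```python
def blocks_to_extents(blocks):
    """Collapse a file's block numbers (in file order) into (start, count) extents."""
    extents = []
    for b in blocks:
        if extents and extents[-1][0] + extents[-1][1] == b:
            # Block b directly follows the last extent: grow that extent.
            start, count = extents[-1]
            extents[-1] = (start, count + 1)
        else:
            # Not contiguous: start a new extent of length 1.
            extents.append((b, 1))
    return extents


# Hypothetical file occupying three contiguous runs of blocks.
file_blocks = [100, 101, 102, 103, 500, 501, 900]
print(blocks_to_extents(file_blocks))
```

Seven block pointers shrink to three extents here; for large, mostly contiguous files the saving over per-block mapping is much larger.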
Blocks
• Like clusters in FAT file system
• In ext2/3: 1024, 2048 or 4096 bytes
– Specify when fs created
• Block groups are large collections of contiguous blocks that partition the FS data structures
– Usually size determined by FS but can be specified
• Size limited by bitmaps so large FS may need multiple block groups
Superblock
• 2 sectors (1024 bytes) that describe the file system
– Volume label
– Block size
– # blocks per group
– # reserved blocks before the 1st block group
– Count of free inodes & blocks (total all groups)
Superblock
• 1st superblock is 1024 bytes past the beginning of the file system
• Copies of the superblock are in the first block of each block group (or only of some groups when the sparse superblock feature is enabled)
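A minimal sketch of reading the fields listed on the Superblock slides from a raw image. The field offsets follow the ext2/ext4 on-disk layout as documented in the kernel's Ext4 Disk Layout page (little-endian, magic number 0xEF53 at offset 0x38); treat them as an assumption to check against that page. `parse_superblock` and `demo_image` are hypothetical helper names, and the demo image is synthetic, not a real file system.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # superblock starts 1024 bytes into the file system
SUPERBLOCK_SIZE = 1024     # 2 sectors of 512 bytes
EXT_MAGIC = 0xEF53


def parse_superblock(image: bytes) -> dict:
    """Extract a few superblock fields from the start of an ext2/3/4 image."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + SUPERBLOCK_SIZE]
    if len(sb) < SUPERBLOCK_SIZE:
        raise ValueError("image too small to contain a superblock")
    (inodes, blocks, _reserved, free_blocks, free_inodes,
     first_data_block, log_block_size) = struct.unpack_from("<7I", sb, 0)
    (blocks_per_group,) = struct.unpack_from("<I", sb, 0x20)
    (inodes_per_group,) = struct.unpack_from("<I", sb, 0x28)
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    label = sb[0x78:0x88].split(b"\0", 1)[0].decode("latin-1")
    return {
        "label": label,
        "block_size": 1024 << log_block_size,
        "blocks_per_group": blocks_per_group,
        "inodes_per_group": inodes_per_group,
        "first_data_block": first_data_block,
        "free_blocks": free_blocks,
        "free_inodes": free_inodes,
        "total_blocks": blocks,
        "total_inodes": inodes,
    }


def demo_image() -> bytes:
    """Build a tiny synthetic image holding only a superblock (illustration only)."""
    img = bytearray(4096)
    struct.pack_into("<7I", img, SUPERBLOCK_OFFSET, 128, 1024, 51, 900, 117, 1, 0)
    struct.pack_into("<I", img, SUPERBLOCK_OFFSET + 0x20, 8192)
    struct.pack_into("<I", img, SUPERBLOCK_OFFSET + 0x28, 128)
    struct.pack_into("<H", img, SUPERBLOCK_OFFSET + 0x38, EXT_MAGIC)
    img[SUPERBLOCK_OFFSET + 0x78:SUPERBLOCK_OFFSET + 0x78 + 4] = b"demo"
    return bytes(img)


print(parse_superblock(demo_image()))
```

On a real system the same function could be fed the first 2048 bytes read from a file system image file.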
Group Descriptor Table
• Stores the group descriptors, one for each block group
• Each descriptor holds:
– Starting block addresses of the block bitmap, inode bitmap and inode table
– Count of free inodes & blocks for the group
• Located in the block after the superblock
– Backup copies are in the same block groups as the superblock backups
Block Bitmap
• One bit per block in the group
– size = #blocks / 8
• Linux creates a block group to have as many blocks as there are bits in a block
• Thus, a block bitmap is always 1 block in size
• Tracks block allocation for the group
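The arithmetic behind "a block bitmap is always 1 block in size" can be worked through in a few lines. This is a sketch under the slide's stated rule (one bit per block, one bitmap block per group); the helper names are hypothetical, and the least-significant-bit-first ordering in `is_allocated` is an assumption about the bitmap layout.

```python
def blocks_per_group(block_size: int) -> int:
    """One bitmap block holds block_size * 8 bits, so a group has that many blocks."""
    return block_size * 8


def group_count(total_blocks: int, block_size: int) -> int:
    """Number of block groups needed to cover total_blocks (rounded up)."""
    return -(-total_blocks // blocks_per_group(block_size))


def is_allocated(bitmap: bytes, n: int) -> bool:
    """Test bit n of a block bitmap (assumes least-significant bit first in each byte)."""
    return bool((bitmap[n // 8] >> (n % 8)) & 1)


for bs in (1024, 2048, 4096):
    n = blocks_per_group(bs)
    print(f"block size {bs}: {n} blocks per group, {n * bs // 2**20} MiB per group")
```

With 4096-byte blocks a group covers 128 MiB, which is why any sizable file system ends up with many block groups.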
Inode Bitmap
• Tracks the allocation of inodes in the group
– Size = #inodes per group / 8
– Size defined at file system creation
• Typically fewer inodes than blocks in group
inodes
• Contained in inode table
• Like MFT records for files
• 256 bytes per inode
• Number of inodes determines number of files
• Specified when FS created
• Contain file and directory metadata
• Directory has file or directory name and pointer to inode in the inode table
• Inode points to the file content blocks
inode contents
• Type of the file
– Plain file, directory, symbolic link, device file
• Access permissions
• Owner and group ID numbers
• Size in bytes
• Number of links (directory references)
• Times of last access and last modification to the file
• List of data blocks claimed by the file.
• Address of the file's blocks on the disk
Directories
• Special files that associate names with the inode numbers used internally by the file system
• Each entry associates one file name with one inode number
• Consists of:
– inode number,
– length of the file name
– actual text of the file name.
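A minimal sketch of how such entries might be packed into and read back from a directory block. It assumes the ext2-style entry layout (4-byte inode number, 2-byte record length, 1-byte name length, 1-byte file type, then the name, padded to a multiple of 4 bytes); the record length and file type fields are not on the slide and are included here as an assumption about the on-disk format. Function names are hypothetical.

```python
import struct

HEADER = "<IHBB"          # inode, rec_len, name_len, file_type (little-endian)
HEADER_SIZE = struct.calcsize(HEADER)  # 8 bytes


def pack_dir_entry(inode: int, name: str, file_type: int = 0) -> bytes:
    """Build one directory entry, padded so the next entry is 4-byte aligned."""
    raw = name.encode()
    rec_len = (HEADER_SIZE + len(raw) + 3) & ~3
    header = struct.pack(HEADER, inode, rec_len, len(raw), file_type)
    return header + raw.ljust(rec_len - HEADER_SIZE, b"\0")


def parse_dir_block(block: bytes):
    """Return (name, inode) pairs by walking the entries via their record lengths."""
    entries = []
    pos = 0
    while pos + HEADER_SIZE <= len(block):
        inode, rec_len, name_len, _ftype = struct.unpack_from(HEADER, block, pos)
        if rec_len < HEADER_SIZE:
            break  # corrupt or empty tail; stop rather than loop forever
        if inode != 0:  # inode 0 marks an unused entry
            name = block[pos + HEADER_SIZE:pos + HEADER_SIZE + name_len].decode()
            entries.append((name, inode))
        pos += rec_len
    return entries


# Hypothetical directory: inode numbers are made up for the example.
block = pack_dir_entry(2, ".") + pack_dir_entry(2, "..") + pack_dir_entry(12, "hello.txt")
print(parse_dir_block(block))
```

Looking up a path component then amounts to scanning these pairs for the name and following the inode number into the inode table.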
Directory Structure
• Hierarchical
– Like FAT and NTFS
• But all data stored in one tree
– No C: N: etc
• Includes network shares, all physical devices, removable devices and some virtual file systems
Linux File Structure
• / (root) contains usr, mnt, var, root, bin, boot
• Mount points in the file system point to devices
– /mnt/floppy points to /dev/fd0
– /mnt/cdrom points to /dev/hdc
– /mnt/USB points to /dev/sda
References
• https://en.wikipedia.org/wiki/Ext2
• https://en.wikipedia.org/wiki/Ext3
• https://en.wikipedia.org/wiki/Ext4
• https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout (technical)
btrfs
• B-tree file system
• Development started at Oracle in 2007
• Now considered stable
• Available in most Linux distros
• Default FS in new SuSE releases
• Completely new design
– Ext file systems design encumbered by backward compatibility
– btrfs borrows features from ReiserFS
btrfs Features – B-tree
• B-tree
• Specialized search tree optimized for disk indexes
• Self balancing
• “Safe” balancing algorithm
btrfs B-trees
• Everything stored in B-trees of different types
– Accessed by one algorithm/code base
• All trees indexed by one root tree
• Sub-trees for various file system functions
• All use the same index format
– 64 bit object ID
– 8 bit type
– Remaining 64 bits are type-dependent
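The shared index format can be sketched as a three-field key. This is an illustration, not btrfs code: the class name `Key` is hypothetical, the type values used in the example are illustrative, and the packed little-endian layout is an assumption about the on-disk form.

```python
import struct
from typing import NamedTuple


class Key(NamedTuple):
    objectid: int   # 64-bit object ID
    type: int       # 8-bit item type
    offset: int     # 64 bits whose meaning depends on the type

    def pack(self) -> bytes:
        """Serialize as 8 + 1 + 8 = 17 bytes, little-endian, no padding."""
        return struct.pack("<QBQ", self.objectid, self.type, self.offset)


# Illustrative items for two objects; type numbers here are placeholders.
keys = [
    Key(257, 108, 4096),
    Key(256, 1, 0),
    Key(257, 1, 0),
    Key(257, 108, 0),
]

# Tuples compare field by field, so sorting orders by objectid, then type, then offset.
for k in sorted(keys):
    print(k, k.pack().hex())
```

Because every tree uses the same key shape and ordering, one search routine can serve all of them, which is the point of the "one algorithm/code base" bullet.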
B-tree Types
• Root tree
• File system tree
– Visible files and directories
– Files stored in extents (or in-tree)
• Extent allocation tree
• Log tree
– Holds a journal
B-tree Types
• Chunk and Device trees
– Chunks are parts of a logical division of the fs space
– Mapped to physical chunks by Chunk tree
– Device tree contains inverse mapping
– Allows mapping to change, eg to add a new device without changing logical view of the fs
• To mount a fs, the chunk tree must be found first (a bootstrapping problem: locations are normally resolved by looking in the chunk tree itself)
– Super blocks at fixed locations contain addresses of chunk and root tree locations
btrfs Features – Copy on Write
• Copy on Write sometimes called COW
• General idea is that many processes can access the same resource
• When one wants to change the resource it makes a copy and changes it
• Other processes still see the unchanged resource
• Used in memory management for delayed allocation, demand paging etc
• Used in NTFS for Volume Shadow Copy
• Can be used for checkpointing
btrfs Copy on Write
• COW provides before images
– Like a journal
• Can be used to undo changes, making btrfs self-healing
• Used to implement file-cloning
– A snapshot of a file
– Clone created by COW
• Can snapshot an entire btrfs volume
– New version created on the fly by COW
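A toy model of the snapshot behaviour described above. It is a sketch only: `CowVolume` is a hypothetical class, and copying the whole pointer table on snapshot is a simplification (a real COW file system would share the tree and copy only the path that changes).

```python
class CowVolume:
    """Toy volume: a table of block number -> immutable block contents."""

    def __init__(self, blocks=None):
        # Copy only the pointer table; the block contents themselves are shared.
        self._blocks = dict(blocks or {})

    def read(self, n: int) -> bytes:
        return self._blocks.get(n, b"")

    def write(self, n: int, data: bytes) -> None:
        # Never modify a block in place: point this volume at new contents.
        # Any snapshot still points at the old, untouched contents.
        self._blocks[n] = bytes(data)

    def snapshot(self) -> "CowVolume":
        return CowVolume(self._blocks)


vol = CowVolume()
vol.write(0, b"version 1")
vol.write(1, b"unchanged")

snap = vol.snapshot()          # cheap: no block contents are copied
vol.write(0, b"version 2")     # the live volume diverges from the snapshot

print(vol.read(0), snap.read(0))        # live volume vs before image
print(vol.read(1) is snap.read(1))      # unmodified block is still shared
```

The snapshot keeps the before image of block 0, which is what lets a change be undone, while untouched blocks cost no extra space.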
btrfs Copy on Write
• Converting from ext in place
• Btrfs mostly doesn't care where its metadata are stored
• Can be put in empty space of an ext fs
• Block pointers point to data blocks created by the ext file system
• COW used to allow changes through btrfs while leaving original ext data intact
• Eventually the btrfs becomes the only copy
btrfs -- Other Features
• Sub volumes
– Part of a btrfs directory tree can be treated as a sub-volume
– Acts like a separately mountable partition contained in the btrfs file system
– Snapshots are implemented as sub-volumes
• Multi-devices
– fs can be created over a pool of multiple devices or partitions
– New devices can be added to expand the fs capacity
btrfs -- RAID
• Multi-device btrfs can use RAID to spread the data over the physical devices
• RAID 0,1,10,5 and 6 are planned
• RAID5 and RAID6 not really ready yet
• More flexible about volumes used in mirror set
• RAID5 and 6 will use more parity devices to provide increased reliability
btrfs -- Send/Receive
• Send creates diff file between a sub-volume and some other volume
– Such as a volume and a snapshot
• Receiving the diff file makes one volume equal the other
• The diff file is essentially an incremental backup
• Can also be used to create and maintain a remote replica
btrfs Reference
• https://wiki.archlinux.org/index.php/Btrfs
• Lots more at the end of the above article
ISO9660
• Filesystem spec for data on CDs (and DVDs)
• 24-byte frames
• 2352-byte sectors (98 frames)
• 2048 bytes user data, 288 bytes error detection and correction, 16 bytes sync and header data
– A/V disks can use ECC area for data
• Assumed contiguous allocation allows simpler FS structures
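The sector arithmetic above can be checked with a short worked example. The field-by-field split is the CD-ROM Mode 1 layout as commonly documented (12 sync, 4 header, 2048 user data, 4 EDC, 8 reserved, 276 ECC); treat that breakdown as an assumption, since the slide gives only totals.

```python
FRAME_BYTES = 24
FRAMES_PER_SECTOR = 98
SECTOR_BYTES = FRAME_BYTES * FRAMES_PER_SECTOR   # 2352

# Assumed Mode 1 sector layout, in bytes.
MODE1_LAYOUT = {
    "sync": 12,
    "header": 4,
    "user_data": 2048,
    "edc": 4,         # error detection code
    "reserved": 8,
    "ecc": 276,       # error correction code
}

overhead = SECTOR_BYTES - MODE1_LAYOUT["user_data"]
error_handling = MODE1_LAYOUT["edc"] + MODE1_LAYOUT["reserved"] + MODE1_LAYOUT["ecc"]

print("sector bytes:", SECTOR_BYTES)
print("layout total:", sum(MODE1_LAYOUT.values()))
print("non-data bytes per sector:", overhead)
print("error detection/correction bytes:", error_handling)
print("usable fraction:", round(MODE1_LAYOUT["user_data"] / SECTOR_BYTES, 3))
```

Roughly 87% of each sector carries user data; audio/video discs that can tolerate errors reclaim part of the rest.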
ISO Limitations
• For cross platform compatibility:
• File names have upper case letters, digits, underscores and one “.”
• No spaces
• Eight level directories
• 4GiB file size limit
• Most OS ignore or circumvent these limits
• Path table (for efficiency) imposes a limit of 65,535 on the number of directories (not enforced in Linux)
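The naming and depth restrictions above can be expressed as a small checker. This is a sketch of the rules as the slide states them (upper case letters, digits, underscores, at most one "."; no spaces; eight directory levels), not a full ISO 9660 validator: it ignores name-length limits and version suffixes, and the function names are hypothetical.

```python
import re

# Upper case letters, digits and underscores, with at most one "." separator.
_PORTABLE_NAME = re.compile(r"[A-Z0-9_]+(\.[A-Z0-9_]*)?")
MAX_DEPTH = 8


def is_portable_name(name: str) -> bool:
    """True if a single file or directory name obeys the character rules."""
    return _PORTABLE_NAME.fullmatch(name) is not None


def is_portable_path(path: str) -> bool:
    """True if every component is portable and the path is at most 8 levels deep."""
    parts = [p for p in path.split("/") if p]
    return 0 < len(parts) <= MAX_DEPTH and all(is_portable_name(p) for p in parts)


for candidate in ("README.TXT", "readme.txt", "MY FILE.TXT", "A.B.C", "DOCS/NOTES_1"):
    print(candidate, is_portable_path(candidate))
```

Operating systems that "ignore or circumvent" the limits would simply skip checks like these, at the cost of discs that may not read everywhere.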
ISO Extensions
• Sessions
– ISO is a read-only, contiguous, pre-allocating FS
– Doesn't expect data to be appended
• Sessions allow data to be added to a CD
• Each new session contains an updated copy of the entire disk's directories and other data structures
ISO Extensions
• Joliet
– Unicode names
– Avoids file name restrictions
• Rock Ridge supports POSIX file attributes (permissions, ownership) and longer names
• El Torito allows CDs to be bootable
• Apple's ISO 9660 extensions allow for Apple resource forks
UDF
• Universal Disk Format
– Open specification designed for any media
– Mostly used for DVD and BD instead of ISO9660
– Official file system for DVD-Video and Audio per DVD Forum
• Design suitable for incremental updates
– As opposed to creating ISO then burning
• Specification maintained by Optical Storage Technology Association (OSTA)
– Most of the world's optical product manufacturers and resellers
UDF Revisions
• 1.02: DVD video
• 1.50: Introduced VAT for CD-R/DVD-R
• 2.00: File types for DVD recording
• 2.01: Fix bugs in 2.00
• 2.50: Adds metadata partition. Used on some Blu-ray disks
• 2.60: Adds Pseudo OverWrite. Used by some other Blu-ray disks
UDF “Builds”
• Plain (the original one)
– Must be built then burned to CD like ISO9660
• VAT
– Like ISO9660 with lots of little sessions
• Spared
– For -RW media
– Knows to move often changed metadata around to avoid wearing out sectors with re-writes
Compatibility
• Windows calls UDF “Live File System”
– Because nothing should be referred to by its real name
• Vista and later support all features
• Linux support is evolving
– Read for all versions
– Write is “safe” up to 2.01 for plain build
Terminology
• Sector
– Hard drives
– At interface between disk and controller
– Header, data and ECC
– Header contains sync bytes, address identification, flaw flag and header parity bytes
– Usually 512 bytes for data
– Can be 2048 or 4096 (advanced format sectors)
– Sometimes interleaved
Terminology
• Track
– One ring of sectors around a disk
– Can be read by one read head without moving the head
– Invisible to FS and application
• Cylinder
– On multi-platter disk, one head per platter, all move in unison
– Cylinder = the tracks read by the heads without moving the heads
Terminology
• Block
– Sometimes called physical record
– Consists of multiple logical records (on tape media, where there is no sector)
– Or multiple sectors (on disk)
– Transfer unit from device to file system
– Application can specify block size to fine tune performance
– Block size = track size can make sense
Terminology
• Record
– = logical record
– Meaningful to applications
– Internal structure of files (a file is usually a collection of logical records)
– Not relevant to FS or OS, except in files the OS cares about, like directories, executables, MFT
– Physical record usually means block
– Sometimes sector is called physical record
Terminology
• Stream
– The non-metadata part of a file
– Actual data is a string of bytes (byte 0 to byte filesize – 1)
– This part of the file is sometimes called the data stream (NTFS allows alternate data streams)
– Some of the metadata are called streams (in MFT name-value pairs, the value is called a stream)
Terminology
• Cluster
– Allocation unit
– Sometimes called block
– In ext FS allocation units are called blocks
– Ext4 and btrfs allocation unit is called an extent
Terminology
• Partition
– A subdivision of the space on a disk
– Contiguous, may need to start on physical boundaries
– FSs are contained in partitions
– Often called volumes
– Partitions are the “mounting unit”
– Note: C: is where a FS in a partition or volume is mounted
– But C:, D: etc are often called drives, or disks