Date posted: 21-Dec-2015
Chapter 12.3 Mass-Storage Systems
.2/38 Silberschatz, Galvin and Gagne ©2005Operating System Concepts
Chapter 12-3 Mass-Storage Systems
Chapter 12-1: Overview of Mass Storage Structure
Chapter 12-2: Disk Attachment, Disk Scheduling
Chapter 12-3: Disk Management, Swap-Space Management, RAID Structure
Chapter 12-4: Stable-Storage Implementation, Tertiary Storage Devices, Operating System Issues, Performance Issues
Disk Management
We will discuss three very important topics:
Disk Formatting
Boot Blocks, and
Bad Blocks
Disk Management – Preliminary Comments
Before a disk is first put into use, it must be formatted.
But the disk can (and typically does) support a variety of often diverse uses:
Operating system needs,
User needs, and
Certain specialized needs of particular applications.
So, the disk can be formatted in a number of ways, and it is.
Further, when disks are manufactured and sent out for use, they often have bad spots (bad sectors).
This is the norm.
Thus any kind of formatting must account for bad spots on the disk and map logical blocks into physical sectors.
So, there’s a lot of important information in this section.
Disk Management – Disk Formatting
Initially a disk is divided into sectors that the disk controller can read from and write to.
Recall: the disk controller is itself a small, very specialized processor and executes a restricted instruction set.
The instruction set deals primarily with instructions dealing with I/O and instructions dealing with device operations themselves.
Instructions include requests such as input and output (open, close, read, write, etc.) and additional instructions needed to control and manage disk operations (is the device ready; timing; much more).
As stated in previous lectures, instructions to the disk controller and other low-level privileged instructions take the form of 'commands' or 'instructions' (depending on the computing system). Commands are handled interpretively; instructions, generally at the assembler level, are a bit different.
Disk Management – Disk Sectors
Formatting the disk into sectors is referred to as low-level formatting.
Typical sector size is 512 bytes; others are available, such as 256 bytes, 1024 bytes (1 KB), and others. 512 is the norm.
Each sector itself contains a specific data structure consisting of a header, body of the sector, and a trailer.
Headers and Trailers are used for control information needed by the disk controller. These typically include:
Sector number (can check against the request for I/O) - Usually found in header.
Error Correction Codes (ECC) - Usually found in trailer
Data area itself – generally 512 bytes.
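To make the sector layout concrete, here is a minimal Python sketch of a formatted sector; the field names and the pluggable ECC function are illustrative assumptions, not any real controller's format:

```python
def make_sector(sector_number, data, ecc_fn):
    """Model of one low-level-formatted sector: header + data + trailer."""
    if len(data) != 512:                      # 512-byte data area, the norm
        raise ValueError("data area must be exactly 512 bytes")
    return {
        "header": {"sector_number": sector_number},   # checked on each I/O
        "data": bytes(data),
        "trailer": {"ecc": ecc_fn(data)},             # ECC over the data
    }

def check_sector(sector, requested_number, ecc_fn):
    """What the controller verifies on a read: sector number and ECC."""
    ok_number = sector["header"]["sector_number"] == requested_number
    ok_ecc = ecc_fn(sector["data"]) == sector["trailer"]["ecc"]
    return ok_number and ok_ecc
```

A trivial checksum (even Python's built-in `sum` over the bytes) can stand in for `ecc_fn` just to exercise the structure.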
Disk Management – Disk Sectors
The error-correcting code (not exactly the same as those used in main memory, but the thinking and specific bits are very similar) is generated when data is written to the data portion. There are a variety of formulas used in generating ECCs.
When a read takes place, the hardware recalculates the code based on the number and position of specific bits and compares it to the code stored in the sector.
If the two differ, the sector is somehow corrupted.
Because the ECC is error-correcting, the offending bit (hopefully just one) may be both identified (detected) and corrected. This is called a soft error, and the event is reported to system administrators and tech reps for maintenance concerns.
We will talk a lot more about ECC, parity, redundancy and more when we discuss RAID ahead.
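The detect-and-correct step can be illustrated with a toy single-error-correcting code: XOR together the 1-based positions of all set bits, the trick underlying Hamming codes. Real disk ECCs are far more elaborate; this sketch handles only a single flipped bit:

```python
def ecc(bits):
    """Code word: XOR of the 1-based positions of all set bits."""
    code = 0
    for pos, bit in enumerate(bits, start=1):
        if bit:
            code ^= pos
    return code

def check_and_correct(bits, stored_code):
    """Recompute the code; a single flipped bit shows up as a nonzero
    syndrome equal to its position (double-bit errors can miscorrect)."""
    syndrome = ecc(bits) ^ stored_code
    if syndrome == 0:
        return list(bits), "ok"
    if 1 <= syndrome <= len(bits):
        fixed = list(bits)
        fixed[syndrome - 1] ^= 1     # flip the corrupted bit back
        return fixed, "corrected"    # a recoverable "soft error"
    return list(bits), "uncorrectable"
```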
Disk Management – Low-Level Formatting
Usually the disk is low-level formatted as part of production.
Part of this formatting includes designating the number of bytes in the data portion of each sector.
Header and trailer sizes are generally fixed because they are hardware processed.
As stated, the disk may be 'divided' into specific portions (called partitions) which may be used for specific needs.
Because of the physical characteristics of disk access, partitions are normally allocated ‘in cylinders.’
Each partition is essentially a separate logical disk.
Three typical partitions are
Partition for the operating system’s executable code
Partitions for user files
Often, partition(s) of raw disk.
Disk Management – Low-Level Formatting
Given that three (or more) partitions are established, these partitions need to be made ready.
After partitioning, step 2 is logical formatting (creation of a file system).
Here, the OS needs to establish several data structures for control and management.
These partitions are initialized and include structures such as
empty directories and
other structures (like memory maps)
to be used to manage free and allocated regions necessary during normal operations.
In truth, there is much more than just these…
Really – much more is done depending on the OS
Are we using virtual memory?
Many more support structures are needed: paging? Segmentation? Memory maps?
Setting up queues to support multi-tasking operations
Other skeletal data structures to be used during operations…
Perhaps we want to be able to dual boot this computing system…
Another side note: Interestingly, actual disk I/O is done in blocks, but file system I/O is done in clusters, simply larger hunks of blocks.
This is done to facilitate sequential I/O exploiting the theory of locality. (Recall?)
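The block/cluster relationship reduces to simple arithmetic; the cluster factor below (8 blocks, i.e. 4 KB clusters) is an assumption for the sketch:

```python
BLOCK_SIZE = 512          # bytes per disk block
BLOCKS_PER_CLUSTER = 8    # assumed cluster factor: 4 KB clusters

def cluster_of(block_no):
    """File-system I/O addresses the cluster containing a disk block."""
    return block_no // BLOCKS_PER_CLUSTER

def blocks_of(cluster_no):
    """One cluster transfer covers a contiguous run of blocks, which is
    why clustering helps sequential I/O (locality)."""
    start = cluster_no * BLOCKS_PER_CLUSTER
    return list(range(start, start + BLOCKS_PER_CLUSTER))
```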
Disk Management – Low-Level Formatting – Raw Disk
The notion of raw disk is an important one.
This partition (normally not very large) has no associated file system included in its initialization. It is simply a 'raw' area of sequential blocks.
Actually, processing in raw disk can speed up many operations, but special processing is required and is the responsibility of such clients.
Very specific locations in the partition can be exactly specified and hence the need to use a file system (directory, etc.) is bypassed.
In fact, using a file system would be a major hindrance and would likely slow things down considerably!
Raw disk is simply available to some special clients with some special applications to use as they wish.
These clients (for example, data base engines) are on their own and do not enjoy benefits of buffering, cache, pre-fetching, etc.
But, these needs are often found in the real world, and thus there is often a disk partition for raw I/O.
Disk Management – That Boot Block
We know that we typically have a bootstrap loader located in ROM, so that when power is supplied, this burned-in code is executed.
In truth, this ROM bootstrap loader is a very simple program whose job is to bring in the full bootstrap loader from disk, generally located at a fixed location on disk.
The bootstrap loader really initiates a call to a reserved area of the OS’s partition which will then undertake full bootstrap operations.
Since the bootstrap loader in ROM is read only, we don’t have to worry about this becoming corrupted.
Once the full bootstrap program is brought in (note that no device drivers have been loaded yet), this loader then proceeds to load and initialize the rest of the operating system.
Windows 2000 Boot Approach
Here’s a system specific approach:
First of all, code is run in the system’s ROM. This directs control to a fixed address which contains the boot code (as we implied).
This fixed address is the first sector of the hard disk, which is called the Master Boot Record.
The MBR contains a pointer to the real boot code to be executed and a table that lists the partitions and their locations on this disk.
This table in the Master Boot Record is thus a map of the disk's partitions; the pointer it contains identifies one of those partitions, called the boot partition.
The boot partition contains the operating system and all the device drivers.
Control is transferred to the boot partition (boot sector), which is executed, causing the continued loading and initialization of the rest of the operating system and its key support algorithms and data structures.
Booting from a Disk in Windows 2000
First sector of disk is the Master Boot Record.
Boot partition (sector) contains OS and device drivers
Code in the bootstrap loader in ROM directs the system to read boot code from the MBR. This record contains a table listing the partitions of the hard disk and a pointer to the boot sector, where the remainder of the OS is to be booted from.
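The MBR itself has a well-known fixed layout: a 512-byte sector with four 16-byte partition entries starting at byte offset 446 and the boot signature 0x55AA at offset 510. A sketch of parsing it (error handling kept minimal):

```python
def parse_mbr(sector):
    """Parse a classic 512-byte Master Boot Record into partition entries."""
    if len(sector) != 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR: bad size or boot signature")
    partitions = []
    for i in range(4):                              # four table entries
        entry = sector[446 + 16 * i : 446 + 16 * (i + 1)]
        if entry[4] == 0:                           # type 0 = unused slot
            continue
        partitions.append({
            "bootable": entry[0] == 0x80,           # flags the boot partition
            "type": entry[4],
            "start_lba": int.from_bytes(entry[8:12], "little"),
            "sectors": int.from_bytes(entry[12:16], "little"),
        })
    return partitions
```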
Bad Blocks – IDE Controllers
As stated, disks are normally manufactured with bad blocks on them. After low-level formatting – normally done at the factory – they are sent out with bad blocks on them.
How bad spots on the disk are accommodated is the subject here.
For small systems with IDE controllers, when one formats the disk, the scan finds these bad blocks.
This approach is simple, and an appropriate entry is entered into the FAT citing that this block is not available for assignment.
Once the disk is in operation, a common system program, chkdsk, is run, which identifies bad blocks and marks them as defective in the FAT. Unfortunately, if this happens, the data in such a block is normally lost.
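The FAT bookkeeping is simple to sketch: FAT16 reserves the value 0xFFF7 to mean "bad cluster, never allocate." The allocator below is illustrative, not a real FAT driver:

```python
FREE = 0x0000
BAD_CLUSTER = 0xFFF7   # FAT16 marker: cluster unusable, skip on allocation

def mark_bad(fat, cluster):
    """What the format scan (or chkdsk later) does on finding a bad spot."""
    fat[cluster] = BAD_CLUSTER

def allocate_cluster(fat):
    """New allocations simply skip anything not marked FREE."""
    for i, entry in enumerate(fat):
        if entry == FREE:
            return i
    raise RuntimeError("no free clusters")
```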
Bad Blocks – SCSI Controllers
SCSI disks are usually found on servers, workstations, and high-end PCs.
As before, the bad-block list is determined at the factory. But during operation, the list is automatically updated as new bad sectors are found.
Part of the initialization of the disk process is to reserve some spare sectors, which can be used to replace bad sectors – substituting a spare for a bad one.
Called: sector sparing or forwarding.
Bad sectors can arise from almost anything. A request to read from the bad block is simply mapped to the spare. Unfortunately, substitution brings with it problems, as one could imagine.
Bad Blocks – SCSI Controllers – more
Continuing with bad sectors on a SCSI disk:
Unfortunately, we cannot be certain where these spares are located.
Sparing might bring some real sub-optimization of the disk scheduling algorithm, since a spare may be far from the sector it replaces.
So, most disks are formatted to have some spare sectors within the same cylinder.
An alternative to sector sparing is sector slipping.
As its name implies, once a bad sector is identified, block contents are passed to the ‘next’ block address up to and including the first spare block. (that is, all are ‘moved up’ one).
Of course, we have a performance drop off here as data is moved…
And, almost always, data in a bad block is gone.
Although there are techniques by which a failing block can be copied and spared before it dies completely, in most instances the data is gone!
When the data can be recovered this way, the error is a soft error; we normally 'press on.'
Hard errors, in contrast, typically represent truly lost data and necessitate some kind of restore from backup media.
This usually requires operator intervention to load the backup (if not already mounted), followed by invocation of a restore procedure.
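Both remapping schemes are pure bookkeeping and can be sketched briefly; real controllers do this in firmware:

```python
def spare_sector(remap, spares, bad):
    """Sector sparing (forwarding): future requests for the bad sector are
    redirected to a spare, ideally one within the same cylinder."""
    remap[bad] = spares.pop(0)

def resolve(remap, logical):
    """Every request is translated through the remap table."""
    return remap.get(logical, logical)

def slipped_mapping(n_sectors, bad):
    """Sector slipping: logical sectors at or after the bad physical slot
    each shift down one slot; the last lands on a spare assumed to sit at
    physical slot n_sectors."""
    return {lg: (lg if lg < bad else lg + 1) for lg in range(n_sectors)}
```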
Swap Space Management
Swap-Space Management
Introductory thoughts
Swap Space Use
Swap Space Location
Swap Space in Solaris
Swap Space Management - Introduction
Introductory Thoughts
Swapping is used to make room in memory generally when we are space constrained.
In truth, this is rarely used nowadays.
Rather than swap entire processes, which can be very expensive from a performance perspective, we normally swap pages – a virtual memory technique.
So, we sometimes use the terms swapping and paging somewhat interchangeably nowadays.
The idea is to use low-level OS functions in a virtual-memory environment, using disk space as extended primary memory.
Any time swapping is undertaken, there is a significant degradation in overall performance, since disk access is vastly slower than memory addressing. But remember, the reason for swapping is, of course, to improve overall performance in a virtual memory system.
Swap Space Use
Naturally, the use of swap space will depend upon the operating system.
If an entire process is swapped, more swap space will be needed than in paged systems, where only a page may need to be stored.
Too, when an entire process is swapped, we are usually not only talking about the code, but data areas (stacks, etc.) as well.
So, how much space do we really need on disk?
Can range from a few meg to gigs!
Overestimate needed swap space?
May waste space – but often no additional harm done.
Underestimate needed swap space?
May result in the process being aborted!
Some systems (Linux) suggest an amount of swap space:
“Double the amount of physical memory”
Most set aside less; some argue whether swap space should be set aside at all.
Some systems have multiple swap spaces on different disks so that the space is distributed over the system’s I/O devices.
Swap Space Location
Swap space is found in the normal file system, or sometimes in a separate disk partition – a raw partition.
File System Swap Space: So, think about using a file system…
File systems incur a lot of overhead.
We get some nice features using a file-management system, but when time is very critical this approach can introduce many inefficiencies.
We would have to use directory lookups and perhaps extra disk accesses. These are (in this context) not desirable. This is called traversing the file system.
Raw Partition as Swap Space: Uses no file system (there is no file system / directory structure in this approach).
We do have a swap-space manager, which must be used to manage the blocks in this partition, whose size is determined during initial disk partitioning.
This manager uses special algorithms tuned for speed rather than space efficiency.
Internal fragmentation? At reboot time, any internal fragmentation goes away. This is nice.
Data in the swap space do not stay there long, as they might in typical disk partitions that use a file system.
Linux allows swap space in both raw partitions and in the file system. The trade-offs are clearly between swap performance (time) and the convenience of file-system management.
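A raw-partition swap-space manager can be as simple as a free-map over the partition's blocks. This is a sketch; real managers use structures tuned for speed rather than this linear scan:

```python
class SwapSpaceManager:
    """Tracks free vs. in-use blocks of a raw swap partition directly,
    with no file system or directory structure in between."""

    def __init__(self, n_blocks):
        self.free = [True] * n_blocks      # one flag per swap block

    def allocate(self):
        for i, is_free in enumerate(self.free):
            if is_free:
                self.free[i] = False
                return i
        raise MemoryError("swap space exhausted")   # underestimated swap!

    def release(self, block):
        self.free[block] = True            # fragmentation vanishes at reboot
```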
RAID
Redundant Arrays of Independent Disks (RAID)
Raid architecture is a very interesting modern approach to significantly improve the way we read and write data to disks.
In RAID, we are exploiting the notion of parallelism in disks.
Parallelism via RAID can significantly improve performance (speed of I/O) and reliability (backup, recovery, and redundancy in stored data).
In the past, RAIDs were built from inexpensive disks and considered an inexpensive alternative to large, very expensive disks.
Now, these are considered Redundant Arrays of Independent Disks.
The reality is that disks do fail.
One approach is to have redundancy – the storage of information that we do not expect to use but which, in an emergency, can be used to restore the lost information.
Redundant Arrays of Inexpensive Disks (RAID)
We must consider engineering metrics such as Mean Time Between Failures (MTBF) as well as Mean Time To Repair (MTTR).
We first consider the technique of mirroring, where two physical disks are considered one logical disk.
Every write operation is performed on both disks.
This appears to provide good security – assuming that the second disk will not fail before the first failed disk can be repaired.
Unfortunately, independence of disk failures cannot be assumed.
Often power failures / surges, earthquakes, and other disasters may wipe out more than one disk at the same time.
Another approach is to have a second disk but to stagger the second write operation a short interval after the first. The operation is not 'complete' until the second write succeeds.
Still another approach is to add a nonvolatile RAM (NVRAM) cache to the RAID array.
This write-back cache is protected from data loss during power failures (often the big culprit in failures), so here the write can be considered complete once the first write takes place.
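A toy model of mirroring, with dict-backed 'disks' standing in for hardware; the staggered-write and NVRAM variants differ only in when the write is declared complete:

```python
def mirrored_write(primary, mirror, block_no, data):
    """RAID 1 style: the write completes only after BOTH copies succeed."""
    primary[block_no] = data
    mirror[block_no] = data

def mirrored_read(primary, mirror, block_no):
    """Read from the primary; fall back to the mirror if that copy is gone."""
    if block_no in primary:
        return primary[block_no]
    return mirror[block_no]
```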
Improvement in Performance via Parallelism
Several improvements in disk-use techniques involve the use of multiple disks working cooperatively.
Disk striping can be used to significantly improve the transfer rate.
Bit-level data striping means splitting the bits of each byte across multiple disks.
An eight-bit byte would need eight disks. In this architecture, an array of eight disks is considered one logical disk.
Note: operating in parallel, we have eight times the transfer rate.
Each disk participates in each access, so the number of accesses that can be processed per second is about the same as on a single disk, but each access can read eight times as much data in the same time as a single disk could!
Of course, the principle of bit-striping generalizes to block-level striping.
The overall goals of striping are two:
Increase the throughput of multiple small accesses (that is, page accesses) by load balancing, and
Reduce the response time of large accesses.
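Block-level striping reduces to arithmetic: with N disks, logical block b lives on disk b mod N at row b div N. A sketch of this round-robin layout (no parity):

```python
def locate(logical_block, n_disks):
    """Map a logical block to (disk index, row on that disk)."""
    return logical_block % n_disks, logical_block // n_disks

def stripe_of(logical_block, n_disks):
    """All logical blocks in the same row, transferable in parallel."""
    row = logical_block // n_disks
    return [row * n_disks + d for d in range(n_disks)]
```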
End of Chapter 12.3