
BC0042-OS

Date post: 08-Nov-2014
Upload: raj-chowdhury

BC0042 – Operating Systems

Assignment Set – 1

1. Following are the five services provided by operating systems for the convenience of the users.

Program Execution

The purpose of a computer system is to allow the user to execute programs, so the operating system provides an environment where the user can conveniently run them. The user does not have to worry about memory allocation, multitasking, or similar details; these are taken care of by the operating system. Running a program involves allocating and de-allocating memory and, when several processes are present, CPU scheduling. These functions cannot be given to user-level programs, so user-level programs cannot help the user to run programs independently, without help from the operating system.

I/O Operations

Each program requires input and produces output, which involves the use of I/O. The operating system hides the details of the underlying I/O hardware from the user. All the user sees is that the I/O has been performed. By providing I/O, the operating system makes it convenient for users to run programs. For efficiency and protection, users cannot control I/O directly, so this service cannot be provided by user-level programs.

File System Manipulation

The output of a program may need to be written to new files, or its input taken from existing files. The operating system provides this service. The user does not have to worry about secondary storage management: the user gives a command to read or write a file and sees the task accomplished. Thus the operating system makes it easier for user programs to accomplish their task. Because the speed of I/O, which depends on secondary storage management, is critical to the speed of many programs, it is best left to the operating system rather than giving individual users control of it. It is not difficult for user-level programs to provide these services, but for the reasons mentioned above it is best if this service is left with the operating system.

Communications

There are instances where processes need to communicate with each other to exchange information. The processes may be running on the same computer or on different computers. By providing this service, the operating system relieves the user from the worry of passing messages between processes. Where messages need to be passed to processes on other computers through a network, this can be done by user programs. The user program may be customized to the specifications of the hardware through which the message transits, and provides the service interface to the operating system.

Error Detection

An error in one part of the system may cause the complete system to malfunction. To avoid such a situation, the operating system constantly monitors the system to detect errors. This relieves the user from the worry of errors propagating to various parts of the system and causing malfunctions. This service cannot be handled by user programs because it involves monitoring and, in some cases, altering areas of memory, de-allocating the memory of a faulty process, or taking the CPU away from a process that goes into an infinite loop. These tasks are too critical to be handed over to user programs. A user program given these privileges could interfere with the correct (normal) operation of the operating system.


2. Here is a description of operating system-level virtualization.

Operating System-level Virtualization is a server virtualization technology which virtualizes servers on an operating system (kernel) layer. It can be thought of as partitioning: a single physical server is sliced into multiple small partitions (otherwise called virtual environments (VE), virtual private servers (VPS), guests, zones etc); each such partition looks and feels like a real server, from the point of view of its users.

The operating system-level architecture has low overhead, which helps to maximize efficient use of server resources. The virtualization introduces only a negligible overhead and allows hundreds of virtual private servers to run on a single physical server. In contrast, approaches such as full virtualization (like VMware) and paravirtualization (like Xen or UML) cannot achieve such a level of density, due to the overhead of running multiple kernels. On the other hand, operating system-level virtualization does not allow running different operating systems (i.e., different kernels), although different libraries, distributions, etc. are possible.

3. Mutual exclusion is a way of making sure that if one process is using shared modifiable data, the other processes are excluded from doing the same thing.

That is, while one process accesses the shared variable, all other processes wishing to do so at the same moment must be kept waiting; when that process has finished using the shared variable, one of the waiting processes is allowed to proceed. In this fashion, each process using the shared data (variables) excludes all others from doing so simultaneously. This is called mutual exclusion.

Mutual exclusion needs to be enforced only when processes access shared modifiable data - when processes are performing operations that do not conflict with one another they should be allowed to proceed concurrently.

Requirements for mutual exclusion

Following are the six requirements for mutual exclusion.

Mutual exclusion must be enforced: Only one process at a time is allowed into its critical section, among all processes that have critical sections for the same resource or shared object.

A process that halts in its noncritical section must do so without interfering with other processes.

It must not be possible for a process requiring access to a critical section to be delayed indefinitely.

When no process is in a critical section, any process that requests entry to its critical section must be permitted to enter without delay.

No assumptions are made about relative process speed or number of processors.

A process remains inside its critical section for a finite time only.
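As a minimal sketch of how mutual exclusion looks in practice, the following Python example uses threads in place of processes and a `threading.Lock` to guard a shared counter. The names `counter`, `worker`, and `run` are illustrative assumptions, not part of the assignment text.

```python
import threading

counter = 0                 # shared modifiable data
lock = threading.Lock()     # enforces mutual exclusion on `counter`


def worker(iterations: int) -> None:
    """Increment the shared counter, entering the critical section each time."""
    global counter
    for _ in range(iterations):
        with lock:          # entry: other threads wait here
            counter += 1    # critical section, held for a finite time only
        # lock released: the noncritical section runs concurrently


def run(num_threads: int = 4, iterations: int = 10_000) -> int:
    """Start the workers, wait for them, and return the final counter value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(iterations,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With the lock in place, `run(4, 10_000)` always returns 40000; without it, concurrent updates could be lost.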


4. Following are the states of a five-state process model. Figure 3.1 shows these state transitions.

New State: The process is being created.

Terminated State: The process has finished execution.

Blocked (waiting) State: When a process blocks, it does so because logically it cannot continue, typically because it is waiting for input that is not yet available. Formally, a process is said to be blocked if it is waiting for some event to happen (such as an I/O completion) before it can proceed. In this state a process is unable to run until some external event happens.

Running State: A process is said to be running if it currently has the CPU, that is, it is actually using the CPU at that particular instant.

Ready State: A process is said to be ready if it could use a CPU if one were available. It is runnable but temporarily stopped to let another process run.
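The five states above can be sketched as a small state machine. The transition table below follows the commonly taught five-state model (admit, dispatch, time-out, event wait, event occurs, exit); it is an assumption here, since the referenced figure is not reproduced in this transcript, and the names `State`, `ALLOWED`, and `transition` are illustrative.

```python
from enum import Enum


class State(Enum):
    NEW = "new"
    READY = "ready"
    RUNNING = "running"
    BLOCKED = "blocked"
    TERMINATED = "terminated"


# Assumed legal transitions of the five-state model.
ALLOWED = {
    State.NEW: {State.READY},                                      # admit
    State.READY: {State.RUNNING},                                  # dispatch
    State.RUNNING: {State.READY, State.BLOCKED, State.TERMINATED}, # time-out, wait, exit
    State.BLOCKED: {State.READY},                                  # event occurs
    State.TERMINATED: set(),
}


def transition(current: State, target: State) -> State:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Note that a blocked process cannot go straight to running: it must first become ready and then be dispatched.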

5. The fully associative mapping cache gives the greatest flexibility in holding combinations of blocks in the cache and the minimum conflict for a given cache size, but it is also the most expensive, due to the cost of the associative memory. It requires a replacement algorithm to select a block to remove upon a miss, and the algorithm must be implemented in hardware to maintain a high speed of operation. The fully associative cache can only be formed economically with a moderate capacity. Microprocessors with small internal caches often employ the fully associative mechanism.


Direct mapping

The fully associative cache is expensive to implement because it requires a comparator with each cache location, effectively a special type of memory. In direct mapping, the cache consists of normal high-speed random access memory, and each location in the cache holds the data at a cache address given by the lower significant bits of the main memory address. This enables the block to be selected directly from the lower significant bits of the memory address. The remaining higher significant bits of the address are stored in the cache with the data to complete the identification of the cached data.

The address from the processor is divided into two fields, a tag and an index. The tag consists of the higher significant bits of the address, which are stored with the data. The index is the lower significant bits of the address, used to address the cache.

When the memory is referenced, the index is first used to access a word in the cache. Then the tag stored in the accessed word is read and compared with the tag in the address. If the two tags are the same, indicating that the word is the one required, access is made to the addressed cache word. However, if the tags are not the same, indicating that the required word is not in the cache, reference is made to the main memory to find it. For a memory read operation, the word is then transferred into the cache where it is accessed. It is possible to pass the information to the cache and the processor simultaneously, i.e., to read-through the cache, on a miss. The cache location is altered for a write operation. The main memory may be altered at the same time (write-through) or later.

The main memory address is composed of a tag, an index, and a word within a line. All the words within a line in the cache have the same stored tag. The index part to the address is used to access the cache and the stored tag is compared with required tag address. For a read operation, if the tags are the same the word within the block is selected for transfer to the processor. If the tags are not the same, the block containing the required word is first transferred to the cache.

In direct mapping, blocks with the same index in the main memory map into the same block in the cache, and hence only blocks with different indices can be in the cache at the same time. A replacement algorithm is unnecessary, since there is only one allowable location for each incoming block. Efficient replacement relies on the low probability of lines with the same index being required. However, there are such occurrences, for example, when two data vectors are stored starting at the same index and pairs of elements need to be processed together. To gain the greatest performance, data arrays and vectors need to be stored in a manner which minimizes the conflicts in processing pairs of elements. The lower bits of the processor address are used to address the cache location directly. It is possible to introduce a mapping function between the address index and the cache index so that they are not the same.
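The tag/index/word split and the resulting conflict behaviour described above can be sketched as follows. This is a simplified model that tracks only the stored tags, not the data; the class name `DirectMappedCache` and the sizes (8 lines of 4 words) are assumptions chosen for illustration.

```python
class DirectMappedCache:
    """Tag-only model of a direct-mapped cache (sizes are illustrative)."""

    def __init__(self, num_lines: int = 8, words_per_line: int = 4) -> None:
        self.num_lines = num_lines
        self.words_per_line = words_per_line
        self.tags = [None] * num_lines   # one stored tag per cache line

    def split(self, address: int) -> tuple:
        """Divide a word address into (tag, index, word-within-line)."""
        word = address % self.words_per_line
        block = address // self.words_per_line
        index = block % self.num_lines    # lower bits select the line
        tag = block // self.num_lines     # higher bits identify the block
        return tag, index, word

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, load the block and return False."""
        tag, index, _ = self.split(address)
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag            # only one possible location
        return False
```

For example, with the default sizes, addresses 0 and 32 share index 0 but have different tags, so accessing them alternately misses every time even though the rest of the cache is empty.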

Set-associative mapping

In the direct scheme, all words stored in the cache must have different indices. The tags may be the same or different. In the fully associative scheme, blocks can displace any other block and can be placed anywhere, but fully associative memories are costly and operate relatively slowly.

Set-associative mapping allows a limited number of blocks, with the same index and different tags, in the cache and can therefore be considered a compromise between a fully associative cache and a direct-mapped cache. The cache is divided into "sets" of blocks. A four-way set-associative cache would have four blocks in each set. The number of blocks in a set is known as the associativity or set size. Each block in each set has a stored tag which, together with the index, completes the identification of the block. First, the index of the address from the processor is used to access the set. Then, comparators are used to compare all tags of the selected set with the incoming tag. If a match is found, the corresponding location is accessed; otherwise, as before, an access to the main memory is made.

The tag address bits are always chosen to be the most significant bits of the full address, the block address bits are the next significant bits, and the word/byte address bits form the least significant bits, as this spreads consecutive main memory blocks throughout consecutive sets in the cache. This addressing format is known as bit selection and is used by all known systems. In a set-associative cache it would be possible to have the set address bits as the most significant bits of the address and the block address bits as the next significant, with the word within the block as the least significant bits, or with the block address bits as the least significant bits and the word within the block as the middle bits.

Notice that the association between the stored tags and the incoming tag is done using comparators and can be shared for each associative search, and all the information, tags and data, can be stored in ordinary random access memory. The number of comparators required in the set-associative cache is given by the number of blocks in a set, not the number of blocks in all, as in a fully associative memory. The set can be selected quickly and all the blocks of the set can be read out simultaneously with the tags before waiting for the tag comparisons to be made. After a tag has been identified, the corresponding block can be selected.

The replacement algorithm for set-associative mapping need only consider the lines in one set, as the choice of set is predetermined by the index in the address. Hence, with two blocks in each set, for example, only one additional bit is necessary in each set to identify the block to replace.
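The two-block-per-set case mentioned above can be sketched with a single replacement bit per set. This is a tag-only model under assumed sizes (4 sets, 4 words per block), with least-recently-used replacement assumed as the policy; the class name is hypothetical.

```python
class TwoWaySetAssociativeCache:
    """Tag-only model of a two-way set-associative cache with one LRU bit per set."""

    def __init__(self, num_sets: int = 4, words_per_block: int = 4) -> None:
        self.num_sets = num_sets
        self.words_per_block = words_per_block
        self.tags = [[None, None] for _ in range(num_sets)]
        self.replace_next = [0] * num_sets   # the one extra bit per set

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, replace the LRU block of the set."""
        block = address // self.words_per_block
        index = block % self.num_sets        # selects the set
        tag = block // self.num_sets
        ways = self.tags[index]
        for way in (0, 1):                   # compare both stored tags
            if ways[way] == tag:
                self.replace_next[index] = 1 - way   # other way is now LRU
                return True
        victim = self.replace_next[index]
        ways[victim] = tag
        self.replace_next[index] = 1 - victim
        return False
```

Unlike the direct-mapped case, two blocks with the same index (for example addresses 0 and 16 here) can now reside in the cache at the same time; only a third block with that index forces a replacement.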

Sector mapping

In sector mapping, the main memory and the cache are both divided into sectors; each sector is composed of a number of blocks. Any sector in the main memory can map into any sector in the cache, and a tag is stored with each sector in the cache to identify the main memory sector address. However, a complete sector is not transferred to the cache or back to the main memory as one unit. Instead, individual blocks are transferred as required. On a cache sector miss, the required block of the sector is transferred into a specific location within one sector. The sector location in the cache is selected, and all the other existing blocks in that cache sector are from a previous sector.

Sector mapping might be regarded as a fully associative mapping scheme with valid bits, as in some microprocessor caches. Each block in the fully associative mapped cache corresponds to a sector, and each byte corresponds to a "sector block".

6. Short notes on context switching

To give each process on a multiprogrammed machine a fair share of the CPU, a hardware clock generates interrupts periodically. This allows the operating system to schedule all processes in main memory (using a scheduling algorithm) to run on the CPU at equal intervals. Each time a clock interrupt occurs, the interrupt handler checks how much time the currently running process has used. If it has used up its entire time slice, then the CPU scheduling algorithm (in the kernel) picks a different process to run. Each switch of the CPU from one process to another is called a context switch.

A context is the contents of a CPU's registers and program counter at any point in time. Context switching can be described as the kernel (i.e., the core of the operating system) performing the following activities with regard to processes on the CPU: (1) suspending the progression of one process and storing the CPU's state (i.e., the context) for that process somewhere in memory, (2) retrieving the context of the next process from memory and restoring it in the CPU's registers, and (3) returning to the location indicated by the program counter (i.e., returning to the line of code at which the process was interrupted) in order to resume the process. Figure 3.5 depicts the process of a context switch from process P0 to process P1.
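The three activities listed above can be sketched in a few lines. This is a toy model, not kernel code: the `Context` class stands in both for the live CPU state and for the saved per-process copy, and the register names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Program counter plus register contents (used for CPU and saved copies)."""
    pc: int = 0
    registers: dict = field(default_factory=dict)


def context_switch(cpu: Context, old: Context, new: Context) -> None:
    """Switch the CPU from the process owning `old` to the one owning `new`."""
    # (1) Suspend the current process: store the CPU state in memory.
    old.pc = cpu.pc
    old.registers = dict(cpu.registers)
    # (2) Retrieve the next process's context and restore the registers.
    cpu.registers = dict(new.registers)
    # (3) Resume at the location indicated by the saved program counter.
    cpu.pc = new.pc
```

A switch from P0 to P1 would then be `context_switch(cpu, p0_saved, p1_saved)`, after which `p0_saved` holds everything needed to resume P0 later.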


Assignment Set – 2

7. I/O Structure and its working Principle

Figure-1 shows the general I/O structure associated with many medium-scale processors. Note that the I/O controllers and main memory are connected to the main system bus. The cache memory (usually found on-chip with the CPU) has a direct connection to the processor, as well as to the system bus.

Figure 1: A general I/O structure for a medium-scale processor system

Note that the I/O devices shown here are not connected directly to the system bus; they interface with another device called an I/O controller. In simpler systems, the CPU may also serve as the I/O controller, but in systems where throughput and performance are important, I/O operations are generally handled outside the processor.

Until relatively recently, the I/O performance of a system was somewhat of an afterthought for systems designers. The reduced cost of high-performance disks, permitting the proliferation of virtual memory systems, and the dramatic reduction in the cost of high-quality video display devices, have meant that designers must pay much more attention to this aspect to ensure adequate performance in the overall system.

Because of the different speeds and data requirements of I/O devices, different I/O strategies may be useful, depending on the type of I/O device which is connected to the computer.


Because the I/O devices are not synchronized with the CPU, some information must be exchanged between the CPU and the device to ensure that the data is received reliably. This interaction between the CPU and an I/O device is usually referred to as "handshaking". For a complete "handshake," four events are important:

(1) The device providing the data (the talker) must indicate that valid data is now available.

(2) The device accepting the data (the listener) must indicate that it has accepted the data. This signal informs the talker that it need not maintain this data word on the data bus any longer.

(3) The talker indicates that the data on the bus is no longer valid, and removes the data from the bus. The talker may then set up new data on the data bus.

(4) The listener indicates that it is not now accepting any data on the data bus. The listener may use data previously accepted during this time, while it is waiting for more data to become valid on the bus.

Note that the talker and the listener each supply two signals. The talker supplies a signal (say, data valid, or DAV) at step (1). It supplies another signal (say, data not valid, or the complement of DAV) at step (3). Both these signals can be coded as a single binary value (DAV) which takes the value 1 at step (1) and 0 at step (3). The listener supplies a signal (say, data accepted, or DAC) at step (2). It supplies a signal (say, data not now accepted, or the complement of DAC) at step (4). It, too, can be coded as a single binary variable, DAC. Because only two binary variables are required, the handshaking information can be communicated over two wires, and the form of handshaking described above is called a two-wire handshake. Other forms of handshaking are used in more complex situations; for example, where there may be more than one controller on the bus, or where the communication is among several devices. Figure 2 shows a timing diagram for the signals DAV and DAC which identifies the timing of the four events described previously.

Figure 2: Timing diagram for two-wire handshake

Either the CPU or the I/O device can act as the talker or the listener. In fact, the CPU may act as a talker at one time and a listener at another. For example, when communicating with a terminal screen (an output device) the CPU acts as a talker, but when communicating with a terminal keyboard (an input device) the CPU acts as a listener.
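The four events of the two-wire handshake can be traced with a small sequential simulation. This is a sketch only: real talkers and listeners run asynchronously, whereas the hypothetical function `two_wire_handshake` simply steps through the events in order and records the (DAV, DAC) pair after each one.

```python
def two_wire_handshake(words):
    """Transfer each word over a simulated bus and trace (DAV, DAC) values."""
    dav = 0           # talker's signal: 1 = data valid
    dac = 0           # listener's signal: 1 = data accepted
    bus = None
    received = []
    trace = []
    for word in words:
        bus, dav = word, 1            # (1) talker: valid data is on the bus
        trace.append((dav, dac))
        received.append(bus)          # (2) listener: data accepted
        dac = 1
        trace.append((dav, dac))
        bus, dav = None, 0            # (3) talker: data removed from the bus
        trace.append((dav, dac))
        dac = 0                       # (4) listener: not accepting data now
        trace.append((dav, dac))
    return received, trace
```

For a single word, the trace is (1, 0), (1, 1), (0, 1), (0, 0): each wire changes exactly once in each direction per transfer.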

8. Explanation of various file organization methods

Just as the process abstraction beautifies the hardware by making a single CPU (or a small number of CPUs) appear to be many CPUs, one per "user," the file system beautifies the hardware disk, making it appear to be a large number of disk-like objects called files. Like a disk, a file is capable of storing a large amount of data cheaply, reliably, and persistently. The fact that there are lots of files is one form of beautification: Each file is individually protected, so each user can have his own files, without the expense of requiring each user to buy his own disk. Each user can have lots of files, which makes it easier to organize persistent data. The file system also makes each individual file more beautiful than a real disk. At the very least, it erases block boundaries, so a file can be any length (not just a multiple of the block size) and programs can read and write arbitrary regions of the file without worrying about whether they cross block boundaries. Some systems (not Unix) also provide assistance in organizing the contents of a file.

Systems use the same sort of device (a disk drive) to support both virtual memory and files. The question arises why these have to be distinct facilities, with vastly different user interfaces. The answer is that they don't. In Multics, there was no difference whatsoever. Everything in Multics was a segment. The address space of each running process consisted of a set of segments (each with its own segment number), and the "file system" was simply a set of named segments. To access a segment from the file system, a process would pass its name to a system call that assigned a segment number to it. From then on, the process could read and write the segment simply by executing ordinary loads and stores. For example, if the segment was an array of integers, the program could access the ith number with a notation like a[i] rather than having to seek to the appropriate offset and then execute a read system call. If the block of the file containing this value wasn't in memory, the array access would cause a page fault, which was serviced.
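The idea of reading a file with ordinary array indexing, rather than seek and read calls, survives in modern systems as memory-mapped files. The sketch below uses Python's standard `mmap` module as a present-day analogue; it is not the Multics mechanism itself, and the helper names `write_ints`, `read_int_via_mapping`, and `demo` are made up for illustration.

```python
import mmap
import os
import struct
import tempfile


def write_ints(path: str, values: list) -> None:
    """Store the values in a file as native C ints."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"{len(values)}i", *values))


def read_int_via_mapping(path: str, i: int) -> int:
    """Read the i-th integer with a[i] notation, with no explicit seek or read."""
    with open(path, "r+b") as f, \
            mmap.mmap(f.fileno(), 0) as m, \
            memoryview(m) as view, \
            view.cast("i") as a:
        return a[i]       # the OS pages in the needed block on demand


def demo() -> int:
    """Write three integers to a temporary file and fetch the middle one."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "ints.bin")
        write_ints(path, [10, 20, 30])
        return read_int_via_mapping(path, 1)
```

Here `demo()` returns 20; the access `a[i]` is an ordinary memory load, and any page fault it triggers is serviced by the operating system.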

This user-interface idea, sometimes called "single-level store," is a great idea. So why is it not common in current operating systems? In other words, why are virtual memory and files presented as very different kinds of objects? There are several possible explanations one might propose:

The address space of a process is small compared to the size of a file system.

There is no reason why this has to be so. In Multics, a process could have up to 256K segments, but each segment was limited to 64K words. Multics allowed for lots of segments because every "file" in the file system was a segment. The upper bound of 64K words per segment was considered large by the standards of the time; the hardware actually allowed segments of up to 256K words (over one megabyte). Most new processors introduced in the last few years allow 64-bit virtual addresses. In a few years, such processors will dominate. So there is no reason why the virtual address space of a process cannot be large enough to include the entire file system.

The virtual memory of a process is transient – it goes away when the process terminates – while files must be persistent.

Multics showed that this doesn't have to be true. A segment can be designated as "permanent," meaning that it should be preserved after the process that created it terminates. Permanent segments do raise a need for one "file-system-like" facility: the ability to give names to segments so that new processes can find them.

Files are shared by multiple processes, while the virtual address space of a process is associated with only that process.

Most modern operating systems (including most variants of Unix) provide some way for processes to share portions of their address spaces anyhow, so this is a particularly weak argument for a distinction between files and segments.


The real reason single-level store is not ubiquitous is probably a concern for efficiency. The usual file-system interface encourages a particular style of access: Open a file, go through it sequentially, copying big chunks of it to or from main memory, and then close it. While it is possible to access a file like an array of bytes, jumping around and accessing the data in tiny pieces, it is awkward. Operating system designers have found ways to implement files that make the common "file like" style of access very efficient. While there appears to be no reason in principle why memory-mapped files cannot be made to give similar performance when they are accessed in this way, in practice, the added functionality of mapped files always seems to pay a price in performance. Besides, if it is easy to jump around in a file, applications programmers will take advantage of it, overall performance will suffer, and the file system will be blamed.

9. I/O control strategies and their detailed description

Several I/O strategies are used between the computer system and I/O devices, depending on the relative speeds of the computer system and the I/O devices. The simplest strategy is to use the processor itself as the I/O controller, and to require that the device follow a strict order of events under direct program control, with the processor waiting for the I/O device at each step.

Another strategy is to allow the processor to be "interrupted" by the I/O devices, and to have a (possibly different) "interrupt handling routine" for each device. This allows for more flexible scheduling of I/O events, as well as more efficient use of the processor. (Interrupt handling is an important component of the operating system.)

A third general I/O strategy is to allow the I/O device, or the controller for the device, access to the main memory. The device would write a block of information in main memory, without intervention from the CPU, and then inform the CPU in some way that that block of memory had been overwritten or read. This might be done by leaving a message in memory, or by interrupting the processor. (This is generally the I/O strategy used by the highest speed devices – hard disks and the video controller.)

Program-controlled I/O

One common I/O strategy is program-controlled I/O (often called polled I/O). Here all I/O is performed under control of an "I/O handling procedure," and input or output is initiated by this procedure.

The I/O handling procedure will require some status information (handshaking information) from the I/O device (e.g., whether the device is ready to receive data). This information is usually obtained through a second input from the device; a single bit is usually sufficient, so one input "port" can be used to collect status, or handshake, information from several I/O devices. (A port is the name given to a connection to an I/O device; e.g., to the memory location into which an I/O device is mapped.) An I/O port is usually implemented as a register (possibly a set of D flip-flops) which also acts as a buffer between the CPU and the actual I/O device. The word port is often used to refer to the buffer itself.

Typically, there will be several I/O devices connected to the processor; the processor checks the "status" input port periodically, under program control by the I/O handling procedure. If an I/O device requires service, it will signal this need by altering its input to the "status" port. When the I/O control program detects that this has occurred (by reading the status port), the appropriate operation will be performed on the I/O device which requested the service. In a typical configuration, the outputs labeled "handshake out" would be connected to bits in the "status" port. The input labeled "handshake in" would typically be generated by the appropriate decode logic when the I/O port corresponding to the device was addressed.

Program-controlled I/O has a number of advantages:

All control is directly under the control of the program, so changes can be readily implemented.

The order in which devices are serviced is determined by the program; this order is not necessarily fixed but can be altered by the program, as necessary. This means that the "priority" of a device can be varied under program control. (The "priority" of a device determines which of a set of devices that are simultaneously ready for servicing will actually be serviced first.)

It is relatively easy to add or delete devices.

Perhaps the chief disadvantage of program-controlled I/O is that a great deal of time may be spent testing the status inputs of the I/O devices, when the devices do not need servicing. This "busy wait" or "wait loop" during which the I/O devices are polled but no I/O operations are performed is really time wasted by the processor, if there is other work which could be done at that time. Also, if a particular device has its data available for only a short time, the data may be missed because the input was not tested at the appropriate time.
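The polling of a shared status port, the program-controlled priority order, and the wasted "busy wait" polls can be sketched together. The function `polled_io` and its arguments are hypothetical: each entry of `status_samples` stands for one read of the status port, with bit n set when device n requests service.

```python
def polled_io(status_samples, service_order):
    """Poll a status port repeatedly; return (devices serviced, wasted polls)."""
    serviced = []
    wasted = 0
    for status in status_samples:            # one read of the status port
        ready = [bit for bit in service_order if status & (1 << bit)]
        if not ready:
            wasted += 1                      # busy wait: nothing to service
            continue
        serviced.extend(ready)               # service in priority order
    return serviced, wasted
```

For example, `polled_io([0b000, 0b101, 0b000, 0b010], [2, 0, 1])` services devices 2, 0, and then 1, and spends two of the four polls doing no useful work. Changing `service_order` changes the priority without touching any hardware, which is the flexibility the advantages above refer to.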

Program controlled I/O is often used for simple operations which must be performed sequentially. For example, the following may be used to control the temperature in a room:

DO forever
    INPUT temperature
    IF (temperature < setpoint) THEN
        turn heat ON
    ELSE
        turn heat OFF
    END IF
END DO

Note here that the order of events is fixed in time, and that the program loops forever. (It is really waiting for a change in the temperature, but it is a "busy wait.")


10. Deadlock situation and its graphical explanation

A deadlock is a situation in which two computer programs sharing the same resource are effectively preventing each other from accessing the resource, resulting in both programs ceasing to function.

In early operating systems that ran only one program at a time, all of the resources of the system were available to that one program, so deadlock could not arise.

In order for deadlock to occur, four conditions must be true.

Mutual exclusion – Each resource is either currently allocated to exactly one process or it is available. (Two processes cannot simultaneously control the same resource or be in their critical section.)

Hold and wait – Processes currently holding resources can request new resources.

No preemption – Once a process holds a resource, it cannot be taken away by another process or the kernel.

Circular wait – There is a closed chain of processes in which each process is waiting to obtain a resource which is held by the next process in the chain.
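The graphical view of deadlock can be sketched as a wait-for graph, in which an edge from one process to another means that the first is waiting for a resource held by the second. A cycle in this graph corresponds to the circular-wait condition. The code below is a minimal illustration using a depth-first search; the function name `has_cycle` and the process names `P1` and `P2` are hypothetical.

```python
from typing import Dict, List

def has_cycle(wait_for: Dict[str, List[str]]) -> bool:
    """Return True if the wait-for graph contains a cycle (circular wait)."""
    WHITE, GREY, BLACK = 0, 1, 2        # unvisited, on current path, finished
    color = {p: WHITE for p in wait_for}

    def visit(p: str) -> bool:
        color[p] = GREY
        for q in wait_for.get(p, []):
            state = color.get(q, WHITE)
            if state == GREY:           # back edge: q is already on the path
                return True
            if state == WHITE and visit(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and visit(p) for p in list(wait_for))

# P1 waits for a resource held by P2, and P2 waits for one held by P1.
deadlocked = {"P1": ["P2"], "P2": ["P1"]}
# P1 waits for P2, but P2 is not waiting for anything.
safe = {"P1": ["P2"], "P2": []}
```

Here `has_cycle(deadlocked)` is true and `has_cycle(safe)` is false. With one instance of each resource, a cycle in the graph means the processes on it are deadlocked.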

11. Explanation on Win NT architecture

The Windows NT operating system family's architecture consists of two layers (user mode and kernel mode), with many different modules within both of these layers.

User mode in the Windows NT line is made up of subsystems capable of passing I/O requests to the appropriate kernel mode software drivers by using the I/O manager. Two subsystems make up the user mode layer of Windows 2000: the Environment subsystem (which runs applications written for many different types of operating systems), and the Integral subsystem (which operates system-specific functions on behalf of the environment subsystem). Kernel mode in Windows 2000 has full access to the hardware and system resources of the computer. The kernel mode stops user mode services and applications from accessing critical areas of the operating system that they should not have access to.

The Executive interfaces with all the user mode subsystems. It deals with I/O, object management, security and process management. The hybrid kernel sits between the Hardware Abstraction Layer and the Executive to provide multiprocessor synchronization, thread and interrupt scheduling and dispatching, and trap handling and exception dispatching. The kernel is also responsible for initializing device drivers at bootup. Kernel mode drivers exist in three levels: highest level drivers, intermediate drivers and low level drivers. The Windows Driver Model (WDM) exists in the intermediate layer and was mainly designed to be binary and source compatible between Windows 98 and Windows 2000. The lowest level drivers are either legacy Windows NT device drivers that control a device directly or a PnP hardware bus.

12. File System and its attributes in brief

Naming: Every file system provides some way to give a name to each file. We will consider only names for individual files here, and talk about directories later. The name of a file is (at least sometimes) meant to be used by human beings, so it should be easy for humans to use. Different operating systems put different restrictions on names:

Size: Some systems put severe restrictions on the length of names. For example, DOS restricts names to 11 characters, while early versions of Unix (and some still in use today) restrict names to 14 characters. The Macintosh operating system, Windows 95, and most modern versions of Unix allow names to be essentially arbitrarily long. I say "essentially" since names are meant to be used by humans, so they don't really need to be all that long. A name that is 100 characters long is just as difficult to use as one that is forced to be under 11 characters long (but for different reasons). Most modern versions of Unix, for example, restrict names to a limit of 255 characters.

Case: Are upper and lower case letters considered different? The Unix tradition is to consider the names FILE1 and file1 to be completely different and unrelated names. In DOS and its descendants, however, they are considered the same. Some systems translate names to one case (usually upper case) for storage. Others retain the original case, but consider it simply a matter of decoration. For example, if you create a file named "File1", you could open it as "file1" or "FILE1", but if you list the directory, you would still see the file listed as "File1".
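The case-preserving but case-insensitive behaviour can be sketched with a small in-memory directory. This is an illustration under assumptions, not how any particular file system is implemented: the class name `CaseInsensitiveDirectory` and the file name File1 are hypothetical, and names are compared using Python's `str.casefold`.

```python
from typing import Dict, List, Optional

class CaseInsensitiveDirectory:
    """Keeps each name as created, but matches names regardless of case."""

    def __init__(self) -> None:
        # Maps the case-folded name to the name exactly as it was created.
        self._entries: Dict[str, str] = {}

    def create(self, name: str) -> None:
        self._entries[name.casefold()] = name

    def lookup(self, name: str) -> Optional[str]:
        return self._entries.get(name.casefold())

    def listing(self) -> List[str]:
        return sorted(self._entries.values())

d = CaseInsensitiveDirectory()
d.create("File1")
```

Looking up either `"file1"` or `"FILE1"` finds the same entry, while `d.listing()` still shows the name with its original capitalization. A Unix-style directory would instead use the name itself as the key, making the two spellings unrelated.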

Character Set: Different systems put different restrictions on what characters can appear in file names. The Unix directory structure supports names containing any character other than NUL (the byte consisting of all zero bits) and '/', which separates the components of a path, but many utility programs (such as the shell) have trouble with names that contain spaces, control characters or certain punctuation characters. MacOS allows all of these (e.g., it is not uncommon to see a file name with the Copyright symbol © in it). With the world-wide spread of computer technology, it is becoming increasingly important to support languages other than English, and in fact alphabets other than Latin. There is a move to support character strings (and in particular file names) in the Unicode character set, which was originally designed around 16 bits per character rather than 8 and can represent the alphabets of all major modern languages from Arabic to Devanagari to Telugu to Khmer.

Format: It is common to divide a file name into a base name and an extension that indicates the type of the file. DOS requires that each name be composed of a base name of eight or fewer characters and an extension of three or fewer characters. When the name is displayed, it is represented as base.extension. Unix internally makes no such distinction, but it is a common convention to include exactly one period in a file name (e.g. fil.c for a C source file).
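The base/extension split and the DOS length limits described above can be checked with a few lines of code. The sketch below looks only at lengths and the position of the period; it ignores the character-set restrictions real DOS imposes, and the helper names `split_name` and `fits_8_3` are made up for illustration.

```python
from typing import Tuple

def split_name(name: str) -> Tuple[str, str]:
    """Split a file name at its last period into (base, extension)."""
    base, dot, ext = name.rpartition(".")
    if not dot:                 # no period at all: the whole name is the base
        return name, ""
    return base, ext

def fits_8_3(name: str) -> bool:
    """Check the DOS rule: base of 1 to 8 characters, extension of 0 to 3."""
    base, ext = split_name(name)
    return 1 <= len(base) <= 8 and len(ext) <= 3 and "." not in base
```

For example, `fits_8_3("fil.c")` and `fits_8_3("AUTOEXEC.BAT")` hold, while `fits_8_3("longfilename.txt")` does not because the base name has more than eight characters.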

