
UNIT 1
OVERVIEW & INSTRUCTIONS

Syllabus: Eight ideas – Components of a computer system – Technology – Performance – Power wall – Uniprocessors to multiprocessors; Instructions – operations and operands – representing instructions – Logical operations – control operations – Addressing and addressing modes.

1.1 EIGHT IDEAS:

IDEA 1: Design for Moore's Law

The one constant for computer designers is rapid change, which is driven largely by Moore's Law.

It states that integrated circuit resources double every 18–24 months. Moore's Law resulted from a 1965 prediction of such growth in IC capacity made by Gordon Moore, one of the founders of Intel. As computer designs can take years, the resources available per chip can easily double or quadruple between the start and finish of the project. We use an "up and to the right" Moore's Law graph to represent designing for rapid change.

IDEA 2: Use Abstraction to Simplify Design

Both computer architects and programmers had to invent techniques to make themselves more productive, for otherwise design time would lengthen as dramatically as resources grew by Moore's Law. A major productivity technique for hardware and software is to use abstractions to represent the design at different levels of representation; lower-level details are hidden to offer a simpler model at higher levels.

IDEA 3: Make the common case fast

Making the common case fast will tend to enhance performance better than optimizing the rare case. Ironically, the common case is often simpler than the rare case and hence is often easier to enhance.

IDEA 4: Performance via parallelism

Since the dawn of computing, computer architects have offered designs that get more performance by performing operations in parallel.

IDEA 5: Performance via pipelining

A particular pattern of parallelism is so prevalent in computer architecture that it merits its own name: pipelining.

IDEA 6: Performance via prediction

Following the saying that it can be better to ask for forgiveness than to ask for permission, the next great idea is prediction: in some cases it can be faster on average to guess and start working rather than wait until you know for sure.

IDEA 7: Hierarchy of memories

Programmers want memory to be fast, large, and cheap, as memory speed often shapes performance, capacity limits the size of problems that can be solved, and the cost of memory today is often the majority of computer cost.

Architects have found that they can address these conflicting demands with a hierarchy of memories, with the fastest, smallest, and most expensive memory per bit at the top of the hierarchy and the slowest, largest, and cheapest per bit at the bottom.

Caches give the programmer the illusion that main memory is nearly as fast as the top of the hierarchy and nearly as big and cheap as the bottom of the hierarchy.

The memory hierarchy is usually drawn as a pyramid, and the shape indicates speed, cost, and size: the closer to the top, the faster and more expensive per bit the memory; the wider the base of the layer, the bigger the memory.

IDEA 8: Dependability via redundancy

Computers not only need to be fast; they need to be dependable.

Since any physical device can fail, we make systems dependable by including redundant components that can take over when a failure occurs and to help detect failures.

COMPONENTS OF A COMPUTER SYSTEM

A computer system consists of both hardware and information stored on hardware.

Information stored on computer hardware is often called software.

The hardware components of a computer system are the electronic and mechanical parts.

The software components of a computer system are the data and the computer programs.

The major hardware components of a computer system are:

• Processor
• Main memory
• Secondary memory
• Input devices
• Output devices

For typical desktop computers, the processor, main memory, secondary memory, power supply, and supporting hardware are housed in a metal case. Many of the components are connected to the main circuit board of the computer, called the motherboard.

The power supply supplies power for most of the components. Various input devices (such as the keyboard) and output devices (such as the monitor) are attached through connectors.

HARDWARE COMPONENTS:

The terms input and output indicate whether data flow into or out of the computer; in a block diagram of the system, arrows show the direction of data flow.

A Bus is a group of wires on the main circuit board of the computer. It is a pathway for data flowing between components. Most devices are connected to the bus through a Controller which coordinates the activities of the device with the bus.

The Processor is an electronic device about one inch square, covered in plastic. Inside the square is an even smaller square of silicon containing millions of tiny electrical parts; a modern processor may contain billions of transistors. It does the fundamental computing within the system, and directly or indirectly controls all the other components.

The processor is sometimes called the Central Processing Unit or CPU. A particular computer will have a particular type of processor, such as a Pentium processor or a SPARC processor.

MEMORY:

The processor performs all the fundamental computation of the computer system. Other components contribute to the computation by doing such things as storing data or moving data into and out of the processor. But the processor is where the fundamental action takes place.

A processor chip has relatively little memory. It has only enough memory to hold a few instructions of a program and the data they process. Complete programs and data sets are held in memory external to the processor.

This memory is of two fundamental types: main memory, and secondary memory.

Main memory is sometimes called volatile because it loses its information when power is removed. Main memory is sometimes called main storage.

Secondary memory is usually nonvolatile because it retains its information when power is removed. Secondary memory is sometimes called secondary storage or mass storage.

Main memory:

a. closely connected to the processor.
b. stored data are quickly and easily changed.
c. holds the programs and data that the processor is actively working with.
d. interacts with the processor millions of times per second.
e. needs constant electric power to keep its information.

Secondary memory:

a. connected to main memory through the bus and a controller.
b. stored data are easily changed, but changes are slow compared to main memory.
c. used for long-term storage of programs and data.
d. before data and programs can be used, they must be copied from secondary memory into main memory.
e. does not need electric power to keep its information.

Main memory is where programs and data are kept when the processor is actively using them. When programs and data become active, they are copied from secondary memory into main memory where the processor can interact with them. A copy remains in secondary memory.

Main memory is intimately connected to the processor, so moving instructions and data into and out of the processor is very fast.

Main memory is sometimes called RAM. RAM stands for Random Access Memory. "Random" means that the memory cells can be accessed in any order. However, properly speaking, "RAM" means the type of silicon chip used to implement main memory.

Secondary memory is where programs and data are kept on a long-term basis. Common secondary storage devices are the hard disk and optical disks.

The hard disk has enormous storage capacity compared to main memory. The hard disk is usually contained inside the case of a computer. The hard disk is used for long-term storage of programs and data. Data and programs on the hard disk are organized into files. A file is a collection of data on the disk that has a name.

A hard disk might have a storage capacity of 500 gigabytes (room for about 500 × 10^9 characters). This is about 100 times the capacity of main memory. A hard disk is slow compared to main memory. If the disk were the only type of memory the computer system would slow down to a crawl. The reason for having two types of storage is this difference in speed and capacity.

Large blocks of data are copied from disk into main memory. The operation is slow, but lots of data is copied. Then the processor can quickly read and write small sections of that data in main memory. When it is done, a large block of data is written to disk.

Often, while the processor is computing with one block of data in main memory, the next block of data from disk is read into another section of main memory and made ready for the processor. One of the jobs of an operating system is to manage main storage and disks this way.

INPUT AND OUTPUT DEVICES

Input and output devices allow the computer system to interact with the outside world by moving data into and out of the system.

An input device is used to bring data into the system. Some input devices are:

• Keyboard
• Mouse
• Microphone
• Bar code reader
• Graphics tablet

An output device is used to send data out of the system. Some output devices are:

• Monitor
• Printer
• Speaker

A network interface acts as both input and output. Data flows from the network into the computer, and out of the computer into the network.

I/O

Input/output devices are usually called I/O devices. They are directly connected to an electronic module attached to the motherboard called a device controller.

For example, the speakers of a multimedia computer system are directly connected to a device controller called an audio card, which in turn is plugged into a bus on the motherboard.

Sometimes secondary memory devices like the hard disk are called I/O devices.

EMBEDDED SYSTEMS

A computer system that is part of a larger machine and which controls how that machine operates is an embedded system. Usually the processor constantly runs a single control program which is permanently kept in ROM (Read Only Memory).

ROM is used to make a section of main memory read-only. Main memory looks the same as before to the processor, except a section of it permanently contains the program the processor is running. This section of memory retains its data even when power is off.

A typical embedded system is a cell phone. Digital cameras, DVD players, medical equipment, and even home appliances contain dedicated processors.

SOFTWARE

Computer software consists of both programs and data.

Programs consist of instructions for the processor.

Data can be any information that a program needs: character data, numerical data, image data, audio data, and countless other types.

TYPES OF PROGRAMS

There are two categories of programs.

Application programs (usually called just "applications") are programs that people use to get their work done. Computers exist because people want to run these programs.

Systems programs keep the hardware and software running together smoothly. The difference between "application program" and "system program" is fuzzy. Often it is more a matter of marketing than of logic.

The most important systems program is the operating system. The operating system is always present when a computer is running. It coordinates the operation of the other hardware and software components of the computer system.

The operating system is responsible for starting up application programs, running them, and managing the resources that they need. When an application program is running, the operating system manages the details of the hardware for it.

Modern operating systems for desktop computers come with a user interface that enables users to easily interact with application programs (and with the operating system itself) by using windows, buttons, menus, icons, the mouse, and the keyboard. Examples of operating systems are Unix, Linux, Android, Mac OS, and Windows.

OPERATING SYSTEMS

An operating system is a complex program that keeps the hardware and software components of a computer system coordinated and functioning.

Most computer systems can potentially run any of several operating systems. For example, most Pentium-based computers can run either Linux or Windows operating systems. Usually only one operating system is installed on a computer system, although some computers have several.

In any case, only one operating system at a time can be in control of the computer system. The computer user makes a choice when the computer is turned on, and that operating system remains in control until the computer is turned off.

TECHNOLOGY:

To plan for the evolution of a machine, the designer must be especially aware of rapidly occurring changes in implementation technology. Three implementation technologies, which change at a dramatic pace, are critical to modern implementations:

■ Integrated circuit logic technology: Transistor density increases by about 50% per year, quadrupling in just over three years. Increases in die size are less predictable, ranging from 10% to 25% per year. The combined effect is a growth rate in transistor count on a chip of between 60% and 80% per year. Device speed increases nearly as fast; however, the metal technology used for wiring does not improve, causing cycle times to improve at a slower rate.

■ Semiconductor DRAM: Density increases by just under 60% per year, quadrupling in three years. Cycle time has improved very slowly, decreasing by about one-third in 10 years. Bandwidth per chip increases as the latency decreases. In addition, changes to the DRAM interface have also improved the bandwidth. In the past, DRAM (dynamic random-access memory) technology has improved faster than logic technology. This difference has occurred because of reductions in the number of transistors per DRAM cell and the creation of specialized technology for DRAMs.

■ Magnetic disk technology: Recently, disk density has been improving by about 50% per year, almost quadrupling in three years. It appears that disk technology will continue the faster density growth rate for some time to come. Access time has improved by one-third in 10 years.

These rapidly changing technologies impact the design of a microprocessor that may, with speed and technology enhancements, have a lifetime of five or more years. Even within the span of a single product cycle (two years of design and two years of production), key technologies, such as DRAM, change sufficiently that the designer must plan for these changes. Traditionally, cost has decreased very closely to the rate at which density increases.

PERFORMANCE:

The computer user is interested in reducing response time—the time between the start and the completion of an event—also referred to as execution time. The manager of a large data processing center may be interested in increasing throughput—the total amount of work done in a given time.

In comparing design alternatives, we often want to relate the performance of two different machines, say X and Y. The phrase “X is faster than Y” is used here to mean that the response time or execution time is lower on X than on Y for the given task. In particular, “X is n times faster than Y” will mean
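In the usual notation, that ratio is:

\[ n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} \]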

Since execution time is the reciprocal of performance, the following relationship holds:
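Expanding performance as the reciprocal of execution time:

\[ n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{1/\text{Performance}_Y}{1/\text{Performance}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y} \]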

The phrase “the throughput of X is 1.3 times higher than Y” signifies here that the number of tasks completed per unit time on machine X is 1.3 times the number completed on Y. Because performance and execution time are reciprocals, increasing performance decreases execution time. The key measurement is time.

The computer that performs the same amount of work in the least time is the fastest . The difference is whether we measure one task (response time) or many tasks (throughput).

MEASURING PERFORMANCE:

Even execution time can be defined in different ways depending on what we count. The most straightforward definition of time is called wall-clock time, response time, or elapsed time, which is the latency to complete a task, including disk accesses, memory accesses, input/output activities, and operating system overhead.

CPU time means the time the CPU is computing, not including the time waiting for I/O or running other programs. It can be further divided into the CPU time spent in the program, called user CPU time, and the CPU time spent in the operating system performing tasks requested by the program, called system CPU time.

Choosing Programs to Evaluate Performance

A computer user who runs the same programs day in and day out would be the perfect candidate to evaluate a new computer.

To evaluate a new system the user would simply compare the execution time of her workload—the mixture of programs and operating system commands that users run on a machine.

There are four levels of programs used in such circumstances, listed below in decreasing order of accuracy of prediction.

1. Real programs: Real programs have input, output, and options that a user can select when running the program.

2. Kernels: Livermore Loops and Linpack are the best known examples. Unlike real programs, no user would run kernel programs, for they exist solely to evaluate performance. Kernels are best used to isolate performance of individual features of a machine to explain the reasons for differences in performance of real programs.

3. Toy benchmarks: Toy benchmarks are typically between 10 and 100 lines of code and produce a result the user already knows before running the toy program. Programs like Sieve of Eratosthenes, Puzzle, and Quicksort are popular because they are small, easy to type, and run on almost any computer.

4. Synthetic benchmarks: Synthetic benchmarks try to match the average frequency of operations and operands of a large set of programs. Whetstone and Dhrystone are the most popular synthetic benchmarks.

Benchmark Suites:

Benchmark suites are made of collections of programs, some of which may be kernels, but many of which are typically real programs. The programs in the popular SPEC92 benchmark suite were used to characterize performance in the workstation and server markets. The programs in SPEC92 vary from collections of kernels (nasa7) to small program fragments (tomcatv, ora, alvinn, swm256) to applications of varying size (spice2g6, gcc, and compress).

Reporting Performance Results:

The guiding principle of reporting performance measurements should be reproducibility—list everything another experimenter would need to duplicate the results. A SPEC benchmark report requires a fairly complete description of the machine and the compiler flags, as well as the publication of both the baseline and optimized results. An example is the SPECfp92 report for an IBM RS/6000 Powerstation 590. In addition to hardware, software, and baseline tuning parameter descriptions, a SPEC report contains the actual performance times, shown both in tabular form and as a graph. The importance of performance on the SPEC benchmarks motivated vendors to add many benchmark-specific flags when compiling SPEC programs; these flags often caused transformations that would be illegal on many programs or would slow down performance on others.

Total Execution Time: A Consistent Summary Measure

The simplest approach to summarizing relative performance is to use the total execution time of the two programs. Thus:

B is 9.1 times faster than A for programs P1 and P2.
C is 25 times faster than A for programs P1 and P2.
C is 2.75 times faster than B for programs P1 and P2.
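These ratios assume the classic textbook execution times: P1 takes 1 s on A, 10 s on B, and 20 s on C, while P2 takes 1000 s on A, 100 s on B, and 20 s on C. A short Python sketch that reproduces the numbers:

```python
# Sketch: total execution time as a summary measure.
# Assumed (classic textbook) execution times in seconds for programs P1 and P2.
times = {
    "A": [1.0, 1000.0],
    "B": [10.0, 100.0],
    "C": [20.0, 20.0],
}

totals = {m: sum(t) for m, t in times.items()}   # A: 1001, B: 110, C: 40

# "X is n times faster than Y" means total_Y / total_X = n.
print(f"B vs A: {totals['A'] / totals['B']:.1f}x")   # ~9.1
print(f"C vs A: {totals['A'] / totals['C']:.1f}x")   # ~25.0
print(f"C vs B: {totals['B'] / totals['C']:.2f}x")   # 2.75
```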

This summary tracks execution time, our final measure of performance. If the workload consisted of running programs P1 and P2 an equal number of times, the statements above would predict the relative execution times for the workload on each machine.

An average of the execution times that tracks total execution time is the arithmetic mean.
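For n programs, this is:

\[ \text{Arithmetic mean} = \frac{1}{n}\sum_{i=1}^{n} \text{Time}_i \]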

where Time_i is the execution time for the ith program of a total of n in the workload. If performance is expressed as a rate, then the average that tracks total execution time is the harmonic mean:
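Expressed in terms of rates:

\[ \text{Harmonic mean} = \frac{n}{\sum_{i=1}^{n} \frac{1}{\text{Rate}_i}} \]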

where Rate_i is a function of 1/Time_i, the execution time for the ith of n programs in the workload.

Weighted Execution Time

The first approach when given an unequal mix of programs in the workload is to assign a weighting factor w_i to each program to indicate the relative frequency of the program in that workload. If, for example, 20% of the tasks in the workload were program P1 and 80% of the tasks in the workload were program P2, then the weighting factors would be 0.2 and 0.8. (Weighting factors add up to 1.) By summing the products of weighting factors and execution times, a clear picture of the performance of the workload is obtained. This is called the weighted arithmetic mean:
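Summing the weighted times over the n programs:

\[ \text{Weighted arithmetic mean} = \sum_{i=1}^{n} \text{Weight}_i \times \text{Time}_i \]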

where Weight_i is the frequency of the ith program in the workload and Time_i is the execution time of that program.

The weighted harmonic mean of rates will show the same relative performance as the weighted arithmetic mean of execution times. The definition is:
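Expressed in terms of weighted rates:

\[ \text{Weighted harmonic mean} = \frac{1}{\sum_{i=1}^{n} \frac{\text{Weight}_i}{\text{Rate}_i}} \]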

Quantitative Principles of Computer Design

Make the Common Case Fast

Perhaps the most important and pervasive principle of computer design is to make the common case fast: In making a design trade-off, favor the frequent case over the infrequent case. This principle also applies when determining how to spend resources, since the impact on making some occurrence faster is higher if the occurrence is frequent. Improving the frequent event, rather than the rare event, will obviously help performance, too. A fundamental law, called Amdahl’s Law, can be used to quantify this principle.

Amdahl’s Law

The performance gain that can be obtained by improving some portion of a computer can be calculated using Amdahl’s Law. Amdahl’s Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used. Amdahl’s Law defines the speedup that can be gained by using a particular feature. What is speedup? Suppose that we can make an enhancement to a machine that will improve performance when it is used. Speedup is the ratio:
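Written in terms of execution times (the form used throughout this unit):

\[ \text{Speedup} = \frac{\text{Execution time for the entire task without using the enhancement}}{\text{Execution time for the entire task using the enhancement when possible}} \]

Equivalently, it is the performance with the enhancement divided by the performance without it.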

Speedup tells us how much faster a task will run using the machine with the enhancement as opposed to the original machine. Amdahl’s Law gives us a quick way to find the speedup from some enhancement, which depends on two factors:

1. The fraction of the computation time in the original machine that can be converted to take advantage of the enhancement.

For example, if 20 seconds of the execution time of a program that takes 60 seconds in total can use an enhancement, the fraction is 20/60. This value, which we will call Fraction_enhanced, is always less than or equal to 1.

2. The improvement gained by the enhanced execution mode; that is, how much faster the task would run if the enhanced mode were used for the entire program.

This value is the time of the original mode over the time of the enhanced mode: If the enhanced mode takes 2 seconds for some portion of the program that can completely use the mode, while the original mode took 5 seconds for the same portion, the improvement is 5/2. We will call this value, which is always greater than 1, Speedup_enhanced.

The execution time using the original machine with the enhanced mode will be the time spent using the unenhanced portion of the machine plus the time spent using the enhancement:
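In the standard form of Amdahl’s Law, with Fraction_enhanced and Speedup_enhanced as defined above:

\[ \text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right] \]

The overall speedup is the ratio of the execution times:

\[ \text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}} \]

Using the numbers from the example above (Fraction_enhanced = 20/60 and Speedup_enhanced = 5/2), the overall speedup is 1 / (2/3 + (1/3)/2.5) = 1.25.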

Amdahl’s Law expresses the law of diminishing returns: the incremental improvement in speedup gained by an additional improvement in the performance of just a portion of the computation diminishes as improvements are added. An important corollary of Amdahl’s Law is that if an enhancement is only usable for a fraction of a task, we can’t speed up the task by more than the reciprocal of 1 minus that fraction.

Amdahl’s Law can serve as a guide to how much an enhancement will improve performance and how to distribute resources to improve cost/performance. The goal, clearly, is to spend resources proportional to where time is spent.

The CPU Performance Equation

Most computers are constructed using a clock running at a constant rate. These discrete time events are called ticks, clock ticks, clock periods, clocks, cycles, or clock cycles. Computer designers refer to the time of a clock period by its duration (e.g., 2 ns) or by its rate (e.g., 500 MHz). CPU time for a program can then be expressed two ways:
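Both forms say the same thing, one in terms of cycle time and one in terms of clock rate:

\[ \text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}} \]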

In addition to the number of clock cycles needed to execute a program, we can also count the number of instructions executed—the instruction path length or instruction count (IC). If we know the number of clock cycles and the instruction count we can calculate the average number of clock cycles per instruction (CPI):
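Using IC for the instruction count:

\[ \text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{Instruction count}} \]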

By transposing instruction count in the above formula, clock cycles can be defined as IC × CPI. This allows us to use CPI in the execution time formula:
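Substituting IC × CPI for the clock cycles:

\[ \text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time} = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}} \]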

Expanding the first formula into the units of measure shows how the pieces fit together:
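The units cancel to leave seconds per program:

\[ \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}} = \frac{\text{Seconds}}{\text{Program}} = \text{CPU time} \]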

CPU performance is dependent upon three characteristics: clock cycle time (or rate), clock cycles per instruction, and instruction count.

Furthermore, CPU time is equally dependent on these three characteristics: A 10% improvement in any one of them leads to a 10% improvement in CPU time.
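A minimal Python sketch with made-up numbers (not measurements of any real machine) showing why the dependence is equal: CPU time is a simple product of the three factors, so improving any one of them by 10% improves CPU time by 10%.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time):
    # CPU time = Instruction count x CPI x Clock cycle time
    return instruction_count * cpi * clock_cycle_time

ic, cpi, cycle = 2_000_000, 2.0, 1e-9   # 2M instructions, CPI of 2, 1 ns clock cycle

base = cpu_time(ic, cpi, cycle)
# Improve any single factor by 10% (multiply it by 0.9) and compare:
print(cpu_time(ic * 0.9, cpi, cycle) / base)   # 0.9
print(cpu_time(ic, cpi * 0.9, cycle) / base)   # 0.9
print(cpu_time(ic, cpi, cycle * 0.9) / base)   # 0.9 -> a 10% improvement in CPU time each time
```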

It is difficult to change one parameter in complete isolation from others because the basic technologies involved in changing each characteristic are also interdependent:

■ Clock cycle time—Hardware technology and organization
■ CPI—Organization and instruction set architecture
■ Instruction count—Instruction set architecture and compiler technology

Sometimes it is useful in designing the CPU to calculate the number of total CPU clock cycles as
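Summing over the n different instructions (or instruction classes) in the program:

\[ \text{CPU clock cycles} = \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i \]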

where IC_i represents the number of times instruction i is executed in a program and CPI_i represents the average number of clock cycles for instruction i. This form can be used to express CPU time as:
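Multiplying the summed cycles by the clock cycle time:

\[ \text{CPU time} = \left( \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i \right) \times \text{Clock cycle time} \]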

Locality of Reference

While Amdahl’s Law is a theorem that applies to any system, other important fundamental observations come from properties of programs. The most important program property that we regularly exploit is locality of reference: programs tend to reuse data and instructions they have used recently. A widely held rule of thumb is that a program spends 90% of its execution time in only 10% of the code. Locality of reference also applies to data accesses, though not as strongly as to code accesses. Two different types of locality have been observed.

Temporal locality states that recently accessed items are likely to be accessed in the near future. Spatial locality says that items whose addresses are near one another tend to be referenced close together in time.

POWER WALL:

Three major reasons why the growth in uniprocessor performance became unsustainable:

1. The Memory Wall: the increasing gap between CPU speed and main memory speed.

2. The ILP Wall: the decreasing amount of "work" (instruction-level parallelism) available to the processor.

3. The Power Wall: the increasing power consumption of the processor.

• The power wall: power has become the expensive resource, while transistors are effectively “free” (abundant). It is not possible to consistently run at higher clock frequencies without hitting power/thermal limits.

UNIPROCESSOR TO MULTIPROCESSOR:

Introduction- Uniprocessor

A uniprocessor system is defined as a computer system that has a single central processing unit that is used to execute computer tasks.

The typical uniprocessor system consists of three major components: the main memory, the central processing unit (CPU), and the input-output (I/O) subsystem.

The CPU contains an arithmetic and logic unit (ALU) with an optional floating-point accelerator, and some local cache memory with an optional diagnostic memory.

The CPU, the main memory, and the I/O subsystems are all connected to a common bus, the synchronous backplane interconnect (SBI). Through this bus, all I/O devices can communicate with each other, with the CPU, or with the memory.

A number of parallel processing mechanisms have been developed in uniprocessor computers. They are identified as: multiplicity of functional units, parallelism and pipelining within the CPU, overlapped CPU and I/O operations, use of a hierarchical memory system, and multiprogramming and time sharing.

Limitations of Uniprocessor Machine

• Clock speed cannot be increased further.

• In a digital circuit, the maximum signal speed is bounded by the speed of light.

• More speed generates more heat.

Introduction-Multiprocessor

A multiprocessor system is an interconnection of two or more CPUs with memory and input-output equipment.

The components that form a multiprocessor are CPUs, IOPs connected to input-output devices, and a memory unit that may be partitioned into a number of separate modules.

Multiprocessors are classified as multiple instruction stream, multiple data stream (MIMD) systems.

Why Choose a Multiprocessor?

A single CPU can only go so fast; using more than one CPU can improve performance for:

Multiple users

Multiple applications

Multi-tasking within an application

Responsiveness and/or throughput

Share hardware between CPUs

Multiprocessor System Benefits

Reliability

Multiprocessing improves the reliability of the system so that a failure or error in one part has a limited effect on the rest of the system. If a fault causes one processor to fail, a second processor can be assigned to perform the functions of the disabled processor.

Improved System Performance

The system derives its high performance from the fact that computations can proceed in parallel in one of two ways.

1. Multiple independent jobs can be made to operate in parallel.

2. A single job can be partitioned into multiple parallel tasks.

How are multiprocessors classified?

Multiprocessors are classified by the way their memory is organized, mainly into two types:

1. Tightly coupled multiprocessor

2. Loosely coupled multiprocessor

Tightly coupled Multiprocessor

A multiprocessor is a tightly coupled computer system having two or more processing units (multiple processors), each sharing main memory and peripherals, in order to process programs simultaneously.

A tightly coupled multiprocessor is also known as a shared-memory system.

Loosely-coupled multiprocessor

Loosely coupled multiprocessor systems (often referred to as clusters) are based on multiple standalone single- or dual-processor commodity computers interconnected via a high-speed communication system.

A loosely coupled multiprocessor is also known as a distributed-memory system.

Example

A Linux Beowulf cluster

Difference between tightly coupled and loosely coupled multiprocessors

Tightly coupled:

• Physically smaller than a loosely coupled system.

• More expensive.

Loosely coupled:

• Physically larger (just the opposite of a tightly coupled system).

• Less expensive.


