NOORUL ISLAM COLLEGE OF ENGINEERING, KUMARACOIL

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

2 & 16 Marks Question Answers

CS64- Advanced Computer Architecture

S6 BE(CSE)-AU

Prepared by

R.Suji Pramila

Lecturer/CSE

NIU


UNIT I

1. What is Instruction Level parallelism?

The technique used to overlap the execution of instructions and improve performance is called Instruction-Level Parallelism (ILP).

2. What are the approaches to exploit ILP?

The two separable approaches to exploit ILP are,

Dynamic, or hardware-intensive, approach
Static, or compiler-intensive, approach

3. What is pipelining?

Pipelining is an implementation technique whereby multiple instructions are overlapped in execution when they are independent of one another.

4. Write down the formula to calculate the pipeline CPI?

The value of the CPI (cycles per instruction) for a pipelined processor is the sum of the ideal pipeline CPI and all contributions from stalls.

Pipeline CPI = Ideal pipeline CPI + structural stalls + Data hazard stalls + control stalls.
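As a worked illustration (the stall counts are assumed, not from the notes): with an ideal pipeline CPI of 1.0 and average per-instruction stall contributions of 0.10 (structural), 0.20 (data hazard) and 0.15 (control), the pipeline CPI would be 1.0 + 0.10 + 0.20 + 0.15 = 1.45.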

5. What is loop level parallelism?

Loop-level parallelism is a way to increase the amount of parallelism available among instructions by exploiting parallelism among the iterations of a loop.
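A minimal C sketch (an assumed example, not taken from the notes) of a loop whose iterations are independent and can therefore be overlapped:

    /* Each iteration updates a different element, so no iteration needs a value
       computed by another; all iterations could, in principle, run in parallel. */
    void add_scalar_loop(int n, double *x, double s) {
        for (int i = 0; i < n; i++)
            x[i] = x[i] + s;
    }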

6. Give the methods to enhance performance of ILP?

To obtain substantial performance enhancements, ILP across multiple basic blocks is exploited using

Loop-level parallelism
Vector instructions

7. List out the types of dependences.

There are three different types of dependences

Data dependences
Name dependences
Control dependences


8. What is Data hazard?

A hazard is created whenever there is a dependence between instructions and they are close enough that the overlap caused by pipelining, or other reordering of instructions, would change the order of access to the operand involved in the dependence.

9. Give the classification of Data hazards

Data hazards are classified into three types depending on the order of read and write accesses in the instructions:

RAW (Read After Write)
WAW (Write After Write)
WAR (Write After Read)
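A short C sketch of the three cases (an assumed example; ordinary variables stand in for registers):

    /* Each pair of statements illustrates one hazard type if the hardware
       were to reorder them. */
    void hazard_examples(void) {
        int r1, r2 = 1, r3 = 2, r4, r5 = 3, r6 = 4, r7 = 5;
        /* RAW: the second statement reads r1, which the first writes. */
        r1 = r2 + r3;
        r4 = r1 + r5;
        /* WAW: both statements write r1; the later write must supply the final value. */
        r1 = r2 + r3;
        r1 = r6 + r7;
        /* WAR: the second statement writes r5, which the first statement reads. */
        r4 = r1 + r5;
        r5 = r6 + r7;
        (void)r4; (void)r5; (void)r1;
    }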

10. List out the constraints imposed by control dependences?

The two constraints imposed by control dependencies are

An instruction that is control dependent on a branch cannot be moved before the branch, so that its execution is no longer controlled by the branch.

An instruction that is not control dependent on a branch cannot be moved after the branch, so that its execution is controlled by the branch.

11. What are the properties used for preserving control dependence?

Control dependence is preserved by two properties in a simple pipeline.

Instructions execute in program order
Detection of control or branch hazards

12. Define Dynamic Scheduling?

Dynamic scheduling is a technique in which the hardware rearranges the instruction execution to reduce the stalls while maintaining data flow and exception behavior.

13. List the advantages of dynamic scheduling?

It handles dependences that are unknown at compile time.
It simplifies the compiler.
It uses speculation techniques to improve performance.


14. What is scoreboarding?

Scoreboarding is a technique that allows instructions to execute out of order when sufficient resources are available and there are no data dependences. Scoreboarding does not eliminate WAW and WAR hazards; instead the instruction is stalled until these hazards are cleared.

15. What are the advantages of Tomasulo's approach?

Distribution of the hazard detection logic
Elimination of WAR and WAW hazards

16. What are the types of branch prediction?

There are two types of branch prediction. They are,

Dynamic branch prediction
Static branch prediction

17. Define Amdahl’s Law?

Amdahl's Law states that the performance improvement to be gained from improving some portion of a computer is limited by the fraction of the time that portion is used.
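Stated as a formula, with a small worked example (the numbers are assumed):

Overall speedup = 1 / ((1 - Fraction enhanced) + Fraction enhanced / Speedup enhanced)

For instance, if 40% of the execution time can be sped up by a factor of 10, the overall speedup is 1 / (0.6 + 0.4/10) ≈ 1.56.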

18. What are the things present in Dynamic branch prediction?

Dynamic branch prediction uses two structures:

Branch prediction buffer
Branch history table

19. Define Correlating branch prediction?

Branch prediction that uses the behavior of other branches to make a prediction is called correlating branch prediction.

20. What are the basic ideas of pipeline scheduling?

The basic ideas of pipeline scheduling are,

To keep the pipeline full: find sequences of unrelated instructions that can be overlapped in the pipeline.

To avoid pipeline stalls: separate a dependent instruction from the source instruction by a distance in clock cycles equal to the pipeline latency of that source instruction.


21. What are the four fields involved in ROB?

ROB contains four fields,

Instruction type
Destination field
Value field
Ready field

22. What is reservation station?

In Tomasulo's scheme, register renaming is provided by the reservation stations. The basic idea is that a reservation station fetches and buffers an operand as soon as it is available, eliminating the need to get the operand from a register.

23. What is ROB?

ROB stands for reorder buffer. It supplies operands in the interval between completion of instruction execution and instruction commit. The ROB is similar to the store buffer in Tomasulo's algorithm.

24. What is imprecise exception?

An exception is imprecise if the processor state when the exception is raised does not look exactly as if the instructions were executed sequentially in strict program order.

25. What are the two possibilities of imprecise exceptions?

The pipeline may have already completed instructions that are later in program order than the instruction causing the exception.

The pipeline may not yet have completed some instructions that are earlier in program order than the instruction causing the exception.

26. What are the two main features preserved by maintaining both data and control dependence?

Exception behavior
Data flow

27. What are the types of dependence?

Antidependence
Output dependence


28. What is antidependence?

An antidependence between instruction i and instruction j occurs when instruction j writes a register or memory location that instruction i reads. The original ordering must be preserved to ensure that i reads the correct value.

29. What is output dependence?

An output dependence occurs when instruction i and instruction j write the same register or memory location. The ordering between the instructions must be preserved to ensure that the value finally written corresponds to instruction j.

30. What is register renaming?

Renaming of register operands is called register renaming. It can be done either statically by the compiler or dynamically by the hardware.
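A minimal C sketch (assumed example; variables stand in for registers) of how renaming removes name dependences:

    void renaming_example(void) {
        int r1 = 1, r2 = 2, r3, r4, r5;
        /* Before renaming: reusing r3 below creates WAR/WAW name dependences. */
        r3 = r1 + r2;
        r4 = r3 * 2;
        r3 = r1 - r2;          /* r3 is reused only as a name */
        /* After renaming the second r3 to a fresh register t, only the true
           (RAW) dependence from the first r3 to r4 remains. */
        int t = r1 - r2;
        r5 = t * 3;
        (void)r4; (void)r3; (void)r5;
    }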

UNIT 2

1. Define VLIW.

VLIW is a technique for ILP that executes instructions without dependences in parallel. The compiler analyzes the program and detects operations to be executed in parallel; such operations are packed into one "large" instruction.

2. List out the advantages of VLIW processor.

Simple hardware: the number of functional units can be increased without needing the additional sophisticated hardware to detect parallelism that superscalar processors require.

Good compilers can detect parallelism based on global analysis of the whole program.

3. Define EPIC

EPIC stands for Explicitly Parallel Instruction Computing. It is an architecture framework proposed by HP. It is based on VLIW and was designed to overcome the key limitations of VLIW

while simultaneously giving more flexibility to compiler writers.


4. What is loop level analysis?

Loop-level analysis involves determining what dependences exist among the operands in a loop across its iterations, in particular whether operands in later iterations are data dependent on data values produced in earlier iterations.

5. What are the types of Data dependencies in loops?

Loop-carried dependences
Non-loop-carried dependences

6. What is loop carried dependence?

A data dependence between different loop iterations (data produced in an earlier iteration is used in a later one) is called a loop-carried dependence.
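A small C sketch (assumed example) of a loop-carried dependence:

    /* Iteration i reads a[i-1], which is written by iteration i-1, so the
       iterations cannot be executed in parallel as written. */
    void prefix_add(int n, int *a, const int *b) {
        for (int i = 1; i < n; i++)
            a[i] = a[i - 1] + b[i];
    }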

7. What are the tasks in finding the dependence in a program?

There are three tasks. They are:
Have good scheduling of code
Determine which loops might contain parallelism
Eliminate name dependences

8. Define dependence analysis algorithm.

The dependence analysis algorithm is the algorithm the compiler uses to detect dependences, based on the assumptions that

Array indices are affine (of the form a*i + b)
A dependence can exist only if the GCD test on the two affine indices is satisfied
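A small C sketch of the GCD test (the function names are hypothetical): for a store to x[a*i + b] and a load from x[c*i + d] in the same loop, a dependence is possible only if GCD(a, c) divides (d - b).

    #include <stdlib.h>

    static int gcd(int m, int n) { return n == 0 ? m : gcd(n, m % n); }

    /* Returns nonzero if a dependence is possible between a store to
       x[a*i + b] and a load from x[c*i + d]; assumes a and c are nonzero. */
    int gcd_test(int a, int b, int c, int d) {
        int g = gcd(abs(a), abs(c));
        return (d - b) % g == 0;
    }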

9. What is copy propagation?

Copy propagation is an optimization that eliminates operations which merely copy values; it is usually combined with algebraic simplification of expressions.
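A tiny C before/after sketch (assumed example) of what the optimization achieves:

    /* Before: the intermediate t simply carries a value forward. */
    int before(int x) {
        int t = x + 4;
        return t + 4;
    }
    /* After copy propagation plus algebraic simplification: */
    int after(int x) {
        return x + 8;
    }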

10. What is tree-height reduction technique?

Tree-height reduction is an optimization which reduces the height of the tree structure representing a computation, making it wider but shorter.
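A short C illustration (assumed example): re-associating a chain of additions into a balanced tree exposes parallelism.

    /* As written: three dependent additions (tree height 3). */
    int sum_chain(int a, int b, int c, int d) {
        return ((a + b) + c) + d;
    }
    /* After tree-height reduction: the two inner additions are independent
       and can execute in parallel (tree height 2). */
    int sum_balanced(int a, int b, int c, int d) {
        return (a + b) + (c + d);
    }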


11. What are the components of a software pipelined loop?

A software pipelined loop consists of a loop body, start-up code and clean-up code. The start-up code executes the instructions left over from the first original loop iterations; the clean-up (finish) code executes the instructions remaining from the last original iterations. A sketch follows.
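A rough C sketch (an assumed example, not from the notes) of a software-pipelined loop: each iteration of the new loop body mixes work drawn from different iterations of the original loop, with explicit start-up and clean-up code.

    void add_scalar_pipelined(long n, int *a, int s) {
        if (n < 3) {                    /* fall back for very short trip counts */
            for (long i = 0; i < n; i++) a[i] += s;
            return;
        }
        int t0 = a[0] + s;              /* start-up code: work from iterations 0 and 1 */
        int t1 = a[1] + s;
        long i;
        for (i = 2; i < n; i++) {       /* steady-state body: store i-2, copy i-1, load/add i */
            a[i - 2] = t0;
            t0 = t1;
            t1 = a[i] + s;
        }
        a[n - 2] = t0;                  /* clean-up code: finish the last two iterations */
        a[n - 1] = t1;
    }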

12. What is trace scheduling?

Trace scheduling is a way to organize the process of global code motion; it simplifies instruction scheduling by incurring the cost of possible code motion on the less critical paths.

13. List out steps used for trace scheduling.

Trace selection
Trace compaction

14. Define Inter-procedural analysis.

Analysing a procedure (for example, one with pointer parameters) across the boundaries of that procedure, taking its callers and callees into account, is called interprocedural analysis.

15. What is software pipelining?

It is a technique for reorganizing loops such that each iteration in the code is made from instructions chosen from different iterations of the original loop.

16. Define critical path.

Critical path is defined as the longest sequence of dependent instructions in a program.

17. Define IA-64 processor.

The IA-64 is a RISC-style, register-register instruction set with features designed to support compiler-based exploitation of ILP.

18. What is CFM and what is its use?

CFM stands for Current Frame Pointer. The CFM points to the set of registers to be used by a given procedure.


19. What are the parts of CFM pointer?

There are two parts. They are:
Local area - used for local storage
Output area - used to pass values to any called procedure

20. What is Itanium processor?

The Itanium processor is an implementation of the Intel IA-64 architecture. It is capable of issuing up to six instructions per clock cycle; the six issues can include up to three branches and two memory references.

21. What are the parts of 10 stage pipeline in Itanium processor?

Front end
Instruction delivery (EXP, REN)
Operand delivery (WLD, REG)
Execution (EXE, DET, WRB)

22. What are the limitations of ILP?

Limitations of the hardware model
Limitations on window size and maximum issue count
Effects of finite registers
Effects of imperfect alias analysis

23. List the two techniques for eliminating dependent computations

Software pipelining
Trace scheduling

24. Define Trace selection and Trace compaction

Trace selection: tries to find a likely sequence of basic blocks whose operations will be put into a small number of instructions; this sequence is called a trace.

Trace compaction: tries to squeeze the trace into a small number of wide instructions. Trace compaction is code scheduling; hence it attempts to move operations as early as it can in the sequence, packing the operations into as few wide instructions as possible.


25. Define Superblocks.

Superblocks are formed by a process similar to that used for traces, but are a form of extended basic blocks, which are restricted to a single entry point but allow multiple exits.

26. Use of conditional or predicted instructions.

Conditional or predicated instructions are used to eliminate branches, converting control dependences into data dependences and potentially improving performance.

27. Define Instruction Group

An instruction group is a sequence of consecutive instructions with no register data dependences among them. All the instructions in a group could be executed in parallel if sufficient hardware resources existed and if any dependences through memory were preserved.

28. Use of template field in bundle.

The 5-bit template field within each bundle describes both the presence of any stops associated with the bundle and the execution unit type required by each instruction within the bundle.

29. List the two types of speculation supported by IA 64 processor.

Control speculation
Memory reference speculation

30. Define Advance loads.

Memory reference support in the IA-64 uses a concept called advanced loads. An advanced load is a load that has been speculatively moved above store instructions on which it is potentially dependent. To speculatively perform a load, the ld.a instruction is used.

31. Define ALAT.

Executing an advanced load instruction creates an entry in a special table called the ALAT. It stores both the register destination of the load and the address of the accessed memory location. When a store is executed, an associative lookup against the active ALAT entries is performed. If there is an ALAT entry with the same memory address as the store, that ALAT entry is marked invalid.


32. What are the functional units in the Itanium processor?

There are nine functional units in the Itanium processor:

Two I units
Two M units
Three B units
Two F units

All the functional units are pipelined.

33. Define Scoreboard

In the Itanium processor the 10-stage pipeline is divided into four parts. In the operand delivery part, a scoreboard is used to detect when individual instructions can proceed, so that a stall of one instruction in a bundle need not cause the entire bundle to stall.

34. Define Book Keeping Code

A basic block has a single entry and a single exit. When operations are moved across these entry and exit points during trace scheduling, extra compensation code must be inserted; this code is known as book-keeping code.

UNIT 3

1. Define cache coherence problem.

The cache coherence problem describes how two different processors can come to hold two different values for the same memory location.

2. What are the two aspects of the cache coherence problem?

i. Coherence - determines what value can be returned by a particular read operation.
ii. Consistency - determines when a written value may be returned by a read operation.

3. What are the two types of cache coherence protocol?

i. Directory based protocol
ii. Snooping protocol

4. Define Directory based protocol.

The sharing status of the blocks of main memory is kept in one common place called the directory. From this directory we can find where to retrieve the data.

5. Name the different types of snooping protocol.

i. Invalidate protocol
ii. Update/write broadcast protocol


6. Difference between write update and write invalidate protocol.

Write update:
i. Multiple write broadcasts are present.
ii. Each word of a cache block is considered separately.
iii. Access time is lower.

Write invalidate:
i. Only one invalidation is needed.
ii. Invalidation is performed for the entire cache block.
iii. Access time is higher.

7. What are the different types of access in a distributed shared memory architecture?

i. Local: if the processor references its own local memory, it is called a local access.
ii. Remote: if the processor references another processor's memory, it is called a remote access.

8. What are the disadvantages of remote access?

Compiler mechanisms for cache coherence are very limited.

Without the cache coherence property, the multiprocessor loses the advantage of fetching and using multiple words of a block.

Prefetch is useful only when the multiprocessor can fetch multiple words.

9. What are the states available in the directory based protocol?

i. Shared: one or more processors have copies of the same data.
ii. Uncached: no processor has a copy of the data block.
iii. Exclusive: exactly one processor has a copy of the data block.

10. What are the nodes available in a distributed system?

i. Local node
ii. Home node
iii. Remote node

11. Define Synchronization.

Synchronization is a mechanism built with user-level software routines that rely on hardware-supplied synchronization instructions.

12. Name the basic hardware primitives.

i. Atomic exchange
ii. Test-and-set
iii. Fetch-and-increment

13. Define spinlock.

It is a lock that a processor continuously tries to acquire, spinning around a loop until it succeeds.

It is mainly used when the programmer expects the lock to be held for only a short period of time.
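A minimal spin lock sketch using C11 atomics (an assumed example, not from the notes); the atomic exchange plays the role of the hardware primitive named above.

    #include <stdatomic.h>

    static atomic_int lock_var = 0;        /* 0 = free, 1 = held */

    void spin_lock(void) {
        /* Keep exchanging 1 into the lock; when the old value is 0 the
           lock was free and we now own it. */
        while (atomic_exchange(&lock_var, 1) != 0)
            ;                              /* spin */
    }

    void spin_unlock(void) {
        atomic_store(&lock_var, 0);        /* release */
    }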

14. What are the mechanisms to implement locks?

There are two methods to implement locks:
i. Implementing locks without using cache coherence
ii. Implementing locks using cache coherence

15. What are the advantages of using spin locks?

There are two advantages of using spin locks:
i. They have low overhead.
ii. Performance is high.

16. Name the synchronization mechanisms for large scale multiprocessors.

i. Exponential back-off
ii. Queuing locks
iii. Combining tree

17. What are the two primitives used for implementing synchronization?

Lock based implementation
Barrier based implementation

18. Define sequential consistency.

Sequential consistency requires that the result of any execution be the same as if the memory accesses executed by each processor were kept in order and the accesses among different processors were interleaved. It rules out many possibilities of incorrect execution.

19. Define multithreading.

Multithreading is the process of executing multiple threads on a common processor, sharing a common memory, with their execution overlapped.

20. What are the types of multithreading?

i. Fine-grained multithreading: switches between threads on each instruction.
ii. Coarse-grained multithreading: switches threads only on costly stalls.


Unit-4

1. Define cache.

Cache is the name given to the first level of the memory hierarchy encountered once the address leaves the CPU. E.g., file caches, name caches.

2. What are the factors on which the cache miss penalty depends?

The time required to service a cache miss depends on both
Latency
Bandwidth

3. What is the principle of locality?

The observation that a program accesses a relatively small portion of the address space at any instant of time is called the principle of locality.

4. What are pages?

The address space is usually broken into fixed-size blocks, called pages. Each page resides either in main memory or on disk.

5. What are memory stall cycles?

The number of cycles during which the CPU is stalled waiting for a memory access is called memory stall cycles.

6. Write down the formula for calculating average memory access time.

Average memory access time = Hit time + Miss rate * Miss penalty

where hit time is the time to hit in the cache. The formula can help us decide between split caches and a unified cache.
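As a worked example (the numbers are assumed): with a hit time of 1 clock cycle, a miss rate of 5% and a miss penalty of 20 clock cycles, the average memory access time is 1 + 0.05 * 20 = 2 clock cycles.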

7. What are the techniques to reduce the miss rate?

Larger block size
Larger caches
Higher associativity
Way prediction and pseudo-associative caches
Compiler optimizations

8. What are the techniques to reduce hit time?

Small and simple caches (direct mapped)
Avoiding address translation during indexing of the cache
Pipelined cache access
Trace caches

9. List out the types of storage devices.

Magnetic storages : disk, floppy, tape

Optical storages : compact disks (CD), digital video/versatile disks (DVD)

Electrical storage : flash memory


10. What is the sequence recorded on the disk?

The sequence recorded on the magnetic media is a sector number, a gap, the information for that sector including error correction code, a gap, the sector number of the next sector, and so on.

11. What is termed as cylinder?

The term cylinder is used to refer to all the tracks under the arms at a given point on all surfaces.

12. List the components of a disk access.

There are three mechanical components to a disk access:
Rotation latency
Transfer time
Seek time

13. What is average seek time?

Average seek time is the sum of the times for all possible seeks divided by the number of possible seeks. Average seek times are advertised to be 5 ms to 12 ms.

14. What is transfer time?

Transfer time is the time it takes to transfer a block of bits, typically a sector, under the read-write head. This time is a function of the block size, disk size, rotation speed, recording density of the track, and the speed of the electronics connecting the disk to the computer.

15. Write the formula to calculate the CPU execution time.

CPU execution time = (CPU clock cycles + Memory stall cycles) * Clock cycle time

16. Write the formula to calculate the CPU time.

CPU time = (CPU execution clock cycles + Memory stall clock cycles) * Clock cycle time

17. Define miss penalty for an out-of-order execution processor.

For an out-of-order execution processor, the memory stall cycles are defined as follows:

Memory stall cycles / Instruction = (Misses / Instruction) * (Total miss latency - Overlapped miss latency)

18. What are the techniques available to reduce cache miss penalty or miss rate via parallelism?

The three techniques that overlap the execution of instructions with memory accesses are:
1. Non-blocking caches to reduce stalls on cache misses, to match out-of-order processors
2. Hardware prefetching of instructions and data
3. Compiler-controlled prefetching

19. How are the conflict misses divided?

The four divisions of conflict misses are:
Eight-way
Four-way
Two-way
One-way

20. List the advantages of memory hierarchy.

Memory hierarchy takes advantage of

a. locality

b. the cost/performance of memory technologies

22. What is the goal of memory hierarchy?

The goal is to provide a memory system with

*cost almost as low as the cheapest level of memory

*speed almost as fast as the fastest level

23. Define cache hit?

When the CPU finds a requested data item in the cache, it is called a cache hit.

*Hit Rate: the fraction of cache accesses found in the cache

*Hit Time: the time to access the upper level, which consists of the RAM access time + the time to determine hit/miss

24. Define cache miss?

When the CPU does not find a data item it needs in the cache, a cache miss occurs.

*Miss Rate = 1 - (Hit Rate)

*Miss Penalty = time to replace a block in the cache + time to deliver the block to the processor

25. What does Latency and Bandwidth determine?

-Latency determines the time to retrieve the first word of the block

-Bandwidth determines the time to retrieve the rest of the block

26. What are the types of locality?

*Temporal locality (locality in time)

*Spatial locality (locality in space)


27. How does page fault occur?

When the CPU references an item within a page that is not present in the cache or main memory, a page fault occurs, and the entire page is moved from the disk to main memory.

28. What is called the miss penalty?

The number of memory stall cycles depends on both the number of misses and the cost per miss, which is called the miss penalty.

29. What is Average memory access time?

Average memory access time is a better measure of memory hierarchy performance for processors with in-order execution.

30. What are the categories of cache misses (the 3 Cs of cache misses)?

*compulsory

*capacity

*conflict

31. What are the techniques to reduce miss penalty?

*multi-level caches

*critical word first and early restart

*giving priority to read misses over writes

*Merging write buffer

*victim caches

UNIT-5

1) What is the function of the Power Processing Unit?

The Power Processing Unit provides:

*a full set of 64-bit PowerPC registers.

*32 128-bit vector multimedia registers.

*a 32 KB L1 data cache.

*a 32 KB L1 instruction cache.

2) List out the disadvantages of Heterogeneous multi-core processors.

*Developer productivity.

*Portability.

*Manageability.

3) Define Software Multithreading

Software multithreading is a piece of software that is aware of more than one core/processor and can use these to simultaneously complete multiple tasks.

4) Define Hardware Multithreading

Hardware multithreading allows multiple threads to share the functional units of a single processor in an overlapping fashion.

5) Difference between Software and Hardware Multithreading

*Multithreading (computer architecture): multithreading in hardware.

*Threads (computer science): multithreading in software.

6) List some advantages of Software Multithreading.

*Increased responsiveness and worker productivity.

-Increased application responsiveness when different tasks run in parallel.

*Improved performance in parallel environments.

-When running computations on multiple processors.

*More computations per cubic foot of data center.

-Web based applications are often multi-threaded in nature.

7) List out the two approaches of Hardware Multithreading.

The two main approaches in Hardware multithreading are

*Fine-grain Multithreading.

*Coarse-grain Multithreading.

8) Define Simultaneous Multithreading(SMT)

SMT is a variation on multithreading that uses the resources of a multiple-issue, dynamically scheduled processor to exploit thread-level parallelism at the same time it exploits ILP, i.e., it converts thread-level parallelism into more ILP.


9) Give the features exploited by SMT.

It exploits the following features of modern processors

*Multiple Functional Units.

-Modern processors typically have more functional units available than a single thread can utilize.

*Register Renaming and Dynamic Scheduling.

-Multiple instructions from independent threads can co-exist and co-execute.

10) What are the Design challenges of SMT?

The Design Challenges of SMT processor includes the following-

*Larger register files needed to hold multiple contexts.

*Not affecting clock cycle time.

*Instruction issue-more candidate instructions need to be considered.

*Instruction completion - choosing which instructions to commit may be challenging.

*Ensuring that cache and TLB conflicts generated by SMT do not degrade performance.

11) Compare the SMT processor with the base Superscalar Processor

The SMT processor is compared to the base superscalar processor in several key measures:

*Utilization of functional units.

*Utilization of Fetch units.

*Accuracy of branch predictors.

*Hit rates of primary caches.

*Hit rates of secondary caches.

12) List the factors that limits the issue slot usage

The issue slot usage is limited by the following factors.

*Imbalances in resource needs.

*Resource availability over multiple threads.

*Number of active threads considered.

*Finite limitations of buffers.

*Ability to fetch enough instructions from multiple threads.

13) Define Multi-core microprocessor

A multi-core microprocessor is one that combines two or more separate processors in one package.

14) What is a Heterogeneous Multi-core processor?

A heterogeneous multi-core processor is a processor in which multiple cores of different types are implemented in one CPU.

15) List out the advantages of Heterogeneous multi-core processors.

*Massive parallelism.

*Specialization of Hardware for tools.

16) List out the disadvantages of Heterogeneous multi-core processors.

*Developer productivity.

*Portability.

*Manageability.

17) What is IBM cell processor?

The IBM Cell processor is a heterogeneous multi-core processor composed of a control-intensive processor core and compute-intensive SIMD processor cores, each with its own distinguishing features.

18) List the components of IBM cell architecture

*Power Processing Elements(PPE).

*Synergistic Processor Elements(SPE).

*I/O controller.

*Element Interconnect Bus(EIB).


19) What are the components of PPE?

The PPE is made up of two main units:

1.Power Processor Unit(PPU)

2.Power Processor Storage Subsystem(PPSS)

20) What is Memory Flow Controller(MFC)?

The Memory Flow Controller is actually the interface between the Synergistic Processor Unit (SPU) and the rest of the Cell chip. Specifically, the MFC interfaces the SPU with the EIB.

16 MARKS

1. Explain the concepts and challenges of Instruction-Level Parallelism.

Define Instruction-Level Parallelism

Data dependences and hazards
o Data dependences
o Name dependences
o Data hazards

Control Dependences

2. Explain dynamic scheduling using Tomasulo’s approach.

Explain the 3 steps:
o Issue
o Execute
o Write result

Explain the 7 fields of reservation station

Figure: The basic structure of a MIPS floating-point unit using Tomasulo's algorithm.

3. Explain the techniques for reducing branch costs with dynamic hardware prediction.

Define basic branch prediction and branch-prediction buffers.

Figure: The states in a 2-bit prediction scheme
Correlating branch predictors

Tournament predictors: adaptively combining local and global predictors
Figure: State transition diagram for a tournament predictor with 4 states


4. Explain in detail about hardware-based speculation.

Define hardware speculation, instruction commit, reorder buffer.
Four steps involved in instruction execution:

o Issue
o Execute
o Write result
o Commit

Figure: The basic structure of a MIPS FP unit using Tomasulo's algorithm, extended to handle speculation

Multiple issue with speculation

5. Explain in detail about basic compiler techniques for exposing ILP.

Basic pipeline scheduling and loop unrolling.

Example codes
Using loop unrolling and pipeline scheduling with static multiple issue

6. Explain in detail about static multiple issue using VLIW approach.

Define VLIW.

The basic VLIW approach:
o Explain the registers used.
o Functional units used.
o Complex global scheduling.

Example code
Technical and logistical problems

7. Explain in detail about advanced compiler support for exposing and exploiting ILP.

Detecting and enhancing loop-level parallelism
o Finding dependences
o Eliminating dependent computations

Software pipelining: symbolic loop unrolling
o Example code fragment

Global code scheduling
o Trace scheduling: focusing on the critical path
o Superblocks
o Example code fragment

8. Explain in detail about hardware support for exposing more parallelism at compile time.

Conditional or Predicated instructions


o Example codes

Compiler speculation with hardware support
o Hardware support for preserving exception behavior
o Hardware support for memory reference speculation
o Example codes

9. Explain in detail about the Intel IA-64 instruction set architecture.

The IA-64 register model

Instruction format and support for explicit parallelism
Instruction set basics

Predication and speculation support
The Itanium processor

o Functional units and instruction issue
o Itanium performance

10. Explain the limitations of ILP.

Hardware model
Limitations of the window size and maximum issue count

Effects of realistic branch and jump prediction
Effects of finite registers
Effects of imperfect alias analysis

11. Explain in detail about symmetric shared memory architecture.

Define multiprocessor cache coherence.
Basic schemes for enforcing coherence

o Define directory based
o Define snooping

Snooping protocols
Basic implementation techniques

An example protocol.

12. Explain the performance of symmetric shared-memory multiprocessors.

Define true sharing and false sharing.

Performance measurements of the commercial workload
Performance of the multiprogramming and OS workload
Performance for the scientific/technical workload

13. Explain in detail about synchronization.


Basic hardware primitives
o Define atomic exchange
o Define test-and-set, fetch-and-increment, load-linked and store-conditional instructions

Implementing locks using coherence

Synchronization performance challenges
o Barrier synchronization
o Code for simple and sense-reversing barriers

Synchronization mechanisms for larger-scale multiprocessors
o Software implementations
o Hardware primitives

14. Explain the models of memory consistency.

Sequential consistency.

Relaxed consistency models
o W->R ordering
o W->W ordering
o R->W and R->R ordering

15. Explain the performance of symmetric shared-memory and distributed shared-memory multiprocessors.

Symmetric shared-memory multiprocessors:

Define true sharing and false sharing.
Performance measurements of the commercial workload
Performance of the multiprogramming and OS workload

Performance for the scientific/technical workload.

Distributed shared-memory multiprocessor:

Miss rate
Memory access cost

16. Explain in detail about reducing cache miss penalty.

First miss penalty reduction technique: multilevel caches
Second miss penalty reduction technique: critical word first and early restart

Third miss penalty reduction technique: giving priority to read misses over writes
Fourth miss penalty reduction technique: merging write buffer
Fifth miss penalty reduction technique: victim caches


17. Explain in detail about reducing miss rate.

First miss rate reduction technique: larger block size
Second miss rate reduction technique: larger caches
Third miss rate reduction technique: higher associativity

Fourth miss rate reduction technique: way prediction and pseudo-associative caches

Fifth miss rate reduction technique: compiler optimizations
o Loop interchange
o Blocking

18. Explain in detail about memory technology.

DRAM technology.

SRAM technology
Embedded processor memory technology: ROM and Flash

Improving memory performance in a standard DRAM chip
Improving memory performance via a new DRAM interface: RAMBUS
Comparing RAMBUS and DDR SDRAM

19.Explain the types of storage devices.

Magnetic disks
The future of magnetic disks

Optical disks
Magnetic tapes
Automated tape libraries

Flash memory

20. Explain in detail about Buses-Connecting I/O devices to CPU/Memory

Bus design decisions

Bus standards
Interfacing storage devices to the CPU - Figure: A typical interface of I/O devices and an I/O bus to the CPU-memory bus

Delegating I/O responsibility from the CPU

21. Explain in detail about SMT.

Converting thread-level parallelism to instruction-level parallelism
Design challenges in SMT processors

Potential performance advantages from SMT


22. Explain about CMP architecture.

Define CMP
Architecture
Explanation

23. Explain in detail about software and hardware multithreading.
o Software multithreading
o Hardware multithreading
o Explanation

24. Explain about heterogeneous multi-core processors.
o Define multi-core processor
o Heterogeneous multi-core processor
o Diagram

25. Explain about the IBM Cell processor.

Define Cell processor
Architecture
Explanation

