Computer Organization: Memory Hierarchy


Memory hierarchy

3.1 Memory hierarchy

3.2 Good Memory?

• Fast
• Large
• Inexpensive
• All three at once: not possible

3.2 MEMORY CONCEPTS AND CHARACTERISTICS

• Cache memory

• Virtual memory

3.3 Semiconductor RAM Memories

Semiconductor memories are available in a wide range of speeds.

Their cycle times range from about 100 ns down to 10 ns.

Internal organization

Static RAM (SRAM)

Dynamic RAM (DRAM)

Asynchronous DRAMs


• 16 Mbit chip
• 4K x 4K organization
• 4K x (512 x 8)
• 4096 rows, 4K columns
• Each row is divided into 512 groups of 8 bits
• RAS - Row Address Strobe
• CAS - Column Address Strobe
• 21-bit address: 12 bits select the row, 9 bits specify the group of 8 bits
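A minimal sketch of the address split just described, assuming the 4K x (512 x 8) organization above; the function name and the bit-shift style are illustrative, not a real device interface:

```python
# Hypothetical split of a 21-bit DRAM address: 12 row bits (latched with RAS)
# and 9 column-group bits (latched with CAS), as described on this slide.

def split_dram_address(addr: int) -> tuple[int, int]:
    """Return (row, column_group) for a 21-bit address of the 4K x (512 x 8) chip."""
    assert 0 <= addr < 2**21
    row = (addr >> 9) & 0xFFF        # upper 12 bits -> one of 4096 rows
    col_group = addr & 0x1FF         # lower 9 bits  -> one of 512 groups of 8 bits
    return row, col_group

row, group = split_dram_address(0b101010101010_110011001)
print(row, group)   # 2730 409
```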

Synchronous DRAM

Timing diagram

Performance

Memory controller


ROM

Flash Memory

• EEPROM → flash devices

CACHE MEMORIES

• Modern processors are faster than memory
• So the processor may waste time waiting for memory accesses
• The purpose of a cache is to make the main memory appear to the processor to be much faster than it actually is

CACHE MEMORIES continued…..

• The effectiveness of the cache mechanism is based on the property of 'locality of reference'.

• Many instructions in localized areas of the program are executed repeatedly during some time period, while the remainder of the program is accessed relatively infrequently.

• Temporal: the recently executed instructions are likely to be executed again very soon.

• Spatial: the instructions in close proximity to a recently executed instruction are also likely to be executed soon.

The term block refers to a set of contiguous address locations of some size.

The term cache line is also used to refer to a cache block.

CACHE MEMORIES continued…..

• If the active segment of the program is placed in cache memory, then the total execution time can be reduced significantly.

• The Cache memory stores a reasonable number of blocks at a given time but this number is small compared to the total number of blocks available in Main Memory.

• The correspondence between main memory block and the block in cache memory is specified by a mapping function.

CACHE MEMORIES continued…..

• The cache control hardware decides which block should be removed to create space for the new block that contains the referenced word.

• The collection of rules for making this decision is called the replacement algorithm.

• If the requested word exists in the cache, the Read/Write operation takes place on the appropriate cache location. In this case a Read/Write hit occurs.

CACHE MEMORIES continued…..

• In a Read hit, the main memory is not involved.

• The write operation can proceed in two ways:

• Write-through protocol
• Write-back protocol

CACHE MEMORIES continued…..

• Write-through protocol:
• Here the cache location and the main memory location are updated simultaneously.

CACHE MEMORIES continued…..

• Write-back protocol:
• This technique updates only the cache location and marks it as updated with an associated flag bit called the dirty/modified bit.

• The word in the main memory will be updated later, when the block containing this marked word is to be removed from the cache to make room for a new block.
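A minimal sketch contrasting the two protocols for a single cache line; 'memory' and the CacheLine class are simplified stand-ins for the real hardware structures, not the textbook's notation:

```python
# Write-through: cache and main memory updated together.
# Write-back: only the cache is updated and the dirty bit is set; main memory
# is updated later, when the block is evicted.

class CacheLine:
    def __init__(self):
        self.data = 0
        self.dirty = False           # dirty/modified bit

def write_through(line: CacheLine, memory: dict, addr: int, value: int) -> None:
    line.data = value                # update the cache location...
    memory[addr] = value             # ...and the main memory location simultaneously

def write_back(line: CacheLine, value: int) -> None:
    line.data = value                # update only the cache
    line.dirty = True                # mark the line as modified

def evict(line: CacheLine, memory: dict, addr: int) -> None:
    if line.dirty:                   # word is written back only when the block is removed
        memory[addr] = line.data
        line.dirty = False
```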

CACHE MEMORIES continued…..

• If the requested word does not currently exist in the cache during a read operation, a read miss occurs.

• To reduce the read-miss penalty, the load-through / early-restart approach is used.

CACHE MEMORIES continued…..

• Read Miss:
• The block of words that contains the requested word is copied from the main memory into the cache. After the entire block is loaded into the cache, the requested word is forwarded to the processor.

• Load-through (early restart):
• Alternatively, the requested word may be forwarded to the processor as soon as it is read from the main memory, without waiting for the whole block to be loaded.

CACHE MEMORIES continued…..

• If the requested word does not exist in the cache during a write operation, a write miss occurs.

• If the write-through protocol is used, the information is written directly into the main memory.

• If the write-back protocol is used, the block containing the addressed word is first brought into the cache, and then the desired word in the cache is overwritten with the new information.

Mapping Function:

• Direct Mapping
• Associative Mapping
• Set-Associative Mapping

Direct Mapping


• Block j of the main memory maps onto block (j modulo 128) of the cache.

• Assume 1 block contains 16 words
• When memory block 0, 128, or 256 is loaded into the cache, it is stored in cache block 0.
• Memory blocks 1, 129, 257 are stored in cache block 1, and so on.

Problem (contention) arises:
• When the cache is full
• When more than one memory block is mapped onto a given cache block position

• The contention is resolved by allowing the new block to overwrite the currently resident block.

Placement of block in the cache

• The placement is determined from the memory address
• Assume a 16-bit address
• Low-order 4-bit word field: selects one of 16 words in a block
• 7-bit cache block field: when a new block enters the cache, these 7 bits determine the cache position in which the block must be stored
• 5-bit tag field: the high-order 5 bits of the memory address of the block are stored in 5 tag bits associated with its location in the cache

• As execution proceeds, the high-order 5 bits of the address are compared with the tag bits associated with that cache location.

• If they match, then the desired word is in that block of the cache.
• If there is no match, then the block containing the required word must first be read from the main memory and loaded into the cache (a worked example follows).
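An illustrative sketch of the 16-bit direct-mapped address split just described (5-bit tag, 7-bit cache-block field, 4-bit word field); the function names are hypothetical:

```python
# Split a 16-bit memory address into the direct-mapping fields from the slide.

def split_direct_mapped(addr: int) -> tuple[int, int, int]:
    assert 0 <= addr < 2**16
    word  = addr & 0xF               # low 4 bits: one of 16 words in a block
    block = (addr >> 4) & 0x7F       # next 7 bits: cache block 0..127
    tag   = (addr >> 11) & 0x1F      # high 5 bits: stored/compared as the tag
    return tag, block, word

# Main-memory block j maps onto cache block j mod 128, so memory blocks
# 0, 128, 256, ... all land in cache block 0.
def cache_block_for_memory_block(j: int) -> int:
    return j % 128

print(cache_block_for_memory_block(257))   # 1
```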

Merit:
• It is easy to implement.

Demerit:
• It is not very flexible.

Associative Mapped Cache


• 12 tag bits (5 + 7)
• A main memory block can be placed into any cache block position
• A new block that has to be brought into the cache has to replace (eject) an existing block if the cache is full

Merit:
• It is more flexible than the direct mapping technique.

Demerit:
• Its cost is high.

Set-Associative Mapping


• A combination of direct and associative mapping
• The blocks of the cache are grouped into sets
• In this example, the cache has two blocks per set
• Memory blocks 0, 64, 128, ..., 4032 map into cache set 0, and they can occupy either of the two block positions within the set

• 6-bit set field: determines which set of the cache contains the desired block.

• 6-bit tag field: the tag field of the address is compared with the tags of the two blocks of the set to check whether the desired block is present (a bit-field sketch follows).
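A small sketch of the two-way set-associative split used in this example, assuming the same 16-bit address with 16 words per block and 64 sets; names are illustrative:

```python
# Split a 16-bit address into tag / set / word fields for the two-way
# set-associative cache described above (64 sets, 2 blocks per set).

def split_set_associative(addr: int) -> tuple[int, int, int]:
    word = addr & 0xF                # one of 16 words in the block
    set_index = (addr >> 4) & 0x3F   # one of 64 sets
    tag = (addr >> 10) & 0x3F        # compared with the tags of both blocks in the set
    return tag, set_index, word

# Memory blocks 0, 64, 128, ..., 4032 all map to set 0 and may occupy either
# block position within that set (block 64 starts at word address 64 * 16).
print(split_set_associative(0)[1], split_set_associative(64 * 16)[1])  # 0 0
```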

Merit:
• The contention problem of direct mapping is solved by having a few choices for block placement.

• The hardware cost is decreased by reducing the size of the associative search.

Replacement Algorithm

• In direct mapping, the position of each block is pre-determined and there is no need of replacement strategy.

• In the associative and set-associative methods, the block position is not pre-determined; i.e., when the cache is full and new blocks are brought into the cache, the cache controller must decide which of the old blocks has to be replaced.

Replacement Algorithm

• When a block is to be overwritten, it is sensible to overwrite the one that has gone the longest time without being referenced. This block is called the Least Recently Used (LRU) block, and the technique is called the LRU algorithm.

• The cache controller tracks the references to all blocks with the help of a block counter.

Replacement Algorithm
• Consider 4 blocks per set in a set-associative cache; a 2-bit counter can be used for each block.
• When a hit occurs, the counter of the referenced block is set to 0; counters with values originally lower than the referenced one are incremented by 1, and all others remain unchanged.

• When a miss occurs and the set is full, the block with counter value 3 is removed, the new block is put in its place with its counter set to 0, and the other block counters are incremented by 1.
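A minimal simulation of the 2-bit LRU counters described above for one four-block set; counter value 0 means most recently used and 3 means least recently used. The function names are illustrative:

```python
def reference_hit(counters, hit_index):
    old = counters[hit_index]
    for i in range(len(counters)):
        if counters[i] < old:        # blocks more recent than the hit one age by 1
            counters[i] += 1
    counters[hit_index] = 0          # the referenced block becomes most recent

def reference_miss_full_set(counters):
    victim = counters.index(3)       # evict the least recently used block
    for i in range(len(counters)):
        if i != victim:
            counters[i] += 1
    counters[victim] = 0             # new block takes the victim's slot
    return victim

counters = [0, 1, 2, 3]
reference_hit(counters, 2)
print(counters)                                     # [1, 2, 0, 3]
print(reference_miss_full_set(counters), counters)  # 3 [2, 3, 1, 0]
```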

Merit:
• The performance of the LRU algorithm can be improved by introducing a small amount of randomness in deciding which block is to be overwritten.

PERFORMANCE CONSIDERATIONS

• Interleaving
• Hit rate and miss penalty (see the sketch below)
• Caches on the processor chip
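The slide only lists the topics, so here is the commonly used relation between hit rate and miss penalty: the average access time is t_avg = h*C + (1 - h)*M, where h is the hit rate, C the cache access time, and M the miss penalty. The numbers below are illustrative only:

```python
# Average memory access time given a hit rate, cache access time and miss penalty.

def average_access_time(hit_rate: float, cache_time: float, miss_penalty: float) -> float:
    return hit_rate * cache_time + (1 - hit_rate) * miss_penalty

print(average_access_time(0.95, 1.0, 17.0))  # 1.8 (cycles, for example)
```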

Interleaving

VIRTUAL MEMORY


• The technique that automatically moves program and data blocks into the physical main memory when they are required for execution is called virtual memory.

• The binary addresses that the processor issues, either for instructions or data, are called virtual / logical addresses.

• The virtual address is translated into a physical address by a combination of hardware and software components. This address translation is done by the MMU (Memory Management Unit).

• When the desired data are in the main memory, these data are fetched/accessed immediately.

• If the data are not in the main memory, the MMU causes the operating system to bring the data into memory from the disk.

Address Translation

• In address translation, all programs and data are composed of fixed-length units called pages.

• The Page consists of a block of words that occupy contiguous locations in the main memory.

• Pages commonly range from 2K to 16K bytes in length.

• Each virtual address generated by the processor contains a virtual page number (high-order bits) and an offset (low-order bits).

• Virtual page number + offset: together they form the virtual address; the offset specifies the location of a particular byte (or word) within a page.
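An illustrative split of a virtual address, assuming a 4 KB page size (one size within the 2K to 16K range mentioned above); the page-table dictionary and function names are hypothetical stand-ins for the MMU structures:

```python
PAGE_SIZE = 4096                      # assumed page size
OFFSET_BITS = 12                      # log2(4096)

def split_virtual_address(vaddr: int) -> tuple[int, int]:
    page_number = vaddr >> OFFSET_BITS        # high-order bits
    offset = vaddr & (PAGE_SIZE - 1)          # low-order bits, byte within the page
    return page_number, offset

def translate(vaddr: int, page_table: dict) -> int:
    """Simplified MMU lookup: page table maps virtual page -> physical frame."""
    page, offset = split_virtual_address(vaddr)
    frame = page_table[page]          # a missing entry would trigger a page fault
    return (frame << OFFSET_BITS) | offset

print(hex(translate(0x0000_3ABC, {0x3: 0x7F})))  # 0x7fabc
```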

MEMORY MANAGEMENT UNIT(MMU)

• Management routines are part of the Operating system.

• The virtual address space into which the OS routines are assembled is called the 'system space'.

• The virtual space in which the user application programs reside is called the 'user space'.

• Each user space has a separate page table.
• The MMU uses the appropriate page table to carry out the translation process.

Secondary Storage

• Magnetic Disk • Optical Disk • Magnetic Tapes


Magnetic Disk

Disk
• One or more disk platters (plates) mounted on a common spindle
• A thin magnetic film is deposited on each platter (both sides)
• The disk is placed in a drive that causes it to rotate at a constant speed
• The surfaces move in close proximity to the read/write heads
• Data are stored on concentric tracks
• The read/write heads move radially to access different tracks

Read/Write

• The head consists of a magnetic yoke and a magnetizing coil

• Digital information is stored by applying a current pulse of suitable polarity to the magnetizing coil

• Only changes in the magnetic field under the head can be sensed during the Read operation.

• If the binary states 0 and 1 are represented by two opposite states of magnetization, a voltage is induced in the head only at 0-1 and 1-0 transitions in the bit stream.

• A long string of consecutive 0s or 1s is recognized with the help of a clock, which is used mainly for synchronization.

• Phase encoding, or Manchester encoding, is a technique for combining the clocking information with the data.

• The Manchester Encoding is a self-clocking scheme

• changes in magnetization occur for each data bit
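A small sketch of Manchester (phase) encoding as a bit-level model: every bit cell contains a mid-cell transition, which is why the scheme is self-clocking. The 0/1 level convention chosen here is an assumption for illustration:

```python
# Manchester encoding: each data bit becomes two half-cells with a transition
# in the middle. Convention assumed here: 0 -> high-then-low, 1 -> low-then-high.

def manchester_encode(bits):
    signal = []
    for b in bits:
        signal += [0, 1] if b else [1, 0]   # one transition per bit cell
    return signal

print(manchester_encode([1, 0, 0, 1]))  # [0, 1, 1, 0, 1, 0, 0, 1]
```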

Winchester Technology:

• The disks and read/write heads are placed in a sealed, air-filtered enclosure; this approach is known as Winchester technology.

Organization and Accessing of Data


• Each surface is divided into concentric tracks. • Each track is divided into sectors.

• The data are accessed by specifying the surface number, the track number, and the sector number.

• Read/write operations start at sector boundaries.
• Data bits are stored serially on each track.
• Each sector usually contains 512 bytes.

Organization and Accessing of Data

• Sector header -> contains identification information.

• It helps to find the desired sector on the selected track.

• ECC (Error Correcting Code): used to detect and correct errors

Access time

• There are 2 components involved in the time delay between receiving an address and the beginning of the actual data transfer. They are,

• Seek time • Rotational delay / Latency

Access time

• Seek time – Time required to move the read/write head to the proper track.

• Latency – The amount of time that elapses after the head is positioned over the correct track until the starting position of the addressed sector passes under the read/write head.

• Seek time + Latency = Disk access time

Disk Controller

• The disk controller acts as interface between disk drive and system bus.

• The disk controller uses DMA scheme to transfer data between disk and main memory.

Typical disk

• One-inch disk: weight = 1 ounce, size comparable to a matchbook, capacity = 1 GB
• A 3.5-inch disk has the following parameters:
• Recording surfaces = 20
• Tracks = 15,000 tracks/surface
• Sectors = 400 sectors/track
• Each sector stores 512 bytes of data
• Capacity of the formatted disk = 20 x 15,000 x 400 x 512 ≈ 60 x 10^9 bytes = 60 GB
• Seek time = 3 ms
• Platter rotation = 10,000 rev/min
• Latency = 3 ms
• Internal transfer rate = 34 MB/s
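Re-deriving the capacity and latency figures quoted above from the listed parameters; the point is the formulas (product of surfaces, tracks, sectors, and sector size; half a revolution for the average rotational latency), not the specific numbers:

```python
surfaces, tracks, sectors, bytes_per_sector = 20, 15000, 400, 512
capacity = surfaces * tracks * sectors * bytes_per_sector
print(capacity)                       # 61440000000 bytes, i.e. about 60 GB

rpm = 10000
avg_rotational_latency_ms = 0.5 * 60_000 / rpm   # half a revolution on average
seek_time_ms = 3
print(avg_rotational_latency_ms,                 # 3.0 ms latency
      seek_time_ms + avg_rotational_latency_ms)  # 6.0 ms average access time
```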

Optical Disks

• Compact disk (CD)
• Indentations arranged to form a long spiral track on its surface
• A laser beam is directed onto the spinning disk
• The laser emits a coherent light beam that is sharply focused on the surface
• Makes use of the fact that laser light can be focused on a very small spot
• The indentations reflect the focused beam toward a photodetector, which detects the stored binary patterns

CD-ROM

• Bottom layer: transparent polycarbonate plastic
• The surface is programmed to store data by indenting it with pits
• The unindented parts are called lands
• A thin layer of reflecting aluminum material is placed on top of a programmed disk
• The aluminum is then covered by a protective acrylic
• The topmost layer is deposited and stamped with a label
• Information is stored at the time of manufacture
• Introduced in the 1980s

• The pattern is not a direct representation of the stored data

• CDs use a complex encoding scheme to represent data

• Data are stored in tracks and sectors, as on a magnetic disk

CD-Recordable (CD-R)
• 1990s
• A shiny spiral track covered by an organic dye is implemented on the disk
• The laser in a CD-R drive burns pits into the organic dye
• The burned spots become opaque
• They reflect less light than the shiny areas when the CD is being read
• The process is irreversible

CD-Rewritable

• Structure similar to that of CD-Rs
• Recording layer: an alloy of silver, indium, antimony, and tellurium
• If heated above its melting point (500 degrees C) and then cooled down, the alloy goes into an amorphous state in which it absorbs light

• If it is heated only to about 200 degrees C and this temperature is maintained for an extended period, a process known as annealing takes place, which leaves the alloy in a crystalline state that allows light to pass through

• The crystalline state represents the land areas; pits can be created by heating selected spots past the melting point

• The stored data can be erased using the annealing process, which returns the alloy to a uniform crystalline state

• A reflective material is placed above the recording layer to reflect the light when the disk is read.

• A CD-RW drive uses three different laser powers.

DVD Technology

• A red-light laser with a wavelength of 635 nm is used instead of the infrared-light laser used in CDs, which has a wavelength of 780 nm. The shorter wavelength makes it possible to focus the light to a smaller spot.

• Pits are smaller, having a minimum length of 0.4 micron.

• Tracks are placed closer together; the distance between tracks is 0.74 micron.

I/O interfacing

• when I/O devices and the memory share the same address space, the arrangement is called memory- mapped I/O

• Some processors have special In and Out instructions to perform I/O transfer

program-controlled I/O

• For an input device such as a keyboard, a status flag, SIN, is included in the interface circuit as part of the status register. This flag is set to 1 when a character is entered at the keyboard and cleared to 0 once this character is read by the processor

• A similar procedure can be used to control output operations using an output status flag, SOUT

Program for I/O (Keyboard and display)

• I/O devices operate at speeds that are vastly different from that of the processor

• processor repeatedly checks a status flag to achieve the required synchronization between the processor and input or output device

• The program enters a wait loop in which it repeatedly tests the device status. During this period, the processor is not performing any useful computation
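A purely illustrative model of the wait loop just described: the processor busy-waits on the SIN status flag before reading a character. The 'keyboard' object and its attributes are stand-ins, not a real device-register interface:

```python
# Program-controlled I/O: poll the SIN flag, read the character, clear the flag.

def read_char(keyboard):
    while not keyboard.sin:          # wait loop: no useful computation happens here
        pass
    ch = keyboard.data               # reading the character...
    keyboard.sin = False             # ...clears the SIN flag
    return ch
```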

Interrupt Driven I/O

• There are situations where other tasks can be performed while waiting for an I/O device to become ready
• An arrangement is needed for the I/O device to alert the processor when it becomes ready
• This is achieved by sending a hardware signal called an interrupt to the processor
• One of the bus control lines, called an interrupt-request line, is usually dedicated for this purpose
• The routine executed in response to an interrupt request is called the interrupt-service routine

• Assume an interrupt request arrives during the execution of instruction i, say an interrupt from a printer.

• The processor first completes the execution of instruction i

• Then, it loads the program counter with the address of the first instruction of the interrupt-service routine

• After execution of the interrupt-service routine, the processor has to come back to instruction i +1

• When an interrupt occurs, the current contents of the PC, which point to instruction i + 1, must be put in temporary storage in a known location

• A Return from-interrupt instruction at the end of the interrupt-service routine reloads the PC from that temporary storage location, causing execution to resume at instruction i +1

• In many processors, the return address is saved on the processor stack

• As part of handling interrupts, the processor must inform the device that its request has been recognized so that it may remove its interrupt-request signal. This may be accomplished by means of a special control signal on the bus called interrupt-acknowledge signal

• There are many situations in which the processor should ignore interrupt requests; for this purpose, interrupt-enable and interrupt-disable flags are used
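A highly simplified sketch of the sequence described above: save the return address (pointing to instruction i + 1), acknowledge the request, run the interrupt-service routine, and restore the PC on return. All object and method names are illustrative:

```python
def handle_interrupt(cpu, isr_address, stack):
    if not cpu.interrupts_enabled:   # requests are ignored while interrupts are disabled
        return
    stack.append(cpu.pc)             # save return address (points to i + 1)
    cpu.acknowledge_interrupt()      # lets the device remove its interrupt-request signal
    cpu.pc = isr_address             # jump to the interrupt-service routine

def return_from_interrupt(cpu, stack):
    cpu.pc = stack.pop()             # resume execution at instruction i + 1
```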

Common Interrupt-Request Line

DIRECT MEMORY ACCESS

• To transfer large blocks of data at high speed between an external device and the main memory

• A special control unit, called the DMA controller, allows the transfer of a block of data directly, without continuous intervention by the processor.

• DMA controller performs the functions that would normally be carried out by the processor when accessing the main memory

• it provides the memory address and all the bus signals that control data transfer

• DMA controller can transfer data without intervention by the processor

• Its operation is under the control of a program executed by the processor

• To initiate the transfer of a block of words, the processor sends the starting address, the number of words in the block, and the direction of the transfer

• DMA controller then proceeds to perform the requested operation

• When the entire block has been transferred, the controller informs the processor by raising an interrupt signal

• While a DMA transfer is taking place, the program that requested the transfer cannot continue.

• the processor can be used to execute another program

• After the DMA transfer is completed, the processor can return to the program that requested the transfer.
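A sketch of how a program might set up the transfer described above: the starting address, word count, and direction are written to the controller, which then moves the block and raises an interrupt on completion. The register names are assumptions, not a real controller's interface:

```python
def start_dma_transfer(dma, start_address, word_count, direction):
    dma.start_address = start_address
    dma.word_count = word_count
    dma.direction = direction        # e.g. "read" (device -> memory) or "write"
    dma.start = 1                    # the controller now transfers data without the CPU

# The processor is free to execute another program; completion is signalled
# by the DMA controller's interrupt request.
```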

Buses

• The bus consists of three sets of lines used to carry address, data, and control signals. I/O device interfaces are connected to these lines

• bus requires a set of rules called a bus protocol, that govern how the bus is used by various devices

• These rules are implemented by control signals (e.g., R/W)

• Master: the device that initiates data transfers by issuing R/W commands on the bus

• Slave: the device addressed by the master

Synchronous Bus

• All devices derive timing information from a control line called the bus clock

• At t0, the master places the device address on the address lines and sends a command on the control lines indicating a Read operation

• The clock pulse width, t1 − t0, must be longer than the maximum propagation delay over the bus

• it must be long enough to allow all devices to decode the address and control signals

• the slave can respond at time t1 by placing the requested input data on the data lines.

• At t2, the master loads the data on the data lines into one of its registers

• t2 − t1 must be greater than the maximum propagation time on the bus plus the setup time of the master’s register
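Putting illustrative numbers on these constraints (the delay values below are assumptions, not figures from the slides): t1 - t0 must exceed the worst-case propagation delay plus the address-decode time, and t2 - t1 must exceed the propagation delay plus the master's register setup time:

```python
prop_delay_ns, decode_ns, setup_ns = 4, 6, 2
t1_minus_t0 = prop_delay_ns + decode_ns        # time for address/command to settle and decode
t2_minus_t1 = prop_delay_ns + setup_ns         # time for data to reach and be latched by the master
clock_period_ns = t1_minus_t0 + t2_minus_t1
print(clock_period_ns)                         # 16 ns -> bus clock must stay below 62.5 MHz
```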

Disadvantages

• clock period, t2 − t0, must be chosen to accommodate the longest delays on the bus and the slowest device interface

• forces all devices to operate at the speed of the slowest device.

• the processor has no way of determining whether the addressed device has actually responded at t2

• To overcome these limitations, most buses incorporate control signals that represent a response from the device. These signals inform the master that the slave has recognized its address and that it is ready to participate in a data transfer operation

• In clock cycle 1, the master sends address and command information

• The slave receives this information and decodes it.

• It begins to access the requested data at clock cycle 2.

• The data become ready and are placed on the bus during clock cycle 3

• slave asserts a control signal called Slave-ready at the same time

• The master, which has been waiting for this signal, loads the data into its register at the end of the clock cycle.

• The slave removes its data signals from the bus and returns its Slave-ready signal to the low level at the end of cycle 3.

Asynchronous Bus

• t0—The master places the address and command information

• t1—The master sets the Master-ready line to 1 to inform the devices that the address and command information is ready

• t1 − t0 is intended to allow for any skew that may occur on the bus

• t2—slave places data on the data lines, sets the Slave-ready signal to 1

• t3—The Slave-ready signal arrives at the master

• t4—The master removes the address and command information from the bus

• t5—When the device interface receives the 1-to-0 transition of the Master-ready signal, it removes the data and the Slave-ready signal from the bus

Bus Standards

1.Universal Serial Bus (USB)

• uses point-to-point connections and a serial transmission format

• Provides a simple, low-cost, and easy-to-use interconnection system

• Accommodates a wide range of I/O devices and bit rates, including Internet connections, and audio and video applications
• Enhances user convenience through a "plug-and-play" mode of operation

USB Architecture

• When multiple devices are connected, they are arranged in a tree structure

• Each node of the tree has a device called a hub, which acts as an intermediate transfer point between the host computer and the I/O devices

• At the root of the tree, a root hub connects the entire tree to the host computer

• Each device on the USB, whether it is a hub or an I/O device, is assigned a 7-bit address

• This address is local to the USB tree and is not related in any way to the processor’s address space

• The root hub of the USB, which is attached to the processor, appears as a single device.

2.FireWire

• Devices are organized in a daisy chain manner on a FireWire bus, instead of the tree structure of USB.

• FireWire is well suited for connecting audio and video equipment

• As with USB, the I/O devices connected to the bus communicate with the host computer

• The basic FireWire connector has six pins.

3.PCI Bus (Peripheral Component Interconnect)

• A low-cost, processor-independent bus
• The PCI bus is connected to the processor bus via a controller called a bridge
• The bridge translates and relays commands and responses from one bus to the other and transfers data between them

• I/O devices are connected to the PCI bus, possibly through ports that use standards such as Ethernet, USB, SATA, SCSI, or SAS.

• PCI bus supports three independent address spaces: memory, I/O, and configuration

4.PCI Express

• point-to-point links interconnected via switches to form a tree structure

• root node of the tree, called the Root complex, is connected to the processor bus

• Root complex has a special port to connect the main memory

• Other connections emanating from the Root complex are serial links to I/O devices

• PCI Express link consists of two twisted pairs, one for each direction of transmission

5.SCSI Bus (Small Computer System Interface)

• Well suited for use with disk drives
• Consists of a 50-wire cable, which can be up to 25 meters in length and can transfer data at rates of up to 5 Megabytes/s

• Data are transferred either 8 bits or 16 bits in parallel

• clock speeds of up to 80 MHz