Operating Systems and Computer Networks (OSCN)
Memory Management I
Prof. Dr.-Ing. Axel Hunger, Alexander Maxeiner, M.Sc. & Dr.-Ing. Pascal A. Klein
Institute of Computer Engineering, Faculty of Engineering
University Duisburg-Essen
Agenda
– Goals of memory management
– Memory hierarchy
– Cache performance
– Cache organization
– Memory organization
  – Overview
  – Algorithms
Memory Management
Goals of memory management:
– Convenient abstraction for programming
– Allocation of scarce memory resources among competing processes
– Maximizing performance with minimal overhead
Mechanisms:
– Physical and virtual addressing
– Partitioning, paging, segmentation
– Page replacement algorithms
Memory Hierarchy (1)
(Memory pyramid, fastest to slowest:)
– Processor registers: power on, immediate term; small size, small capacity; very fast, very expensive
– Processor cache: power on, very short term; small size, small capacity; very fast, very expensive
– Random access memory: power on; medium size, medium capacity; fast, affordable
– Flash / USB memory: power off, short term; medium size, large capacity; slower, cheap
– Hard drives: power off, mid term; large size, very large capacity; slow, very cheap
– Tape backup: power off, long term; large size, very large capacity; very slow, affordable
Memory Hierarchy (2)
(Reduced hierarchy inside a typical computer, fastest to slowest:)
– Processor registers: power on, immediate term; small size, small capacity; very fast, very expensive
– Processor cache: power on, very short term; small size, small capacity; very fast, very expensive
– Random access memory: power on; medium size, medium capacity; fast, affordable
– Hard drives: power off, mid term; large size, very large capacity; slow, very cheap
Access time improvements
Amdahl's Law:
"The performance improvement to be gained from using a faster mode of execution is limited by the fraction of the time the faster mode can be used."
Limitations of Amdahl's law:
– It did not originally consider cache memory.
– Processors cannot be parallelized indefinitely.
– Hyper-threading is difficult to include.
Performance improvement
The improvement of a system in numbers:

S_overall = t_old / t_new = 1 / ((1 - P_improved) + P_improved / S_improved)

→ The overall speedup of a system is determined by the fraction of time P_improved during which the improved component is used, and by the speedup factor S_improved of that component.
Cache example
A system without cache is improved with a cache. The cache is 10 times faster than main memory and the chance of finding the required data in the cache is 90%:

S_PC = 1 / ((1 - P_cache) + P_cache / S_improvement) = 1 / ((1 - 0.9) + 0.9 / 10) = 1 / 0.19 ≈ 5.26
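The speedup calculation above can be checked with a few lines of code (a minimal sketch; the function name and the extra example are illustrative, only the 90% / 10x numbers come from the slide):

```python
def amdahl_speedup(p_improved: float, s_improved: float) -> float:
    """Overall speedup when a fraction p_improved of the work
    is served by a component that is s_improved times faster."""
    return 1.0 / ((1.0 - p_improved) + p_improved / s_improved)

# Cache example from the slide: 90% hit chance, cache 10x faster than RAM.
print(round(amdahl_speedup(0.9, 10), 2))  # 5.26
```

Note how the unimproved 10% of accesses dominates: even an infinitely fast cache could not push the speedup above 1 / 0.1 = 10.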
Speed examples
Example access times in a modern computer system. The table shows the large gap between the access times of cache and main memory; register access depends on the clock speed of the CPU.

Name:        CPU       L1 Cache  L2 Cache  Main Memory  I/O Device
Type:        Register  SRAM      SRAM      DDR3 RAM     Drives
Size:        256 Byte  32 KiB    256 KiB   4 GiB        >128 GiB
Access time: 0.28 ns   ~1 ns     ~3 ns     ~40 ns       ~5 ms
Cache performance
To calculate the average access time of a memory, the following information needs to be available:
– Access time in case of a hit
– Probability of finding the data in that memory (hit rate)
– Penalty time in case of a miss
With this information the average access time can be calculated as:

t_acc = P_hit * t_hit + (1 - P_hit) * t_penalty
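The formula translates directly into code (a sketch; the hit rate of 95% is an assumed example, the 1 ns / 40 ns times are taken from the speed table above):

```python
def avg_access_time(p_hit: float, t_hit: float, t_penalty: float) -> float:
    """Average access time: t_acc = P_hit * t_hit + (1 - P_hit) * t_penalty."""
    return p_hit * t_hit + (1 - p_hit) * t_penalty

# Assumed L1 example: 1 ns on a hit, 40 ns main-memory penalty, 95% hit rate.
print(round(avg_access_time(0.95, 1.0, 40.0), 2))  # 2.95 (ns)
```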
Cache addresses
– Addresses in cache memory are ordered by number; the starting address of a memory block is 0.
– Caches are organized in lines. Current CPUs use 64 bytes per cache line, so 6 bits of the address are needed for the offset within a line.
– The L1 cache in AMD CPUs is 32 KiB for instructions and (up to) 64 KiB for data.
– The L1 cache in Intel Haswell CPUs is 32 KiB for instructions and 32 KiB for data.
– Both CPUs work with 16-bit addresses.
– Higher-level caches increase in size and access time.
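With 64-byte lines, the low 6 bits of an address are the line offset; the next bits select the set and the rest forms the tag. A sketch of this decomposition (the set count of 64 is an assumed example, not a figure from the slide):

```python
LINE_SIZE = 64  # bytes per cache line -> 6 offset bits
NUM_SETS = 64   # assumed example -> 6 index bits

OFFSET_BITS = LINE_SIZE.bit_length() - 1  # 6
INDEX_BITS = NUM_SETS.bit_length() - 1    # 6

def split_address(addr: int) -> tuple:
    """Decompose an address into (tag, set index, byte offset)."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(0xABCD))  # (10, 47, 13)
```

The tag is what the cache must store per line so that the origin address of the data can be reconstructed when a modified line is written back.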
Memory allocation
– Memory allocation is part of the responsibilities of the memory management unit. It needs to be fast and precise.
– The origin addresses of data need to be tracked to ensure that modified data sets are stored back correctly.
– Allocation strategies differ depending on the type of memory:
  – The L1 cache is small, so there is no space for complicated address bookkeeping → precision and hit chance have high priority.
  – The L2 cache is bigger → the goals are to decrease swapping of data and to make maximum use of the space.
Cache organization (L1)
An L1 cache is organized in one of three ways:
– Fully associative
– Direct mapped
– N-way set associative
All of these allocation methods derive the target address in the L1 cache from the original memory address. Backtracking to the address of the original data is therefore easy and fast: if modified data is swapped out of the cache, its origin is determined and the content there is replaced.
Cache mapping
– A fully associative cache allows data to be placed anywhere in the cache; the address of origin must be stored alongside it.
– Direct-mapped allocation: data is stored in the cache block given by
  (block address) % (number of blocks in cache)
– N-way set-associative allocation: data is stored in the cache set given by
  (block address) % (number of sets in cache)
  where N is the number of cache lines per set.
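The two modulo rules can be written down directly (a sketch; the cache sizes in the usage example are assumptions, not figures from the slide):

```python
def direct_mapped_block(block_addr: int, n_blocks: int) -> int:
    """Direct mapped: each block address maps to exactly one cache block."""
    return block_addr % n_blocks

def set_associative_set(block_addr: int, n_sets: int) -> int:
    """N-way set associative: the block may go into any of the N lines of its set."""
    return block_addr % n_sets

# Assumed example: a 128-block cache, alternatively organized as 32 sets of 4 lines.
print(direct_mapped_block(1000, 128))  # 104
print(set_associative_set(1000, 32))   # 8
```

A fully associative cache is the limiting case of one set containing all lines, so the modulo disappears entirely and only the tag comparison remains.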
Mapping and hit chance
There is a direct connection between the cache mapping method and the hit chance of a data access. Higher hit chances increase the performance of the overall system.
Memory allocation
– L2 caches and higher-level memory use different allocation methods.
– The size of the data can vary, so algorithms are needed that find free space fast.
– Due to constant swapping of data, external fragmentation is a common problem of cache and memory.
– Deleting data is only acceptable if it is unused or if no free area is available for process-critical data.
– The more data is stored in a cache, the faster tasks can be executed. Since cache size is limited, only the most important data can be stored, which requires maximum data efficiency.
Example of cache memory
– Assume an L2 cache of 0.5 MiB, partially used by other tasks.
– A new request for data arrives at the memory allocation unit of the CPU.
– The behavior of the allocation unit depends on the underlying algorithm.
Memory map (512 KiB total): OS: 132 KiB | Free: 120 KiB | Process B: 100 KiB | Process D: 60 KiB | Free: 60 KiB | Free: 40 KiB
Algorithms to allocate memory
Assume a new process E requests 30 KiB. Which area should it use?
Available algorithms:
– First fit
– Next fit
– Best fit
– Worst fit
Memory map: OS: 132 KiB | Free: 120 KiB | Process B: 100 KiB | Process D: 60 KiB | Free: 60 KiB | Free: 40 KiB
Memory Allocation: First Fit
– The simplest algorithm.
– Scans along the list of areas (from the beginning) until it finds a sufficiently large free area.
– Breaks that area into two pieces: one for the process, one for the remaining unused memory (a new free area).
Memory map: OS: 132 KiB | Free: 120 KiB | Process B: 100 KiB | Process D: 60 KiB | Free: 60 KiB | Free: 40 KiB
Memory Allocation: First Fit
– Process E (30 KiB) fits into the 60 KiB hole, which is split into 30 KiB for E and a new 30 KiB free area.
– Very fast: searches as little as possible.
Memory map: OS: 132 KiB | Free: 120 KiB | Process B: 100 KiB | Process D: 60 KiB | Process E: 30 KiB | Free: 30 KiB | Free: 40 KiB
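The first-fit scan can be sketched in a few lines (the hole sizes and their scan order are taken from the slide example; the function shape itself is illustrative):

```python
def first_fit(holes, request):
    """Scan free areas from the beginning; split the first one that fits.
    Returns (hole index, leftover size), or None if nothing fits."""
    for i, size in enumerate(holes):
        if size >= request:
            return i, size - request
    return None

# Free areas in scan order as in the slide: 60, 40 and 120 KiB.
print(first_fit([60, 40, 120], 30))  # (0, 30): the 60 KiB hole is split into 30 + 30
```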
Memory Allocation: Next Fit
– A variation of first fit.
– Keeps track of where it found a free area; next time it starts searching at the position where it left off.
– Assumption: prospective fitting areas come after already-found holes.
– However, in practice it performs slightly worse than first fit.
Memory map: OS: 132 KiB | Free: 120 KiB | Process B: 100 KiB | Process D: 60 KiB | Free: 60 KiB | Free: 40 KiB
Memory Allocation: Next Fit
– As with first fit, process E (30 KiB) fits into the 60 KiB hole; the search position is remembered.
Memory map: OS: 132 KiB | Free: 120 KiB | Process B: 100 KiB | Process D: 60 KiB | Process E: 30 KiB | Free: 30 KiB | Free: 40 KiB
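Next fit can be sketched as first fit plus a rolling start position (one common formulation; here the pointer advances past the hole that was just split, which matches the slide's example of F landing in the 40 KiB hole):

```python
class NextFit:
    """First fit with a remembered start position that wraps around once."""
    def __init__(self, holes):
        self.holes = list(holes)
        self.pos = 0  # where the next search starts

    def allocate(self, request):
        n = len(self.holes)
        for step in range(n):
            i = (self.pos + step) % n
            if self.holes[i] >= request:
                self.holes[i] -= request
                self.pos = (i + 1) % n  # continue after this hole next time
                return i
        return None

nf = NextFit([60, 40, 120])  # free areas in scan order, as in the slide
print(nf.allocate(30))  # 0: process E goes into the 60 KiB hole
print(nf.allocate(30))  # 1: process F resumes the search and takes the 40 KiB hole
```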
Memory Allocation: Next Fit
– After process E (60 KiB hole), a second request arrives: process F (30 KiB). The search resumes where it left off, so F fits into the 40 KiB hole, leaving 10 KiB free.
Memory map: OS: 132 KiB | Free: 120 KiB | Process B: 100 KiB | Process D: 60 KiB | Process E: 30 KiB | Free: 30 KiB | Process F: 30 KiB | Free: 10 KiB
Memory Allocation: Best Fit
– Searches the entire list (from beginning to end) and takes the smallest free space that is large enough.
– Assumption: best spatial use of memory.
Memory map: OS: 132 KiB | Free: 120 KiB | Process B: 100 KiB | Process D: 60 KiB | Free: 60 KiB | Free: 40 KiB
Memory Allocation: Best Fit
– Process E (30 KiB) fits into the 40 KiB hole, leaving 10 KiB free.
– Takes much CPU time (for searching the whole list).
– Results in more wasted memory, because numerous tiny, useless free areas remain.
Memory map: OS: 132 KiB | Free: 120 KiB | Process B: 100 KiB | Process D: 60 KiB | Free: 60 KiB | Process E: 30 KiB | Free: 10 KiB
Memory Allocation: Worst Fit
– A variation of best fit: always takes the largest hole.
– Assumption: this avoids splitting holes into tiny fragments, so the remainder stays big enough for other processes.
Memory map: OS: 132 KiB | Free: 120 KiB | Process B: 100 KiB | Process D: 60 KiB | Free: 60 KiB | Free: 40 KiB
Memory Allocation: Worst Fit
– Process E (30 KiB) fits into the 120 KiB hole, leaving 90 KiB free.
– Takes much CPU time (for searching).
– Still wastes memory.
Memory map: OS: 132 KiB | Free: 90 KiB | Process E: 30 KiB | Process B: 100 KiB | Process D: 60 KiB | Free: 60 KiB | Free: 40 KiB
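Best fit and worst fit differ only in whether the smallest or the largest fitting hole is chosen, which a short sketch over the slide's free-area list makes explicit:

```python
def best_fit(holes, request):
    """Index of the smallest free area that still fits, or None."""
    fitting = [(size, i) for i, size in enumerate(holes) if size >= request]
    return min(fitting)[1] if fitting else None

def worst_fit(holes, request):
    """Index of the largest free area, or None if even that does not fit."""
    fitting = [(size, i) for i, size in enumerate(holes) if size >= request]
    return max(fitting)[1] if fitting else None

holes = [60, 40, 120]        # free areas from the slide example
print(best_fit(holes, 30))   # 1: the 40 KiB hole, leaving a 10 KiB fragment
print(worst_fit(holes, 30))  # 2: the 120 KiB hole, leaving 90 KiB
```

Both variants scan the whole list, which is exactly the CPU-time cost the slides point out; sorted hole lists (next slide) remove that cost.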
Memory Allocation: Enhancing performance
– Maintain separate lists for processes AND holes:
  • the algorithms only have to inspect holes, not processes;
  • BUT deallocation becomes more expensive (both lists must be updated).
– Sort the lists by size: this enhances the speed of best fit / worst fit.
In practice:
– First fit is usually better.
– Next fit is pointless when sorted lists are used.
– Worst fit is at its worst if allocated memory can't be reorganized easily.
Buddy System
– A cache organized with the buddy system is usually fast and memory-efficient.
– Memory is encapsulated in units of powers of 2; process requests are rounded up to fit into these units.
– Possible hole sizes: ..., 4 KiB, 8 KiB, 16 KiB, 32 KiB, 64 KiB, 128 KiB, ...
(Figure: a 128 KiB block split into 64 KiB, 32 KiB, 16 KiB, 8 KiB and two 4 KiB buddies.)
Buddy System – Splitting (1)
Example: suppose a memory of 128 KiB, i.e. one hole of 128 KiB.
Process A requests 6 KiB, rounded up to 8 KiB (2 KiB wasted).
(Figure: one free 128 KiB block.)
Buddy System – Splitting (2)
Example: suppose a memory of 128 KiB. Process A requests 6 KiB, rounded up to 8 KiB (2 KiB wasted). The 128 KiB block is split in halves repeatedly until an 8 KiB unit is available for A.
– The next process repeats the algorithm and is stored in the first fitting memory unit.
– If no memory unit is capable of storing the data, other data needs to be deleted.
(Figure: 64 KiB free | 32 KiB free | 16 KiB free | 8 KiB free | A: 8 KiB)
Buddy System – Merging (1)
– Example: suppose a memory of 128 KiB, filled as shown below.
– A new data request for 52 KiB arrives; no free memory block fits.
– Data is now deleted according to the underlying swapping algorithm.
(Figure, 128 KiB total: A: 8 KiB | free 8 KiB | C: 4 KiB | free 4 KiB | free 8 KiB | B: 32 KiB | D: 32 KiB | free 32 KiB)
Buddy System – Merging (2)
– If data is deleted, adjacent blocks merge.
– Blocks can only merge to 'restore' the original size from which they were split, i.e. only with the buddy they were split from.
(Figure: the same 128 KiB layout as before.)
Buddy System – Merging (3)
– Deleting data A results in a merger of both adjacent 8 KiB memory units into one 16 KiB unit, since they were split apart during the allocation process.
(Figure: A's 8 KiB unit and its free 8 KiB buddy merge into 16 KiB; B, C and D are unchanged.)
Buddy System – Merging (4)
– Deleting data B will not result in a merger, since the neighboring 32 KiB block it would need to merge with is still occupied.
(Figure: B's 32 KiB unit becomes free, but its 32 KiB buddy is still in use.)
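The splitting and merging rules above can be condensed into a toy buddy allocator (a sketch, not the allocator of any real CPU or OS; sizes are in KiB, and the XOR trick for locating a block's buddy relies on all blocks starting at multiples of their power-of-two size):

```python
def next_pow2(n):
    """Round a request up to the next power of two (e.g. 6 -> 8)."""
    p = 1
    while p < n:
        p *= 2
    return p

class Buddy:
    """Toy buddy allocator over `total` KiB (total must be a power of two)."""
    def __init__(self, total):
        self.total = total
        self.free = {total: [0]}  # block size -> start offsets of free blocks

    def alloc(self, request):
        size = next_pow2(request)
        s = size
        while s <= self.total and not self.free.get(s):
            s *= 2                # find the smallest free block that fits
        if s > self.total:
            return None
        start = self.free[s].pop(0)
        while s > size:           # split down, keeping each upper half free
            s //= 2
            self.free.setdefault(s, []).append(start + s)
        return start, size

    def release(self, start, size):
        while size < self.total:
            buddy = start ^ size  # the buddy's offset differs in exactly one bit
            peers = self.free.get(size, [])
            if buddy in peers:    # merge only with the buddy it was split from
                peers.remove(buddy)
                start = min(start, buddy)
                size *= 2
            else:
                break
        self.free.setdefault(size, []).append(start)

b = Buddy(128)
a = b.alloc(6)  # rounded up to 8 KiB; 128 splits into 64, 32, 16, 8 + 8
print(a)        # (0, 8)
b.release(*a)   # deleting A merges everything back into one 128 KiB hole
print({s: v for s, v in b.free.items() if v})  # {128: [0]}
```

The release path shows the slide's rule directly: a block only merges upward while its exact buddy is free, and stops as soon as the buddy is occupied.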
Conclusion
– The memory allocation method depends on the requirements of the hardware.
– In PC cache memory, n-way set-associative allocation is used for optimal usage, hit chance, and reduced swapping.
– Embedded systems with small caches and less data to handle might use a direct-mapped cache.
– The first-fit allocation method is usually fast; worst fit / best fit are more space-efficient depending on the data.
– Fit-type algorithms have fewer problems with internal fragmentation.
– The buddy system is fast and efficient, but has problems with internal fragmentation.
Questions?
Resources
– Tanenbaum, Andrew S., "Modern Operating Systems", 3rd edition, Pearson Education Inc., Amsterdam, Netherlands, 2008.
– Tanenbaum, Andrew S., "Moderne Betriebssysteme", 3rd edition, Pearson Education Inc., Amsterdam, Netherlands, 2009.
– Lee, Insup, "CSE 380 Computer Operating Systems", Lecture Notes, University of Pennsylvania, 2002.
– Snoeren, Alex C., "Lecture 10: Memory Management", Lecture Notes, UC San Diego, 2010.