Today: Memory Management
• Pages and zones
• Page allocation
• kmalloc, vmalloc
• Slab allocator
• Stack, high memory, per-CPU data structures
Background of Memory Management
• Virtual memory vs. Physical memory
• Address space
• Details will be covered in the following classes
  • The Process Address Space
  • The Page Cache and Page Fault
Virtual Memory
• Virtual memory is a memory management technique
• It gives programs the illusion of a very large memory
• OS (Linux) maps memory addresses used by a program, called virtual addresses, into physical addresses
Virtual Address with a simple user program
• Memory addresses used by a program
#include <stdio.h>
char buf[100];
int main(int argc, char *argv[])
{
	int n = 0;
	while (n < 10)
		n = n + 1;
	printf("[virt_addr] main():%p buf:%p n:%p\n",
	       (void *)&main, (void *)buf, (void *)&n);
	return 0;
}
[Figure: x86 (32-bit) virtual address space layout, from 0 to 4GB. User space (0–3GB) holds, bottom to top: text (instruction code, read-only — e.g. the while loop and the printf call), data, bss (global variables such as buf), heap, and stack (local variables such as argc, argv, n). Kernel space occupies 3GB–4GB.]
Virtual memory (VM) is a layer of indirection (map)
• Without VM: program address = RAM address (no indirection)
  • A 4GB program address space on 1GB of physical RAM: crash if we try to access more RAM than we have
• With VM: program address maps to RAM address
  • The mapping gives us flexibility in how we use the RAM
Challenges: #1 not enough physical memory
• Map some of the program’s address space to the disk
• When we need it, we bring it into memory
• Example: a 32-bit program’s 4GB address space on 1GB of physical RAM
  • When RAM fills up, VM moves the oldest data to disk
  • When that data is needed again, it is brought back into RAM
• Mapping lets us use our disk to give the illusion of unlimited memory
Challenges: #2 holes in the address space
• How do we use the holes left when programs quit?
• We can map a program’s address to RAM address however we like
• Example: Program 2 (2GB) and Program 3 (2GB) on 4GB of physical RAM
  • Each program has its own mapping (Map 2, Map 3)
  • Mapping lets us put our program data wherever we want to in the RAM
Challenges: #3 keeping programs secure
• Program 1’s and Program 2’s address map to different RAM addresses
• Because each program has its own address space, they cannot access each other’s data
• Example: on 4GB of physical RAM, with separate Map 1 and Map 2
  • Program 1 stores a bank balance at its address 4096; VM maps it to RAM address 1
  • Program 2 stores a video game score at its address 4096; VM maps it to RAM address 4
  • Neither can touch the other’s data
Page frame
• Physical memory is divided into page frames
  • 4KB-sized page frames
• Each page frame is represented by struct page {…} to keep track of its status
Page frame
• Each page frame is represented by struct page
  • Page size is machine-dependent (4KB in general)
  • Defined in include/linux/mm_types.h (simplified below)
struct page {
	unsigned long flags;           /* page status (permission, dirty, etc.) */
	unsigned long counters;        /* usage count */
	struct address_space *mapping; /* address space mapping */
	pgoff_t index;                 /* offset within mapping */
	struct list_head lru;          /* LRU list */
	void *virtual;                 /* virtual address */
};
Zones
• Physical memory is divided into a number of blocks called zones
• ZONE_DMA
  • Lower physical memory range for old (ISA) DMA devices (first 16MB)
• ZONE_DMA32
  • Upper physical memory range for DMA devices supporting only 32-bit physical addresses (up to 4GB)
• ZONE_NORMAL
  • Directly mapped into the upper region of the kernel virtual address space
• ZONE_HIGHMEM
  • Not mapped directly by the kernel
  • Page frames must be mapped prior to access
Zones
• x86 (32-bit): ZONE_DMA (0–16MB), ZONE_NORMAL (16MB–896MB), ZONE_HIGHMEM (above 896MB)
• x86_64 (64-bit): ZONE_DMA (0–16MB), ZONE_DMA32 (16MB–4GB), ZONE_NORMAL (above 4GB)
Why split into Zones
• Certain contexts require certain physical pages due to hardware limitations
  • An Industry Standard Architecture (ISA) card provides only 24-bit addressing
  • So DMA can reach only the first 16MB (2^24 bytes) due to the ISA bus limitation
• Physical memory size can exceed the kernel’s virtual address space
  • e.g., more than 4GB of RAM with a 3GB user / 1GB kernel split of the virtual address space
Memory Layout (x86_32)
• Relationship between virtual and physical memory
  • The kernel’s 1GB of virtual address space (3GB–4GB) maps onto physical memory; user space gets 0–3GB
  • Physical memory from 0 to 896MB (ZONE_DMA at 0–16MB, then ZONE_NORMAL): 1:1 direct mapping (always)
  • Physical memory above 896MB (ZONE_HIGHMEM): dynamic mapping (on demand)
• The directly mapped region holds the kernel image, struct mem_map (the array of struct page {…}), and the rest of the memory for memory allocation (kmalloc area)
• The dynamically mapped region holds the vmalloc area, etc.
Memory Fragmentation
• External fragmentation: various free-space holes between allocations
  => addressed by the Buddy System
• Internal fragmentation: wasted space within each allocated page due to allocation granularity
  => addressed by the Slab Allocator
Buddy System
• Default memory allocator for the Linux kernel
  • Reduces external fragmentation
• Tries to keep page frames physically contiguous as much as possible
• Runs on each zone
• Granularity of page allocation
  • Allocations are done in power-of-2 numbers of page frames
Buddy System
• Basic concepts
  • Try to gather physically consecutive pages into groups
  • Allocate contiguous ranges of pages
• Example with 16 page frames (2^4):
  1) Initial status: one free block of 16 page frames
  2) Request 8 pages: the block is split into two 8-frame buddies; one serves the request, one 8-frame block stays free
  3) Request 2 pages: the free 8-frame block is split down (8 -> 4 -> 2); one 2-frame block serves the request, leaving free blocks of 4 and 2 frames
Low-level memory allocator (Buddy system)
• Low-level mechanisms to allocate memory at the page granularity
• interfaces in include/linux/gfp.h
• APIs for allocating pages
  • alloc_pages(gfp_t gfp_mask, unsigned int order);
  • alloc_page(gfp_t gfp_mask);
  • __get_free_pages(gfp_t gfp_mask, unsigned int order);
  • __get_free_page(gfp_t gfp_mask);
  • get_zeroed_page(gfp_t gfp_mask);
Zeroed page allocation
• By default, the page data is not cleared on allocation
  • May leak information through the page allocation
• To prevent information leakage, allocate a zeroed-out page for user-space requests
unsigned long get_zeroed_page(gfp_t gfp_mask);
Relationships among APIs
• Eventually, all allocation functions call alloc_pages( ) and all deallocation functions call __free_pages( )
gfp_t: get free page flags
• Specify options for memory allocation
• Action modifiers
  • How the memory should be allocated
• Zone modifiers
  • From which zone the memory should be allocated
• Type flags
  • Combinations of action and zone modifiers
  • Generally preferred over direct use of action/zone modifiers
• Defined in include/linux/gfp.h
gfp_t: zone modifiers
• If no zone modifier is specified, pages are allocated from ZONE_NORMAL or ZONE_DMA (with preference for ZONE_NORMAL)
Slab allocator
• Basic idea
  • Cache commonly used objects (such as task_struct, inode, etc.) rather than repeatedly allocating/freeing memory
• Reduces internal fragmentation
  • By caching objects smaller than the page size
  • It’s wasteful to allocate a whole page to store only a few bytes
Slab allocator
• A cache has one or more slabs
  • Each slab is one or several physically contiguous pages
• Slabs contain objects
• A slab may be empty, partially full, or full
• Objects are allocated from the partially full slabs to prevent memory fragmentation
[Figure: slab allocator components for a certain type of object (struct my_struct)]
Slab allocator coloring
[Figure: components of the slab allocator; example of slab coloring with starting offsets of cache line size * 1 and cache line size * 2]
• Prevents objects in different slabs from evicting each other from the CPU cache
  • By adjusting the starting offset of the objects (by multiples of the cache line size)
Slab allocator variants
• SLOB (Simple List Of Blocks)
  • Used in early Linux versions (from 1991)
  • Low memory footprint, suitable for embedded systems
• SLAB
  • Integrated in 1999
  • Cache-friendly
• SLUB
  • Integrated in 2008
  • Improved scalability over SLAB on many cores
Stack
• Each process has
  • A user-space stack for execution
  • A kernel stack for in-kernel execution
• The user-space stack is large and grows dynamically
• The kernel stack is small and fixed-size: two pages (8KB)
• The interrupt stack is for interrupt handlers: one page per CPU
• Reduce kernel stack usage to a minimum
  • Watch local variables and function parameters
High memory
• On x86_32, physical memory above 896MB is not permanently mapped within the kernel address space
  • Due to the limited size of the address space and the 3GB/1GB user/kernel memory split
• Before use, pages from high memory must be mapped into the kernel address space
Per-CPU data structure
• Allow each core to have its own copy of a value
  • No locking required
  • Reduces cache thrashing
• Implemented through arrays in which each index corresponds to a CPU
References
• Virtual Memory: https://www.youtube.com/watch?v=qlH4-oHnBb8&list=PLiwt1iVUib9s2Uo5BeYmwkDFUh70fJPxX&index=3
• Professional Linux Kernel Architecture, Wolfgang Mauerer (2.6), Wiley Publishing, Inc.
• Understanding the Linux Virtual Memory Manager, Mel Gorman, Prentice Hall (2.6)
• LKP class slides by Changwoo Min
• Linux kernel internal lecture slides from BIT ACADEMY by Sungjae Baek and Namhyung Kim