
DYNAMIC MEMORY MANAGEMENT IN C++

Martin Sperens

Computer Game Programming, bachelor's level

2019

Luleå University of Technology

Department of Computer Science, Electrical and Space Engineering


DYNAMIC MEMORY MANAGEMENT IN C++

Martin Sperens
Luleå University of Technology – Campus Skellefteå

September 24, 2019


Acknowledgements

I want to thank my supervisor Patrik Holmlund for giving me valuable feedback on this report.

-Martin Sperens


ABSTRACT

Memory allocation is an important part of program optimization as well as of computer architecture. This thesis examines some of the concepts of memory allocation and tries to implement overrides for the standard new and delete functions in the C++ library, using memory pools combined with other techniques. The overrides are tested against the standard new and delete as well as a custom memory pool with a perfect size for the allocations. The study finds that the overrides are slightly faster on a single thread but not on multiple threads. The study also finds that the biggest performance gain comes from creating custom memory pools specific to the program's needs. Lastly, the study lists a number of ways in which the library could be improved.

SAMMANFATTNING

Memory allocation is an important part of program optimization as well as of computer architecture. This report examines some of the concepts of memory allocation and attempts to implement overrides for the standard new and delete functions in the C++ library, using memory pools combined with other techniques. The overrides are tested against the standard new and delete functions as well as memory pools with a perfect size for the allocations. The study shows that the overrides are slightly faster when a single thread is used but not with several. The study also shows that the largest performance gain is obtained by allocating into custom memory pools that are specific to the program's needs. The study also lists several ways in which the library could be improved.


Terms and abbreviations

• RAM - Random Access Memory, also called main memory
• Secondary memory - computer storage memory
• OS - Operating System
• CPU - the computer's processing unit
• To benchmark - to test the performance limits of software or hardware
• System call - a function handled by the operating system
• Context switch - the current process is saved and another process is started or resumed


Contents

1 Background
  1.1 Program Memory
  1.2 Dynamic Memory
    1.2.1 Virtual Memory
    1.2.2 Memory Fragmentation
  1.3 Memory Allocation
    1.3.1 Data Structures For Memory Management
    1.3.2 Overriding/Hooking
    1.3.3 Guard Bytes and Memory Alignment
  1.4 Related Work
2 Implementation
  2.1 Social, Ethical and Environmental Considerations
  2.2 Debugging Tools
  2.3 Testing
  2.4 Memory Management System
    2.4.1 Memory Pools
    2.4.2 Arena
    2.4.3 Thread Pool Manager
    2.4.4 Huge Allocations
  2.5 Memory Lifetime
3 Results
4 Discussion
5 Further work
  5.1 Allocation
  5.2 Memory Alignment
  5.3 Threads
  5.4 Safety and debugging
  5.5 Data collection
  5.6 Further Testing


1 Background

Dynamic memory allocation is an important aspect of a program. Bad memory allocation may cause the program to run slower and, in some extreme cases, slow it to a grinding stop [1]. The causes are that dynamic memory allocations are slower than automatic allocations and that memory becomes fragmented. Many techniques exist which improve memory allocation over the default behaviour.

This thesis tries to cover the aspects of dynamic memory management and also to create a naive implementation of a memory management library which incorporates some of these aspects. The library was built so that it can be extended and improved.

1.1 Program Memory

A program has three kinds of memory allocation: static, automatic and dynamic [2]. Static memory is allocated when the program starts and is fixed in size. Automatic and dynamic allocations happen during program runtime. Automatic allocation is called stack allocation because the memory which was allocated last is the first to be freed again. The program handles allocation and freeing of this memory and does so within a set scope. When the scope ends, the program removes the memory, which is what makes it automatic.

Dynamic memory is called heap allocation because there is no order in which allocated memory must be freed. With dynamic allocation the program asks the kernel for memory, and the kernel has to make sure there is enough memory to give. Unlike automatic allocation, the memory is not freed after its scope has ended; the programmer must make sure that it is freed. Dynamic memory is the bottleneck and the important part when trying to optimize performance.

Dynamic memory allocation is very slow compared to automatic allocation, but trying to avoid dynamic memory completely is often not an option, as stack memory is limited in how much can be allocated at one time. Handling large files therefore requires dynamic memory.

The biggest reason to create a custom memory allocator is the context switch which is required when dynamic memory is requested from the OS kernel [3]. Because of this, the most important aspect of memory allocation is to allocate a lot of memory at one time and then allocate space inside this memory. This greatly reduces allocation and freeing time. Another important reason is memory fragmentation, which can cause page and cache misses that make the program slower.

Memory allocation in C and C++ uses the malloc() and free() functions to allocate and free memory [4]. The new and delete operators exist in C++ and are wrappers for malloc and free which also trigger a class's constructor and destructor [5]. Overriding new and delete is a simple way to create a custom dynamic memory management system.
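As a brief illustration of the three kinds of allocation described above, the following minimal C++ sketch shows static, automatic and dynamic storage (the variable names are only illustrative):

    #include <cstdio>

    static int lifetime_counter = 0;        // static: exists for the whole program run, fixed size

    void example() {
        int on_stack = 42;                  // automatic: freed when the scope ends
        int* on_heap = new int(42);         // dynamic: stays allocated until explicitly freed
        std::printf("%d %d\n", on_stack, *on_heap);
        delete on_heap;                     // the programmer is responsible for this call
    }

    int main() {
        ++lifetime_counter;
        example();
        return 0;
    }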

1.2 Dynamic Memory

Dynamic memory takes up space somewhere in RAM. This place is called the heap. The heap has a starting point and an end point, which is called the program break. When there is not enough memory between start and end for an allocation, more memory is obtained by simply moving the program break forward. This is done with the sbrk function [6], which is a system call to the OS. As noted earlier, this takes time because a context switch into kernel space is needed. When freeing memory it is also possible to move the program break back if nothing is allocated in that space.
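The following is a minimal sketch of moving the program break with sbrk on a POSIX system; it is shown only to illustrate the system call, since production allocators normally use malloc or mmap instead:

    #include <unistd.h>    // sbrk (POSIX; considered legacy, used here only for illustration)
    #include <cstdio>

    int main() {
        void* old_break = sbrk(0);      // current position of the program break
        void* block     = sbrk(4096);   // system call: move the break 4 KiB forward
        void* new_break = sbrk(0);

        std::printf("break before: %p\n", old_break);
        std::printf("block start : %p\n", block);
        std::printf("break after : %p\n", new_break);

        sbrk(-4096);                    // the break can be moved back if nothing above it is in use
        return 0;
    }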


Figure 1: Program memory

1.2.1 Virtual Memory

RAM is a limited resource. When a program asks for more memory than is available in RAM, the computer can either crash the program or remove memory from RAM. The first option is the simpler one but can be highly inconvenient, as programs can crash unexpectedly without warning. To combat this, virtual memory was invented. With virtual memory, when there is no more memory to allocate, some of the allocated memory is stored in secondary memory. New memory can then be allocated, and when the stored memory is needed, it is retrieved from secondary storage. This is called swapping or paging. Memory is stored in fixed-size chunks called pages, and allocated memory is stored on these pages. When allocated memory that is not currently in RAM is retrieved, the kernel finds what page the memory is on and swaps the whole page in. Trying to work on memory not currently in RAM is called a page miss [7].

Figure 2: Mapping of virtual to physical memory, image from iFixMyStuff [8]

Virtual memory makes it possible to work with a lot more memory than what is actually available in RAM. Unfortunately, this has drawbacks. Storage and retrieval between primary and secondary memory takes time, and frequent page misses cause the computer's performance to degrade and become very slow. This is called thrashing [1].


Because virtual memory offers more memory than the available RAM, the memory addresses given to a program are not the actual physical addresses but virtual addresses. It is possible to map the virtual addresses to the physical ones [7]. While most systems use virtual memory in some capacity, some systems do not use paging, especially embedded ones. Most embedded systems use flash memory, which only has a finite number of writes, so paging would run the risk of wearing out the system's memory [9].

1.2.2 Memory Fragmentation

A program processes data during its execution, and this data has to be sent from RAM to the CPU. This transfer is not instant, since the CPU runs at a higher clock speed than the memory. To minimize the impact of transferring data from RAM to the CPU, the CPU has its own high-speed memory called cache. When data to be processed is not in the cache, it needs to be retrieved from RAM. This is called a cache miss. To reduce cache misses, data that is processed sequentially has to be stored sequentially in memory. If the data is not stored in sequence, the number of cache misses will be very high, which reduces the effective calculation speed of the CPU. This is referred to as memory fragmentation. Avoiding memory fragmentation is also important because of virtual memory and paging.
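A small, hedged benchmark can make the cache effect concrete: summing the same values from one contiguous block is usually much faster than chasing nodes that were allocated one by one and may be scattered across the heap. The containers and numbers here are illustrative only:

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <list>
    #include <numeric>
    #include <vector>

    template <typename Container>
    double time_sum_ms(const Container& c, long long& sum) {
        auto t0 = std::chrono::steady_clock::now();
        sum += std::accumulate(c.begin(), c.end(), 0LL);
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    }

    int main() {
        const std::size_t n = 1000000;
        std::vector<int> contiguous(n, 1);   // one sequential block: cache friendly
        std::list<int>   scattered(n, 1);    // per-node allocations: likely spread over the heap

        long long sum = 0;
        std::printf("vector: %.2f ms\n", time_sum_ms(contiguous, sum));
        std::printf("list  : %.2f ms\n", time_sum_ms(scattered, sum));
        std::printf("(sum %lld)\n", sum);
        return 0;
    }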

1.3 Memory Allocation

1.3.1 Data Structures For Memory Management

There are many strategies for allocating memory efficiently, and many of them can be combined for different tasks. For example, jemalloc uses the buddy allocation method to carve big chunks of memory into smaller pieces, which must be at least a page (4 KB) big. These pieces are then used by memory pools [10].

1.3.1.1 Memory Pools and Free Lists

Memory pools have an allocated chunk of memory which is divided into blocks and managed with a free list, a data structure used to find the next free block. The blocks in the list can have different sizes but most often have one set size. A free list without set sizes means that allocations of all sizes fit in the same list, but it also means a lot of overhead when trying to find a good spot to place the memory in. Memory pools used to override the standard allocation functions are often divided into pools with different set sizes to avoid memory fragmentation. Some sort of management is then needed to allocate memory from the right-sized pool [11] [12].
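As a sketch of that overhead, a free list without set sizes has to be searched for a block that fits each request (first fit in this hypothetical example), whereas a fixed-size pool can simply pop the head of its list:

    #include <cstddef>

    // Hypothetical node in a free list with variable block sizes.
    struct FreeBlock {
        std::size_t size;   // how much space this free region holds
        FreeBlock*  next;   // next free region in the list
    };

    // First-fit search: walk the list until a block large enough is found.
    // Every allocation pays for this walk; a fixed-size pool avoids it entirely.
    FreeBlock* first_fit(FreeBlock* head, std::size_t wanted) {
        for (FreeBlock* block = head; block != nullptr; block = block->next) {
            if (block->size >= wanted)
                return block;
        }
        return nullptr;     // nothing fits; the pool would have to grow
    }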

1.3.1.2 Buddy Allocation

Buddy allocation [13] starts with an initial big block of memory which can be divided into smaller pieces. When allocating, a small enough free block for the request is searched for. If no suitable block can be found, the memory blocks are divided recursively into two smaller pieces until a right-sized memory block can be used. This means that all memory blocks except the top one have a corresponding block of memory which is their buddy block. When freeing memory, the freed block's buddy is examined to see whether it is also free. If the buddy is also free, the blocks merge (or coalesce, as it is often called) into a bigger block. The new, bigger block then checks whether its buddy is free, and the blocks continue to merge until a buddy is not free or the new block is the initial memory block.
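The buddy relationship can be expressed with a single XOR, assuming (as buddy allocators do) that block sizes are powers of two and offsets are measured from the start of the arena. Splitting a block gives two halves whose offsets differ in exactly one bit, so each half can find the other; this sketch only shows the arithmetic, not a full allocator:

    #include <cassert>
    #include <cstddef>

    // For a block at `offset` with power-of-two `block_size`, the buddy is the block
    // whose offset differs only in the bit selected by block_size.
    std::size_t buddy_offset(std::size_t offset, std::size_t block_size) {
        return offset ^ block_size;
    }

    int main() {
        // Splitting a 1024-byte block at offset 0 gives two 512-byte buddies at 0 and 512.
        assert(buddy_offset(0, 512) == 512);
        assert(buddy_offset(512, 512) == 0);

        // The same relation holds deeper in the split tree: 2048 and 3072 are 1024-byte buddies,
        // so when both are free they can coalesce back into a 2048-byte block at offset 2048.
        assert(buddy_offset(3072, 1024) == 2048);
        return 0;
    }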


(a) Free list with fixed block size

(b) Free list without fixed sizes. The numbers show how much space is in each block

Figure 3: Two kinds of free lists

1.3.1.3 Bookkeeping

Bookkeeping is useful to speed up the allocation process. The bookkeeping information can be stored and handled in different ways. For example, jemalloc [14] [15] uses a header which forms the first part of a pool (or run, as jemalloc calls it). The reason to have a header is that it is better to keep relevant data packed closely together, to reduce memory fragmentation.

1.3.1.4 Threading

Memory allocation from more than one thread may cause cache contention, where two processors work on different data that lie on the same cache line, forcing one to wait for the other to finish [16]. To solve this, different threads are assigned different memory spaces, often called arenas. If huge objects (larger than roughly 2-4 MB) are allocated, each arena could potentially take up a lot of memory. To avoid this, huge objects can be allocated outside of the arenas and kept track of in some other way. For example, jemalloc uses a red-black tree to keep track of huge objects [10].

1.3.2 Overriding/Hooking

The new and delete operators are easily overridden [5]: the override functions just have to be included in a library that is linked into the program. The overrides can then be used to implement a custom memory management system.
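A minimal sketch of such an override is shown below; it simply forwards to malloc and free, while a real memory manager would forward to its pools instead. The array forms (operator new[] and operator delete[]) would be overridden in the same way:

    #include <cstdlib>
    #include <new>

    // Replacing the global operators: once this translation unit is linked into the
    // program, every new/delete expression in the program goes through these functions.
    void* operator new(std::size_t size) {
        if (void* ptr = std::malloc(size))
            return ptr;
        throw std::bad_alloc{};        // operator new must not return a null pointer
    }

    void operator delete(void* ptr) noexcept {
        std::free(ptr);
    }

    void operator delete(void* ptr, std::size_t) noexcept {
        std::free(ptr);                // sized overload, called by some delete expressions
    }

Linking this translation unit into the program is enough; call sites do not need to change.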


It is also possible to override malloc and free, but it is more complex than for new and delete. By linking a custom malloc into the program it is possible to override malloc and free. Another possibility is hooking, which is a way to intercept a function or program and insert other code before returning to it [17]. By creating a hook which can interrupt these functions, they can be overridden. Linux has built-in standard hooks for malloc and free which can be switched out. Hooking on Windows is more complicated, but it is possible to use a third-party library like Detours [18] to do it.

1.3.3 Guard Bytes and Memory Alignment

When allocating memory, the system allocates a little more memory than required. Some of these extra bytes may be padding to align the memory, and some are guard bytes set to specific values. These values are used for debugging: they can be checked on deallocation, and if they are not intact then something has gone wrong [19].
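A hedged sketch of the idea: allocate a few extra bytes, fill them with a known pattern after the user data, and verify the pattern when the memory is freed. The pattern value and helper names here are illustrative:

    #include <cassert>
    #include <cstddef>
    #include <cstdlib>
    #include <cstring>

    const unsigned char GUARD_VALUE = 0xFD;   // illustrative sentinel byte
    const std::size_t   GUARD_BYTES = 8;

    void* guarded_alloc(std::size_t size) {
        unsigned char* p = static_cast<unsigned char*>(std::malloc(size + GUARD_BYTES));
        if (p)
            std::memset(p + size, GUARD_VALUE, GUARD_BYTES);   // guard placed after the user data
        return p;
    }

    // Returns false if the guard pattern was overwritten, i.e. someone wrote past `size`.
    bool guarded_free(void* ptr, std::size_t size) {
        unsigned char* p = static_cast<unsigned char*>(ptr);
        bool intact = true;
        for (std::size_t i = 0; i < GUARD_BYTES; ++i)
            intact = intact && (p[size + i] == GUARD_VALUE);
        std::free(p);
        return intact;
    }

    int main() {
        void* memory = guarded_alloc(32);
        assert(guarded_free(memory, 32));   // untouched guard bytes pass the check
        return 0;
    }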

1.4 Related Work

A lot has been written about memory allocation, and there are several memory libraries that replace new and delete as well as malloc and free. Jemalloc and tcmalloc [20] [21] are some of the more well-known ones. Jemalloc is used by Mozilla and Facebook as well as by the free operating system FreeBSD [22], and tcmalloc is used by Google.


2 Implementation

The implementation was chosen to be simple, for fast creation and testing as well as for ease of understanding, and also because of the limited time span, as a lot of time was allocated to describing the background of memory allocation.

To make things clearer, the implemented memory management system is abbreviated to mmsys. Mmsys is not the whole library but only the implementation of the system that overrides new and delete.

2.1 Social, Ethical and Environmental Considerations

The danger of the memory library would be if it has bugs or unforeseen security flaws. This could lead to sensitive information, like customer passwords and other data, being exposed. Bugs might also lead to other damaging behaviour, like out-of-bounds errors, which could crash the application; this might hurt customer trust, which in turn can lead to reduced revenue. It is therefore important to test the library thoroughly before using it in real applications.

2.2 Debugging Tools

A program for visualizing memory was created in tandem by Filip Salén during the development period. The memory library sends specific messages to the visualizer when the library is run in debug mode. The visualizer can measure the amount of memory used as well as fragmentation. Named pipes are used to send the messages on both Linux and Windows.

2.3 Testing

Benchmarking a memory allocator can be complex, because the performance of the allocator may depend on how the program allocates and how the data is structured [10]. There may be edge cases in which an allocator performs badly. Therefore the best way to test an allocator is to try it on real programs. The important factors are speed, how much memory the program allocates, and whether there are any large spikes in memory usage.

I have chosen to run simple tests which first allocate and then free a number of times with different set sizes. The tests are done with mmsys, a custom pool with a perfect size, and the regular new/delete to see how well they perform.
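The tests follow the pattern sketched below: allocate a number of objects of one size, then free them all, and time the whole sequence with std::chrono. The sizes and counts shown are illustrative, not the exact test parameters:

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Times `count` allocations of `size` bytes followed by `count` frees, in milliseconds.
    double bench_new_delete(std::size_t size, std::size_t count) {
        std::vector<char*> pointers;
        pointers.reserve(count);

        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < count; ++i)
            pointers.push_back(new char[size]);
        for (char* p : pointers)
            delete[] p;
        auto t1 = std::chrono::steady_clock::now();

        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    }

    int main() {
        const std::size_t sizes[] = {16, 256, 4096, 100000};   // illustrative allocation sizes
        for (std::size_t size : sizes)
            std::printf("%zu bytes: %.2f ms\n", size, bench_new_delete(size, 100000));
        return 0;
    }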

2.4 Memory Management System

Mmsys is designed so that threads do not use the same memory space, to avoid cache line conflicts. Therefore each thread has a separate memory space assigned to it. The thread manager is called when new or delete is called. The thread manager finds the thread and the memory space appropriate for allocating or freeing memory: first, the arena for the thread is located, and second, the arena finds the suitable memory pool for allocation or freeing.

2.4.1 Memory Pools

The memory pools took Kenwright's memory pools as a starting point [12]. The good things about his pools are the simple implementation, low access time and low fragmentation, as well as the fact that the blocks do not need to be initialized. The memory pool uses a free list with a set block size which is built into the pool's own memory space: when a block is not in use, its space is instead used to point to the next free space in the pool. See figure 4.


Memory pools allocate memory with malloc through their pool creation function and not through the constructor. The number of blocks a pool has is declared in the creation function, and it cannot be changed until the pool is destroyed. Mmsys uses 16384 blocks regardless of block size.

To make sure that a pool cannot run out of memory, it has a pointer to another pool which begins as NULL. When more memory is needed, a new pool is created with malloc. The new pool has the same block size and number of blocks as its parent. This creates a chain of memory pools.

While mmsys uses memory pools internally, the library also lets the user create memory pools with their own block size and block count.

Figure 4: Memory Pool with two used spaces
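The following is a simplified sketch in the spirit of the pool described above, not the thesis code itself: free blocks store the index of the next free block inside their own bytes, and a full pool grows by linking an identical overflow pool. Error handling and thread safety are omitted, and the block size is assumed to be at least four bytes so an index fits:

    #include <cstddef>
    #include <cstdint>
    #include <cstdlib>

    class Pool {
    public:
        Pool(std::size_t block_size, std::uint32_t block_count)
            : block_size_(block_size), block_count_(block_count),
              memory_(static_cast<unsigned char*>(std::malloc(block_size * block_count))) {
            for (std::uint32_t i = 0; i < block_count; ++i)
                *index_at(i) = i + 1;              // each free block names the next free block
        }
        ~Pool() { std::free(memory_); delete next_pool_; }

        void* allocate() {
            if (free_head_ == block_count_) {      // this pool is full:
                if (next_pool_ == nullptr)         // extend the chain with an identical pool
                    next_pool_ = new Pool(block_size_, block_count_);
                return next_pool_->allocate();
            }
            std::uint32_t i = free_head_;
            free_head_ = *index_at(i);             // pop the embedded free list
            return memory_ + i * block_size_;
        }

        void deallocate(void* ptr) {
            unsigned char* p = static_cast<unsigned char*>(ptr);
            if (p < memory_ || p >= memory_ + block_size_ * block_count_) {
                next_pool_->deallocate(ptr);       // not in this pool: pass it down the chain
                return;
            }
            std::uint32_t i = static_cast<std::uint32_t>((p - memory_) / block_size_);
            *index_at(i) = free_head_;             // push the block back on the free list
            free_head_ = i;
        }

    private:
        std::uint32_t* index_at(std::uint32_t i) {
            return reinterpret_cast<std::uint32_t*>(memory_ + i * block_size_);
        }

        std::size_t    block_size_;
        std::uint32_t  block_count_;
        unsigned char* memory_;
        std::uint32_t  free_head_ = 0;             // index of the first free block
        Pool*          next_pool_ = nullptr;       // overflow pool, created on demand
    };

Mmsys additionally fixes the block count at 16384 and reaches the right pool through an arena, which this sketch leaves out.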

2.4.2 Arena

To avoid memory fragmentation, it is important to create pools with different sizes. The arena makes sure an object gets allocated in the right-sized pool. Mmsys' arenas have 19 pool sizes ranging from 16 bytes to 4 megabytes. The arenas have an id so that the thread pool manager can match threads to arena ids. Just like the memory pools, it is possible for the user to create their own arenas which are not part of mmsys.

Figure 5: Arena with id 1
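The thesis states that each arena has 19 pool sizes from 16 bytes to 4 megabytes; assuming each class doubles the previous one (which matches those endpoints but is not stated explicitly), the pool for a request could be chosen like this:

    #include <cstddef>

    const std::size_t MIN_CLASS_SIZE = 16;                  // smallest pool block size
    const std::size_t MAX_CLASS_SIZE = 4u * 1024 * 1024;    // largest pool block size (4 MB)
    const int         NUM_CLASSES    = 19;                  // 16 B, 32 B, ..., 4 MB

    // Returns the index of the smallest class that fits `size`,
    // or -1 for a huge allocation that bypasses the arena.
    int size_class(std::size_t size) {
        if (size > MAX_CLASS_SIZE)
            return -1;
        std::size_t class_size = MIN_CLASS_SIZE;
        for (int i = 0; i < NUM_CLASSES; ++i, class_size <<= 1) {
            if (size <= class_size)
                return i;
        }
        return -1;   // not reached: every size up to MAX_CLASS_SIZE matches a class
    }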


2.4.3 Thread Pool Manager

The thread pool manager handles calls from the overridden new and delete functions. The thread pool manager has a list of preallocated arenas which can be assigned to threads. To keep track of which arenas are currently in use, a slightly different kind of free list is used: the thread pool manager has a complementing list of the same size as the arena list, and the objects in this list keep track of which arenas are currently in use. When trying to find which arena a thread should allocate or free in, the complementing thread object list is iterated through until the thread's id is found. If the thread's id is not found, the thread takes the next arena in the free list, and the thread id and the new arena id are put into a new thread object at the end of the currently used part of the list.

When removing a thread, the thread pool manager finds the thread's arena and frees its memory pools, and the arena is then marked as not alive. The last object in the thread object list takes the place of the object that handled the removed arena. The last object's next-variable is set as the current next, and the thread pool manager's head-variable is set to the removed arena's index. This way, the thread pool manager always iterates through only as many objects as there are current threads. See figure 6.

A thread does not automatically free the memory it allocated, but this can be done quickly by calling the thread pool manager's remove-thread function at the end of the thread.

If an allocation is bigger than the largest bucket of a pool manager (4 MB), it is instead placed in a list for huge objects which is not thread specific.
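A simplified, hedged sketch of this lookup is given below: a list of (thread id, arena index) entries is scanned for the calling thread, and a thread seen for the first time claims the next free arena. The swap-with-last removal trick and the mmsys-specific free-list bookkeeping are left out, and the locking here is only one possible choice:

    #include <mutex>
    #include <thread>
    #include <vector>

    struct ThreadEntry {
        std::thread::id thread;   // which thread this entry belongs to
        int             arena;    // index into the manager's preallocated arena list
    };

    class ThreadPoolManager {
    public:
        explicit ThreadPoolManager(int arena_count) {
            for (int i = arena_count - 1; i >= 0; --i)
                free_arenas_.push_back(i);
        }

        // Find the arena assigned to the calling thread, assigning one on first use.
        // Assumes there are at least as many preallocated arenas as threads.
        int arena_for_current_thread() {
            std::lock_guard<std::mutex> lock(mutex_);
            std::thread::id id = std::this_thread::get_id();

            for (const ThreadEntry& entry : entries_)   // linear scan over live threads
                if (entry.thread == id)
                    return entry.arena;

            int arena = free_arenas_.back();            // first call from this thread:
            free_arenas_.pop_back();                    // claim the next free arena
            entries_.push_back({id, arena});
            return arena;
        }

    private:
        std::mutex               mutex_;
        std::vector<ThreadEntry> entries_;      // one entry per currently known thread
        std::vector<int>         free_arenas_;  // arenas not yet assigned to any thread
    };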

2.4.4 Huge Allocations

Allocations over 4 megabytes are not inserted into a pool. Instead they are put into a red-black binary search tree [23], a self-balancing tree which guarantees that one part of the tree can only be so much deeper than the other. The tree has a limited number of elements, 128 pointers by default. An array list is used to contain the nodes, with indexes into the array used instead of pointers and -1 as the NIL value.
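A sketch of the array-backed node layout this describes (the tree operations themselves are omitted): links are indices into a fixed array rather than pointers, and -1 plays the role of NIL.

    #include <array>
    #include <cstddef>

    const int NIL = -1;   // index value used instead of a null pointer

    struct HugeNode {
        void*       ptr;      // the huge allocation tracked by this node
        std::size_t size;     // its size in bytes
        int         left;     // children and parent are array indices, not pointers
        int         right;
        int         parent;
        bool        red;      // red-black colour bit
        bool        in_use;   // marks whether this array slot currently holds a node
    };

    // Fixed capacity of 128 elements, matching the default mentioned above.
    typedef std::array<HugeNode, 128> HugeAllocationTable;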

2.5 Memory Lifetime

Because of the way that C++ is constructed, some objects, such as file streams [5], are not destroyed at the end of their scope but after the end of the main program's scope. The file stream will try to handle memory which was allocated during the program's runtime and then delete it afterwards. If the program frees the backing memory before the file stream's delete call, the access is out of bounds and the program crashes. To avoid this, the memory must remain until all the delete calls have been made. To make sure the memory is freed only after all delete calls have been made, the pool manager checks whether all pools in the pool chain are empty and only then frees all memory allocated by the pools.


(a) Before arena 0 is removed

(b) After arena 0 is removed

Figure 6: An arena is removed from the thread manager. The arena frees its memory pools and is set as not alive

Figure 7: Example of a red-black tree, from Wikipedia [24]


3 Results

The test program allocated objects of a certain size a set number of times and then deleted all the objects again. The tests were made from a small population where every combination of allocation type, size and count was tested three times. Tests were made on a single thread and on ten threads. As discussed in the next section, the results could vary a lot, especially when using more threads; therefore only the results for the larger allocation sizes are shown when using ten threads.

Tests were made on mmsys and the standard new and delete. To see how fast the allocations could optimally be, the tests were compared to a memory pool which had all the memory in one pool, which means it had no overhead when allocating or freeing, making it the fastest way to allocate the objects. To further differentiate the general allocations of mmsys and standard new/delete from the optimal allocation process, the custom pool does not call delete for every allocation but simply releases all memory at one time.

Because the number of blocks in mmsys' memory pools is 16384, the maximum number of pools in a pool chain was 13, which also means that malloc was called at most 25 times in the single-threaded tests: one time for every pool except the first one, and one time for each memory space that the pools hold. In the tests with ten threads, malloc was called a maximum of 30 times: one call to create a pool plus two to create pools, for every thread.

Figure 8: Memory management system (mmsys)


Figure 9: Standard new and delete

Figure 10: Custom pool


Figure 11: Memory management system (mmsys). Only smaller sizes. With trendlines

Figure 12: Standard new and delete. Only smaller sizes. With trendlines


Figure 13: Custom pool. Only smaller sizes. With trendlines

Figure 14: Mmsys using ten threads


Figure 15: Standard new and delete using ten threads

Figure 16: Standard new and delete using ten threads. Only 10000 bytes


4 Discussion

All of the allocation types had a large variance in performance, as can be seen in figures 8-16. Sometimes more allocations gave a faster result than fewer ones. The variance could be quite extreme, especially when using threads, but even on a single thread the slower times could be as much as four times larger than the faster ones. The variance probably has something to do with how the OS's malloc finds free space and how threads are created. For a more accurate picture of the allocation methods, more tests could be run, and the tests could be done on different operating systems to see whether their malloc strategies differ and how this affects the result.

While all of the allocation types had quite a large variance, there is a pattern. On a single thread, mmsys only performs marginally better than the standard new/delete on the smaller sizes, and mmsys only became clearly faster when larger objects were allocated. One reason for this is that the standard new and delete make smaller requests for more memory to avoid running out of memory. While mmsys is faster thanks to its larger requests, it also ran out of memory on the last test.

On ten threads the standard new/delete is faster than mmsys, except for allocations of 100 000 bytes. A reason the standard new/delete might be faster than mmsys is that mmsys has more overhead, because it must go through the thread object list for every allocation and free. While the standard new/delete is faster with threads, it is also not safe from several processors trying to read the same cache line. To see whether mmsys' way of handling threads is better, tests should be made with the threads sending data to the CPU, to see if standard new/delete causes any cache line collisions and if the cost of the collisions is worse than mmsys' overhead.

What is interesting is just how much difference there is in time for standard new/delete when allocating 10 000 bytes and 100 000 bytes on several threads. When allocating large sizes in threads, the time grows exponentially. This is worth considering when creating programs with several threads.

As expected, the custom memory pool is always the fastest. The best performance is reached when using memory pools with exactly the amount of memory the program needs. Although this is the fastest method, it is in most cases not possible to know the exact amount of memory a program needs. A more feasible approach might be to use a custom memory pool in sections of the program which are very memory intensive. Another way of improving memory use could be to modify the number of blocks used by mmsys, to reduce the number of pools in the pool chains as well as the number of memory requests.


5 Further work

There are a lot of features which could be added to mmsys for better results and functionality. This section describes the most useful improvements for increasing performance and functionality.

5.1 Allocation

The pools use malloc when more memory is needed. To improve performance, memory should be allocated fewer times and in bigger chunks, and stored by the thread pool manager. This memory could be stored in free lists with page-sized blocks; when a pool asks for memory it is given a span of these pages from the thread pool manager. When not enough memory exists in these lists, a new list would be created and memory allocated for it. The size of these new lists would depend on the needs of the program: if the program handles small files, a couple of megabytes might be enough, but if the program handles a lot of bigger files, hundreds of megabytes could be used.

What also could be decided is how much memory each pool should initially take. The more memory a pool takes up, the fewer times the pool chain has to be walked to find where to free or allocate, but if the pools take up too much memory in one go it might be a waste of RAM. This is especially true if there are few allocations with a lot of variation in size and a lot of threads. As all the memory pools have the same number of blocks, a reasonable size for 16-byte blocks might be far too much for 4-megabyte blocks. Some rules should be added to make sure a good size is set. This could simply be to decide that all pools initially take up the same amount of memory. It should also be possible to customize the pool sizes so that each size has its own default number of blocks.

When allocating from a pool, the arena always walks from the first pool to the last in the pool chain to find a free spot. This could be avoided by using something like jemalloc's bins, which point to a pool with free memory.

5.2 Memory Alignment

To reduce page misses it is important to allocate memory that is page aligned. The easiest way is to override malloc and use sbrk or mmap to get memory that is a multiple of the page size. If the default malloc is used, one has to take into account that malloc gives a little more memory than asked for, because it inserts guard bytes, so it might be a good idea to ask malloc for memory that is a multiple of the page size minus a few bytes. It is also important to remember that because malloc places guard bytes before the allocated memory, the first page is not a full page but a page minus the guard bytes, and the same goes for the last page.
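A hedged sketch of the mmap route on a POSIX system is shown below (Windows would use VirtualAlloc instead): the request is rounded up to whole pages and the returned memory is page aligned, with no malloc guard bytes to account for.

    #include <sys/mman.h>   // mmap, munmap (POSIX)
    #include <unistd.h>     // sysconf
    #include <cstddef>
    #include <cstdio>

    // Rounds `size` up to a whole number of pages and maps that much memory.
    // The returned address is page aligned; `mapped_size` reports the rounded size.
    void* alloc_pages(std::size_t size, std::size_t& mapped_size) {
        std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
        mapped_size = (size + page - 1) / page * page;

        void* memory = mmap(nullptr, mapped_size, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return memory == MAP_FAILED ? nullptr : memory;
    }

    int main() {
        std::size_t mapped = 0;
        void* memory = alloc_pages(10000, mapped);   // rounded up to 12288 bytes on 4 KiB pages
        std::printf("mapped %zu bytes at %p\n", mapped, memory);
        if (memory)
            munmap(memory, mapped);
        return 0;
    }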

5.3 Threads

Background threads could be used to find and clean up after completed threads. Hashing could be used to find the right arena for a thread instead of going through all the current threads.
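A sketch of the hashing idea, assuming a hypothetical Arena type and a helper that hands out unused arenas: an unordered map keyed on the thread id replaces the linear scan, and a thread_local cache of the result would remove even the hash lookup on later calls.

    #include <mutex>
    #include <thread>
    #include <unordered_map>

    struct Arena;                          // the library's arena type, defined elsewhere

    class HashedThreadManager {
    public:
        Arena* arena_for_current_thread() {
            std::lock_guard<std::mutex> lock(mutex_);
            std::thread::id id = std::this_thread::get_id();

            auto it = arenas_.find(id);    // average O(1) instead of a scan over all threads
            if (it != arenas_.end())
                return it->second;

            Arena* arena = acquire_free_arena();   // hypothetical helper: claim an unused arena
            arenas_.emplace(id, arena);
            return arena;
        }

    private:
        Arena* acquire_free_arena();       // implementation omitted in this sketch

        std::mutex mutex_;
        std::unordered_map<std::thread::id, Arena*> arenas_;
    };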

5.4 Safety and debugging

No guard bytes are included in the allocator. It could help debugging and safety if these were implemented and checked when allocating and freeing, to see if something has accessed the memory in an incorrect way.


5.5 Data collection

One very useful feature would be to log how much memory is used in each pool and how many times objects were allocated and freed in the pools. This could help the user modify the pools to suit the needs of their program.

5.6 Further Testing

Tests should be performed on real programs to find out how well the library performs and whether there are performance drops. Tests could also be performed on programs which other memory allocators have been tested against, to compare its performance.


References

[1] Peter J. Denning. Thrashing: Its causes and prevention. AFIPS Conf. Proc., 33:915–922, 1968.

[2] Brian W. Kernighan. The C Programming Language. Prentice Hall Professional Technical Reference, 2nd edition, 1988.

[3] LINFO. Context switch definition. http://www.linfo.org/context_switch.html. Accessed: 2019-05-25.

[4] IEEE Std. The Open Group Base Specifications Issue. http://pubs.opengroup.org/onlinepubs/9699919799/functions/contents.html. Accessed: 2019-05-27.

[5] Stanley B. Lippman, Jose Lajoie, and Barbara E. Moo. C++ Primer. Addison-Wesley Professional, 5th edition, 2012.

[6] The GNU C Library. Process memory concepts. https://www.gnu.org/software/libc/manual/html_node/Memory-Concepts.html. Accessed: 2019-05-27.

[7] Peter J. Denning. Virtual memory. In Encyclopedia of Computer Science, pages 1832–1835. John Wiley and Sons Ltd., Chichester, UK.

[8] iFixMyStuff. How virtual memory works. https://ifixmystuff.com/wp-content/uploads/2018/03/VirtualMemory.png. Accessed: 2019-05-27.

[9] Cypress. Endurance and data retention characterization of Cypress flash memory. https://www.cypress.com/file/369306/download. Accessed: 2019-05-27.

[10] Jason Evans. A scalable concurrent malloc(3) implementation for FreeBSD. 2006.

[11] Doug Lea. A memory allocator. http://gee.cs.oswego.edu/dl/html/malloc.html. Accessed: 2019-05-23.

[12] Ben Kenwright. Fast efficient fixed-size memory pool: No loops and no overhead. 2012.

[13] Donald E. Knuth. The Art of Computer Programming, Volume 1. Addison-Wesley Professional, 1997.

[14] jemalloc. http://jemalloc.net/. Accessed: 2019-05-23.

[15] Patroklos Argyroudis and Chariton Karamitas. Exploiting the jemalloc memory allocator: Owning Firefox's heap. Blackhat USA, 2012.

[16] Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. Hoard: A scalable memory allocator for multithreaded applications. SIGOPS Oper. Syst. Rev., 34(5):117–128, November 2000.

[17] Microsoft Developer Network. About hooks. https://docs.microsoft.com/sv-se/windows/desktop/winmsg/about-hooks. Accessed: 2019-05-23.

[18] Detours. https://www.microsoft.com/en-us/research/project/detours/?from=http%3A%2F%2Fresearch.microsoft.com%2Fsn%2Fdetours. Accessed: 2019-05-27.

[19] Andrew Suffield. Bounds checking for C and C++. 2019.

[20] Aliaksey Kandratsenka et al. tcmalloc. https://github.com/gperftools/gperftools, 2019.

[21] Sanjay Ghemawat and Paul Menage. TCMalloc: Thread-caching malloc. http://goog-perftools.sourceforge.net/doc/tcmalloc.html. Accessed: 2019-05-29.

[22] The FreeBSD Foundation. The FreeBSD project. https://www.freebsd.org/. Accessed: 2019-05-27.

[23] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Third Edition. The MIT Press, 3rd edition, 2009.

[24] Wikipedia, the free encyclopedia. An example of a red–black tree. https://en.wikipedia.org/wiki/Red%E2%80%93black_tree#/media/File:Red-black_tree_example.svg, 2006. Accessed: 2019-05-27.
