where did the memory go? - c++

class Node
{
//some member variables.
};
std::cout<<"size of the class is "<<sizeof(Node)<<"\n";
int pm1 =peakmemory();
std::cout<<"Peak memory before loop is "<< pm1<<"\n";
for(int i=0; i< nNode; ++i)
{
Node * p = new Node;
}
int pm2 =peakmemory();
std::cout<<"Peak memory after loop is "<< pm2<<"\n";
I thought pm2-pm1 approximates nNode * sizeof(Node). But it turns out pm2-pm1 is much larger than nNode *sizeof(Node). Where did the memory go? I suspect sizeof(Node) does not reflect the correct memory usage.
I have tested on both Windows and Linux. My final conclusion is that Node * p = new Node; allocates more memory than sizeof(Node), where Node is a class.

Since you haven't specified what platform you're running on, here are a few possibilities:
Allocation size: Your C++ implementation may be allocating memory in units which are larger than sizeof(Node), e.g. to limit the amount of book-keeping it does.
Alignment: The allocator may have a policy of returning addresses aligned to some minimum power of 2. Again, this may simplify its implementation somewhat.
Overhead: Some allocators, in addition to the memory you are using, have some padding with a fixed pattern to protect against memory corruption; or some meta-data used by the allocator.
That is not to say this actually happens. But it could; it certainly agrees with the language specification (AFAICT).

Pretty much all memory allocators (such as those on Linux/Win32) have an allocation header that precedes the memory allocation (and includes information such as the size of the allocation). On Linux, for example, you can look at the source for malloc, which documents the stored header (and how to compute its size):
https://git.busybox.net/uClibc/tree/libc/stdlib/malloc/malloc.h#n106
As already mentioned in a comment, debug builds may also allocate additional bytes at either side of the allocation to guard against buffer overrun errors.
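To make the effect of a header plus alignment concrete, here is a minimal sketch of the arithmetic. The function name and the 8-byte header and 16-byte granularity used in the usage note are assumptions chosen for illustration, not the real figures for any particular allocator.

```cpp
#include <cstddef>

// Hypothetical allocator model: every block carries a fixed-size header, and
// the total (header + payload) is rounded up to a multiple of `alignment`.
std::size_t estimated_block_size(std::size_t payload,
                                 std::size_t header,
                                 std::size_t alignment) {
    std::size_t raw = payload + header;
    return (raw + alignment - 1) / alignment * alignment;  // round up
}
```

Under those assumed numbers, a class with sizeof(Node) == 12 would cost estimated_block_size(12, 8, 16) == 32 bytes per new Node, which is already more than 2.5 times sizeof(Node).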

Related

C++ std::make_unique usage

This is the first time I am trying to use std::unique_ptr, but I am getting an access violation
when using std::make_unique with a large size.
What is the difference in this case, and is it possible to catch this type of exception in C++?
void SmartPointerfunction(std::unique_ptr<int>&Mem, int Size)
{
try
{
/*declare smart pointer */
//Mem = std::unique_ptr<int>(new int[Size]); // using new (No crash)
Mem = std::make_unique<int>(Size); // using make_unique (crash when Size = 10000!!)
/*set values*/
for (int k = 0; k < Size; k++)
{
Mem.get()[k] = k;
}
}
catch(std::exception& e)
{
std::cout << "Exception :" << e.what() << std::endl;
}
}
When you invoke std::make_unique<int>(Size), you allocate sizeof(int) bytes (commonly 4) and initialize that single int with the value Size. Since only one int was allocated, Mem.get()[k] touches addresses that are out of bounds for every k > 0.
But an out-of-bounds access doesn't mean your program crashes immediately. As you may know, the addresses a program touches are virtual memory addresses, so consider how a process's virtual address space is laid out.
The address space is divided into several segments (stack, heap, bss, etc.). When we request dynamic memory, the returned address is usually located in the heap segment (I say usually because the allocator sometimes uses mmap instead, in which case the address lies in a memory-mapped area between the stack and the heap).
The blocks of dynamic memory we obtain are not necessarily contiguous, but the heap itself is a contiguous segment. From the OS's point of view, any access within the heap segment is legal, and this is exactly what the allocator relies on. The allocator manages the heap by dividing it into blocks, some marked "used" and some marked "free". When we request dynamic memory, the allocator looks for a free block that can hold the requested size (splitting it if it is much larger than needed), marks it as used, and returns its address. If no such free block can be found, the allocator calls sbrk to grow the heap.
Even if we access an address that is out of range for our block, the OS regards it as a legal operation as long as the address is within the heap, although the write may corrupt data in another used block or land in a free block. But if the address is outside the heap, for example beyond the program break or in an unmapped region, the OS raises a segmentation fault and the program crashes immediately.
So the crash has nothing to do with the argument passed to std::make_unique<int> as such. It just so happens that when you specify 10000, the addresses you access run past the end of the segment.
std::make_unique<int>(Size);
This doesn't do what you are expecting!
It creates a single int and initializes it to the value Size!
I'm pretty sure your plan was to do:
auto p = std::make_unique<int[]>(Size);
Note the extra brackets. Also note that the result type is different. It is not std::unique_ptr<int>, but std::unique_ptr<int[]>, and for this type operator[] is provided!
Fixed version, but IMO you should use std::vector.
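A minimal sketch of what the corrected code could look like, assuming C++14 or later. The helper names make_sequence and make_sequence_vector are made up for this example; they are not from the question.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Array form: std::make_unique<int[]>(size) allocates `size` ints (zeroed),
// and std::unique_ptr<int[]> provides operator[] and calls delete[].
std::unique_ptr<int[]> make_sequence(std::size_t size) {
    auto mem = std::make_unique<int[]>(size);
    for (std::size_t k = 0; k < size; ++k) {
        mem[k] = static_cast<int>(k);
    }
    return mem;
}

// The std::vector alternative also remembers its own size.
std::vector<int> make_sequence_vector(std::size_t size) {
    std::vector<int> values(size);
    for (std::size_t k = 0; k < size; ++k) {
        values[k] = static_cast<int>(k);
    }
    return values;
}
```

By contrast, *std::make_unique<int>(10000) is one int holding the value 10000, so indexing past element 0 is undefined behaviour. No exception is thrown, which is why the try/catch in the question cannot intercept it.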

Why does my code occasionally show memory on the free store (heap) growing both up and down? (C++)

My understanding is that memory allocated on the free store (the heap) should grow upwards as I allocate additional free store memory; however, when I run my code, occasionally the memory location of the next object allocated on the free store will be a lower value. Is there an error with my code, or could someone please explain how this could occur? Thank you!
int main()
{
int* a = new int(1);
int* b = new int(1);
int* c = new int(1);
int* d = new int(1);
cout << "Free Store Order: " << int(a) << " " << int(b) << " " << int(c) << " " << int(d) << '\n';
// An order I found: 13011104, 12998464, 12998512, 12994240
delete a;
delete b;
delete c;
delete d;
return 0;
}
The main problem with that code is that you are casting int * to int, an operation that may lose precision, and therefore give you incorrect results.
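One way to look at addresses numerically without truncation is to convert through std::uintptr_t, an integer type wide enough to hold a pointer where the implementation provides it. This is a small sketch; address_value is a name invented here.

```cpp
#include <cstdint>

// Convert a pointer to an integer wide enough to hold it. On a 64-bit
// platform, int(a) would typically keep only the low 32 bits (or fail to
// compile), while std::uintptr_t keeps the whole address.
std::uintptr_t address_value(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}
```

For plain printing, streaming the pointer itself, for example std::cout << static_cast<const void*>(a), avoids the cast altogether.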
But, aside from that, this statement is a misapprehension:
My understanding is that memory allocated on the free store (the heap) should grow upwards as I allocate additional free store memory.
There is no guarantee that new will return objects with sequential addresses, even if they're the same size and there have been no previous allocations. A simple allocator may well do that but it is totally free to allocate objects in any manner it wants.
For example, it may allocate in a round-robin fashion from multiple arenas to reduce resource contention. I believe the jemalloc implementation does this, albeit on a per-thread basis.
Or maybe it has three fixed-address 128-byte buffers to hand out for small allocations so that it doesn't have to fiddle about with memory arenas in programs with small and short-lived buffers. That means the first three will be specific addresses outside the arena, while the fourth is "properly" allocated from the arena.
Yes, I know that may seem a contrived situation, but I've actually done something similar in an embedded system where, for the vast majority of allocations, there were fewer than 64 128-byte allocations in flight at any given time.
Using that method means that most allocations were blindingly fast, using a count and bitmap to figure out free space in the fixed buffers, while still being able to handle larger needs (> 128 bytes) and overflows (> 64 allocations).
And deallocations simply detected if you were freeing one of the fixed blocks and marked it free, rather than having to return it to the arena and possibly coalesce it with adjacent free memory sections.
In other words, something like (with suitable locking to prevent contention, of course):
def free(address):
    if address is one of the fixed buffers:
        set free bit for that buffer to true
        increment fixed buffer free count
        return
    call realFree(address)
def alloc(size):
    if size is greater than 128 or fixed buffer free count is zero:
        return realAlloc(size)
    find first free fixed buffer
    decrement fixed buffer free count
    set free bit for that buffer to false
    return address of that buffer
The bottom line is that the values returned by new have certain guarantees but ordering is not one of them.
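The fixed-buffer scheme described above could be sketched in C++ roughly as follows. This is an illustration under assumptions, not the embedded code the answer refers to: the class name, the use of std::bitset, and falling back to std::malloc/std::free as the "real" allocator are choices made here, and locking is left out.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdlib>
#include <functional>

constexpr std::size_t kBufSize = 128;   // size of each fixed buffer
constexpr std::size_t kBufCount = 64;   // number of fixed buffers

class SmallBufferAllocator {
public:
    void* allocate(std::size_t size) {
        if (size > kBufSize || in_use_.all()) {
            return std::malloc(size);            // large request or pool full
        }
        for (std::size_t i = 0; i < kBufCount; ++i) {
            if (!in_use_[i]) {                   // first free fixed buffer
                in_use_[i] = true;
                return pool_ + i * kBufSize;
            }
        }
        return nullptr;                          // not reached: all() was false
    }

    void release(void* address) {
        if (owns(address)) {
            std::size_t offset = static_cast<unsigned char*>(address) - pool_;
            in_use_[offset / kBufSize] = false;  // just mark the buffer free
            return;
        }
        std::free(address);                      // came from the fallback
    }

    // True if the address lies inside the fixed pool. std::less gives a
    // well-defined ordering even for pointers into different objects.
    bool owns(const void* address) const {
        const unsigned char* p = static_cast<const unsigned char*>(address);
        std::less<const unsigned char*> before;
        return !before(p, pool_) && before(p, pool_ + sizeof(pool_));
    }

private:
    alignas(std::max_align_t) unsigned char pool_[kBufCount * kBufSize];
    std::bitset<kBufCount> in_use_;              // all false initially
};
```

With an allocator like this, a small allocation comes from the fixed pool while a later, larger one comes from the general heap, so comparing the two addresses says nothing about the direction in which the heap "grows".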

dynamic memory allocation using new with binary search in C++

I am trying to find the maximum memory allocated using new[]. I have used binary search to make allocation a bit faster, in order to find the final memory that can be allocated
bool allocated = false;
int* ptr= nullptr;
int low = 0, high = std::numeric_limits<int>::max(), mid = 0;
while(true)
{
try
{
mid = (low + high) / 2;
ptr = new int[mid];
delete[] ptr;
allocated = true;
}
catch(Exception e)
{....}
if (allocated == true)
{
low = mid;
}else
{
high = low;
cout << "maximum memory allocated at: " << ptr << endl;
}
}
I have modified my code, I am using a new logic to solve this. My problem right now is it is going to a never ending loop. Is there any better way to do this?
This code is useless for a couple of reasons.
Depending on your OS, the memory may or may not be allocated until it is actually accessed. That is, new happily returns a new memory address, but it doesn't make the memory available just yet. It is actually allocated later when and if a corresponding address is accessed. Google up "lazy allocation". If the out-of-memory condition is detected at use time rather than at allocation time, allocation itself may never throw an exception.
If you have a machine with more than 2 gigabytes available, and your int is 32 bits, alloc will eventually overflow and become negative before the memory is exhausted. Then you may get a bad_alloc. Use size_t for all things that are sizes.
Assuming you are doing ++alloc and not ++allocation, it shouldn't matter what address it uses. if you want it to use a different address every time then don't delete the pointer.
This is a particularly bad test.
For the first part you have undefined behaviour. That's because you should only ever delete[] the pointer returned to you by new[]. You need to delete[] pvalue, not value.
The second thing is that your approach will be fragmenting your memory, as you're continuously allocating and deallocating contiguous memory blocks. I imagine that your program will understate the maximum block size due to this fragmentation effect. One solution would be to launch instances of your program as a new process from the command line, passing the allocation block size as a parameter. Use a divide-and-conquer bisection approach to find the maximum size (with some reliability) in O(log n) trials.
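A bisection that terminates and uses size_t throughout could look like the sketch below. Both function names are invented for this example. It assumes the predicate is monotonic (true up to some size, false beyond it) and true at the lower bound. As noted above, with lazy allocation a successful new does not prove the memory is really usable, so the result is at best an upper estimate.

```cpp
#include <cstddef>
#include <new>

// Try to allocate n bytes without throwing, releasing them immediately.
bool can_allocate(std::size_t n) {
    char* p = new (std::nothrow) char[n];
    if (p == nullptr) return false;
    delete[] p;
    return true;
}

// Largest n in [lo, hi] for which accepts(n) is true.
template <class Pred>
std::size_t largest_accepted(std::size_t lo, std::size_t hi, Pred accepts) {
    while (lo < hi) {
        // Upper midpoint: always greater than lo, so the range shrinks on
        // every iteration, and the expression cannot overflow.
        std::size_t mid = hi - (hi - lo) / 2;
        if (accepts(mid)) {
            lo = mid;        // mid works, the answer is mid or larger
        } else {
            hi = mid - 1;    // mid fails, the answer is smaller
        }
    }
    return lo;
}
```

A call such as largest_accepted(0, static_cast<std::size_t>(-1), can_allocate) would then probe the allocator in at most about 64 steps on a 64-bit system.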

C++/ActiveX replacing realloc with malloc, memcpy, free. Functional and Performance tests

I've been assigned to a project that is a complex legacy system written in C++ and ActiveX ~ 10 years old.
The setup is Microsoft Visual Studio 2008.
Whilst there are no issues with the system right now, as part of the security review of the legacy system, an automated security code scanning tool has marked instances of realloc as Bad Practice issue, due to security vulnerability.
This is because realloc function might leave a copy of sensitive information stranded in memory where it cannot be overwritten. The tool recommends replacing realloc with malloc, memcpy and free.
Now realloc function being versatile, will allocate memory when the source buffer is null. It also frees memory when the size of the buffer is 0. I was able to verify both these scenarios.
Source: MSDN Library 2001
realloc returns a void pointer to the reallocated (and possibly moved) memory block. The return value is NULL if the size is zero and the buffer argument is not NULL, or if there is not enough available memory to expand the block to the given size. In the first case, the original block is freed. In the second, the original block is unchanged. The return value points to a storage space that is guaranteed to be suitably aligned for storage of any type of object. To get a pointer to a type other than void, use a type cast on the return value.
So, my replacement function that uses malloc, memcpy and free has to cater for these cases.
I have reproduced below the original code snippet (an array implementation) that uses realloc to dynamically resize and shrink its internal buffer.
First the class definition:
template <class ITEM>
class CArray
{
// Data members:
protected:
ITEM *pList;
int iAllocUnit;
int iAlloc;
int iCount;
public:
CArray() : iAllocUnit(30), iAlloc(0), iCount(0), pList(NULL)
{
}
virtual ~CArray()
{
Clear(); //Invokes SetCount(0) which destructs objects and then calls ReAlloc
}
The existing ReAlloc method:
void ReAllocOld()
{
int iOldAlloc = iAlloc;
// work out new size
if (iCount == 0)
iAlloc = 0;
else
iAlloc = ((int)((float)iCount / (float)iAllocUnit) + 1) * iAllocUnit;
// reallocate
if (iOldAlloc != iAlloc)
{
pList = (ITEM *)realloc(pList, sizeof(ITEM) * iAlloc);
}
}
The following is my implementation that replaces these with malloc,memcpy and free:
void ReAllocNew()
{
int iOldAlloc = iAlloc;
// work out new size
if (iCount == 0)
iAlloc = 0;
else
iAlloc = ((int)((float)iCount / (float)iAllocUnit) + 1) * iAllocUnit;
// reallocate
if (iOldAlloc != iAlloc)
{
size_t iAllocSize = sizeof(ITEM) * iAlloc;
if(iAllocSize == 0)
{
free(pList); /* Free original buffer and return */
}
else
{
ITEM *tempList = (ITEM *) malloc(iAllocSize); /* Allocate temporary buffer */
if (tempList == NULL) /* Memory allocation failed, throw error */
{
free(pList);
ATLTRACE(_T("(CArray: Memory could not allocated. malloc failed.) "));
throw CAtlException(E_OUTOFMEMORY);
}
if(pList == NULL) /* This is the first request to allocate memory to pList */
{
pList = tempList; /* assign newly allocated buffer to pList and return */
}
else
{
size_t iOldAllocSize = sizeof(ITEM) * iOldAlloc; /* Allocation size before this request */
size_t iMemCpySize = min(iOldAllocSize, iAllocSize); /* Allocation size for current request */
if(iMemCpySize > 0)
{
/* MemCpy only upto the smaller of the sizes, since this could be request to shrink or grow */
/* If this is a request to grow, copying iAllocSize will result in an access violation */
/* If this is a request to shrink, copying iOldAllocSize will result in an access violation */
memcpy(tempList, pList, iMemCpySize); /* MemCpy returns tempList as return value, thus can be omitted */
free(pList); /* Free old buffer */
pList = tempList; /* Assign newly allocated buffer and return */
}
}
}
}
}
Notes:
Objects are constructed and destructed correctly in both the old and new code.
No memory leaks detected (as reported by Visual Studio built in CRT Debug heap functions: http://msdn.microsoft.com/en-us/library/e5ewb1h3(v=vs.90).aspx)
I wrote a small test harness (console app) that does the following:
a. Add 500000 instances of class containing 2 integers and an STL string.
Integers added are running counter and its string representations like so:
for(int i = 0; i < cItemsToAdd; i++)
{
ostringstream str;
str << "x=" << 1+i << "\ty=" << cItemsToAdd-i << endl;
TestArray value(1+i, cItemsToAdd-i, str.str());
array.Append(&value);
}
b. Open a big log file containing 86526 lines of varying lengths, adding to an instance of this array: CArray of CStrings and CArray of strings.
I ran the test harness with the existing method (baseline) and my modified method. I ran it in both debug and release builds.
The following are the results:
Test-1: Debug build -> Adding class with int,int,string, 100000 instances:
Original implementation: 5 seconds, Modified implementation: 12 seconds
Test-2: Debug build -> Adding class with int,int,string, 500000 instances:
Original implementation: 71 seconds, Modified implementation: 332 seconds
Test-3: Release build -> Adding class with int,int,string, 100000 instances:
Original implementation: 2 seconds, Modified implementation: 7 seconds
Test-4: Release build -> Adding class with int,int,string, 500000 instances:
Original implementation: 54 seconds, Modified implementation: 183 seconds
Reading big log file into CArray of CString objects:
Test-5: Debug build -> Read big log file with 86527 lines CArray of CString
Original implementation: 5 seconds, Modified implementation: 5 seconds
Test-6: Release build -> Read big log file with 86527 lines CArray of CString
Original implementation: 5 seconds, Modified implementation: 5 seconds
Reading big log file into CArray of string objects:
Test-7: Debug build -> Read big log file with 86527 lines CArray of string
Original implementation: 12 seconds, Modified implementation: 16 seconds
Test-8: Release build -> Read big log file with 86527 lines CArray of string
Original implementation: 9 seconds, Modified implementation: 13 seconds
Questions:
As you can see from the above tests, realloc is consistently faster than malloc, memcpy and free. In some instances (Test-2, for example) it's faster by a whopping 367%. Similarly, for Test-4 it is 234%. So what can I do to bring these numbers down to something comparable to the realloc implementation?
Can my version be made more efficient?
Assumptions:
Please note that I cannot use C++ new and delete. I have to use only malloc and free. I also cannot change any of the other methods (as it is existing functionality) and impacts are huge. So my hands are tied to get the best implementation of realloc that I possibly can.
I have verified that my modified implementation is functionally correct.
PS: This is my first SO post. I have tried to be as detailed as possible. Suggestions on posting is also appreciated.
First of all, I'd like to point out that you are not addressing the vulnerability: the memory released by free is not cleared either, just as with realloc.
Also note that your code does more than the old realloc call: it throws an exception when out of memory, which may be futile.
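If the goal really is to avoid leaving stale copies behind, the old block has to be overwritten before it is freed. The sketch below shows one way to do that with only malloc, memcpy and free. The names wipe and realloc_wiping are hypothetical, and the caller has to supply the old size (the CArray above already tracks it as iOldAlloc). The volatile loop is only a best-effort guard against the compiler removing the writes; platform facilities such as SecureZeroMemory on Windows exist for this purpose.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Zero a buffer through a volatile pointer so the stores are harder for the
// optimizer to discard as dead writes just before free().
void wipe(void* p, std::size_t n) {
    volatile unsigned char* bytes = static_cast<volatile unsigned char*>(p);
    for (std::size_t i = 0; i < n; ++i) bytes[i] = 0;
}

// Hypothetical realloc replacement: allocate, copy, wipe the old block, free.
// Returns nullptr when new_size is 0 or when malloc fails; on malloc failure
// the old block is left untouched, as with realloc.
void* realloc_wiping(void* old_block, std::size_t old_size, std::size_t new_size) {
    if (new_size == 0) {
        if (old_block != nullptr) {
            wipe(old_block, old_size);
            std::free(old_block);
        }
        return nullptr;
    }
    void* fresh = std::malloc(new_size);
    if (fresh == nullptr) return nullptr;
    if (old_block != nullptr) {
        std::memcpy(fresh, old_block, old_size < new_size ? old_size : new_size);
        wipe(old_block, old_size);
        std::free(old_block);
    }
    return fresh;
}
```

The extra pass over the old block adds to the cost of every reallocation, so this makes the performance gap wider, not narrower.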
Why is your code slower than realloc? Probably because realloc uses shortcuts under the hood that are not available to you. For example, realloc may allocate more memory than you actually request, or extend the block in place when the memory just after it is free, so your code ends up doing more memcpy calls than realloc does.
Case in point: running the following code in CompileOnline prints Wow no copy
#include <iostream>
#include <stdlib.h>
using namespace std;
int main()
{
void* p = realloc(NULL, 1024);
void* t = realloc(p, 2048);
if (p == t)
cout << "Wow no copy" << endl;
else
cout << "Alas, a copy" << endl;
return 0;
}
What can you do to make your code faster?
You can try to allocate more memory after the currently allocated block, but then freeing the memory becomes more problematic as you need to remember all the pointers you allocated, or find a way to modify the lookup tables used by free to free the correct amount of memory on one go.
OR
Use the common strategy of (internally) allocating twice as much memory as you previously allocated and (optionally) shrink the memory only when the new threshold is less than half the allocated memory.
This gives you some head room so not every time memory grows is it necessary to call malloc/memcpy/free.
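The doubling strategy can be isolated into two small helpers, sketched below. The names and the starting capacity of 16 elements are assumptions for illustration; in the CArray above they would take the place of the iAllocUnit rounding inside ReAllocNew.

```cpp
#include <cstddef>
#include <limits>

// Smallest capacity reachable by repeated doubling from `current` that can
// hold `needed` elements. Starts from 16 when nothing is allocated yet.
std::size_t next_capacity(std::size_t current, std::size_t needed) {
    std::size_t capacity = current != 0 ? current : 16;
    while (capacity < needed) {
        if (capacity > std::numeric_limits<std::size_t>::max() / 2) {
            return needed;   // doubling would overflow, fall back to exact fit
        }
        capacity *= 2;
    }
    return capacity;
}

// Shrink only once usage has dropped below half of the allocated capacity.
bool should_shrink(std::size_t capacity, std::size_t count) {
    return count < capacity / 2;
}
```

Growing from 1 to 500000 elements this way needs around 15 reallocations, compared with roughly 16667 when the buffer grows in fixed steps of 30 elements.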
If you look at an implementation of realloc e.g.
http://www.scs.stanford.edu/histar/src/pkg/uclibc/libc/stdlib/malloc/realloc.c
you see that the difference between your implementation and an existing one
is that it expands the memory heap block instead of creating a whole new block
by using low-level calls. This probably accounts for some of the speed difference.
I think you also need to consider the cost of calling memset on the memory every time you reallocate, because then some performance degradation seems inevitable.
I find the argument about realloc leaving data in memory somewhat overly paranoid, because the same can be said about plain malloc/calloc/free. To be really sure that nothing is kept in memory, you would need to find not only every realloc/malloc/calloc call but also every runtime or third-party function that uses them internally. An alternative would be to create your own heap and use it in place of the regular one, so you can keep it clean.
Conceptually realloc() is not doing anything too smart - it allocates memory by some blocks exactly as you do in your ReAllocNew.
The only conceptual difference can be in the way how new block size is calculated.
realloc may use something like this:
int new_buffer_size = old_buffer_size * 2;
and this will decrease number of memory moves from what you have there.
In any case I think that block size calculation formula is the key factor.

Writing my own memory manager class, overriding new and delete operators

I was given the assignment of making my own memory manager class, but I really have no idea where to start. My instructions are;
//1> write a memman allocation function
//2> ensure the alloc function returns unused addresses
//3> once all memman memory is used up, subsequent allocs return NULL
//4> enable freeing of memory and subsequent reuse of those free'd regions
I've tried searching around for any guides on dealing with memory allocation, but I have not been too successful.
Here is one very, very naive idea to get you started:
char arena[1000000];
char * current = arena;
void * memman(std::size_t n)
{
char * p = current;
current += 16 * ((n + 15) / 16); // or whatever your alignment
return p;
}
All the memory is statically allocated, so you don't need any library calls to get your initial chunk of memory. We make sure to return only pointers with maximal alignment (hardcoded to 16 here, though this should be a constant like alignof(std::max_align_t)). This version doesn't allow for any reclamation, and it's missing the overflow checks.
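A variant of the same bump allocator with the missing overflow check might look like this. It is a sketch: the constant names are made up, the arena is explicitly aligned so that the returned pointers really are maximally aligned, and a zero-byte request is treated as one byte so that every call returns a distinct address.

```cpp
#include <cstddef>

constexpr std::size_t kArenaSize = 1000000;
constexpr std::size_t kAlign = alignof(std::max_align_t);

alignas(std::max_align_t) static unsigned char arena[kArenaSize];
static std::size_t used = 0;   // bytes handed out so far

void* memman(std::size_t n) {
    if (n == 0) n = 1;
    // Reject oversized requests first so the rounding below cannot overflow.
    if (n > kArenaSize - used) return nullptr;
    std::size_t rounded = (n + kAlign - 1) / kAlign * kAlign;
    if (rounded > kArenaSize - used) return nullptr;
    void* p = arena + used;
    used += rounded;
    return p;
}
```

Once the arena is exhausted, every further call returns a null pointer, which covers the third requirement of the assignment.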
For reclamation, you could try and write a free list.
As a slight variation, you could make your array an array of std::max_align_t, which would simplify the stepping logic a bit. Or you could make it an array of uintptr_t and use the memory itself as the free list.
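One simple way to get reuse is a free list threaded through the freed blocks themselves. The sketch below makes a simplifying assumption that is not in the assignment: every allocation occupies one fixed-size block, so requests larger than kBlockSize are refused. All names and sizes are hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t kBlockSize = 64;    // assumed fixed block size
constexpr std::size_t kBlockCount = 256;  // assumed number of blocks

// While a block is free, its first bytes store the link to the next free
// block; while it is allocated, the caller owns all of `bytes`.
union Block {
    Block* next;
    alignas(std::max_align_t) unsigned char bytes[kBlockSize];
};

static Block blocks[kBlockCount];
static Block* free_list = nullptr;        // most recently freed block first
static std::size_t next_unused = 0;       // blocks never handed out yet

void* memman_alloc(std::size_t n) {
    if (n > kBlockSize) return nullptr;   // does not fit in one block
    if (free_list != nullptr) {           // reuse a freed block first
        Block* b = free_list;
        free_list = b->next;
        return b;
    }
    if (next_unused < kBlockCount) {
        return &blocks[next_unused++];    // carve a fresh block
    }
    return nullptr;                       // everything is in use
}

void memman_free(void* p) {
    if (p == nullptr) return;
    Block* b = static_cast<Block*>(p);
    b->next = free_list;                  // push onto the free list
    free_list = b;
}
```

Supporting variable-sized requests would need a size header per block and some way to split and merge free regions, which is where most of the real work in an allocator lies.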