I don't understand about memory issue of appending string - c++

Runtime error: pointer index expression with base 0x000000000000 overflowed to 0xffffffffffffffff for frequency sort
In first answer of that link, it says that appending char to string can cause memory issue.
string s = "";
char c = 'a';
int max = INT_MAX;
for(int j=0;j<max;j++)
s = s + c;
The answer explains [s=s+c in above code copies the same string again and again so it will cause memory issue.] But I don't understand why that code copies the same string again and again.
Is there someone who is likely to make me understand that part :)?

I don't understand why that code copies the same string again and
again.
Okay, let's look at the what happens each time the loop is iterated:
s = s + c;
There are three things the program has to do in order to execute that line of code:
Compute the temporary value s + c -- to do that, the program has to create a temporary, anonymous std::string object, and allocate for it (from the heap) an internal byte-buffer that is at least one byte larger than the number of chars currently in s (so that it can hold all of s's old contents, plus the additional char provided by c)
Set s equal to the temporary-string. In C++03 and earlier, this would be done by reallocating s's internal byte-buffer to be larger, then copying all of the bytes from the temporary-string into s's new/larger buffer. C++11 optimizes this a bit via the new move-assignment operator, so that all the bytes don't have to be copied; rather, s can simply take ownership of the temporary-string's byte-buffer.
Free the temporary string's resources, now that we're done using it. In practice, this takes the form of the std::string class's destructor calling delete[] on the old (no-longer-large-enough) byte-buffer.
Given that the above is going to be performed at least 2 billion times in a loop, it's already quite inefficient.
However, what I think the answer you referred to was particularly concerned about was heap fragmentation. Keep in mind that heap allocation doesn't work by magic; when you (or the std::string class, or anyone) asks to allocate N bytes of memory from the heap, the heap implementation's job is to find N bytes of contiguous memory and return it. And since there is no provision in C++ for moving blocks of memory around (as doing so would invalidate any pointers that the program might have pointing into those blocks of memory), the heap can't create an N-byte contiguous-memory-chunk out of smaller chunks; instead, there has to be a range of contiguous-memory-space already available. For example, it does the heap no good to have a total of 1GB of memory available, if that 1GB of memory is made up of thousands of nonconsecutive 1KB chunks and the caller is asking for a 2KB allocation.
Therefore, the heap's job is to efficiently allocate chunks of memory of the sizes the program requests, and when they are freed again, it will try to glue them back together into larger chunks again if it can, but it may not always be able to. Certain patterns of allocating and freeing memory may result in heap fragmentation, which is simply a large number of discontinuous memory allocations that render the small regions of free memory between them unusable for large allocations.
Whether or not this particular allocate/free pattern would cause that, I'm not sure; given that only one or two buffers are being allocated at a time, the heap may be able to reabsorb them back into adjacent free-memory chunks as they get freed again -- it probably depends on the particular heap algorithm the system is using, as well as on whether any other threads are allocating/freeing heap memory while this is going on. But I wouldn't be too surprised if there are systems out there where it would cause problems (particularly on 16-bit or 32-bit systems where virtual address space is limited, or embedded systems that don't use virtual memory)

Related

C++ allocating array non contiguously

Let's consider this C++ code as a rough example.
int *A = new int [5];
int *B = new int [5];
int *C = new int [5];
delete []A;
delete []C;
int *D = new int [10];
Obviously any machine can handle this case without any problems with buffer overflow or memory leak. However let's imagine that the lengths are multiplied by one million or even a bigger number. As far as I know addresses (at least virtual addresses) of all array elements are consecutive. So whenever I create an array I can be sure that they are contiguous chunks in virtual memory and I can perform pointer arithmetic to access n-th element if I have pointer to the first one. My question is illustrated in the following image( registers representing end of array are ignored for the sake of simplicity).
After allocating A, B, C in the heap we free A and C and get two free memory chunks of length 5 (marked with green dots). What happens when I want to allocate an array of length 10? I think that there are 3 possible cases.
I will get bad_alloc exception for not having contiguous 10 length memory chunk.
The program will automatically reallocate array B to the beginning of the heap and join together the rest of the unused memory.
The array D will be split into 2 parts and stored not contiguously causing not constant access time for n-th element of the array (if there are much more than 2 splits it starts to resemble a linked list rather than an array).
Which one of these is most possible answer or is there another possible case I didn't take into account?
I will get bad_alloc exception for not having contiguous 10 length memory chunk.
This can happen.
The program will automatically reallocate array B to the beginning of the heap and join together the rest of the unused memory.
This cannot happen. Moving an object to a different address is not possible in C++ because it would invalidate existing pointers.
The array D will be split into 2 parts and stored not contiguously causing not constant access time for n-th element of the array (if there are much more than 2 splits it starts to resemble a linked list rather than an array).
This also cannot happen. In C++ array elements are stored contiguously, so that pointer arithmetic is possible.
But there are in fact more possibilities. To understand them, we must account for the fact that the memory can be virtual. This, among other things, means that available address space may be larger than the amount of physically present memory. A chunk of physical memory can be assigned any address from the available address space.
As an example, consider a machine with 8GB (2^33 bytes) of memory running a 64-bit os on a 64-bit CPU. Addresses allocated to the program do not gave all be less that 8GB; it can receive a megabyte chunk of memory at address 0x00000000ffff0000 and another megabyte chunk at address 0x0000ffffffff0000. The total amount of memory allocated to the program cannot be more than 2^33 bytes, but each chunk can be located anywhere in the 2^64 space. (In reality this is a bit more complicated but similar enough to what I describe).
In your picture, you have 15 little squares that represent chunks of memory. Let's say it's physical memory. Virtual memory is 15,000 little squares, of which you can use any 15 at any given time.
So, considering this fact, the following scenarios are also possible.
A chunk of virtual address space is given to the program that is not backed by real physical memory. When and if the program attempts to access this space, the OS will try to allocate physical memory and map it to the corresponding address so that the program can continue. If this attempt fails, the program may be killed by the OS. The newly-free memory is now available to other programs that may want it.
The two short chunks of memory are mapped to new virtual addresses such that they form one long contiguous chunk in the virtual memory space. Remember that typically there are many more virtual memory addresses than there is physical memory, and it is normally easy to find an unassigned range. Typically this scenario is only realized when the memory chunks in question are large.
The problem that you are asking about is called heap-fragmentation, and it's a real, hard problem.
I will get bad_alloc exception for not having contiguous 10 length memory chunk.
This is the theory. But such a situation is really only possible within a 32-bit process; the 64-bit address space is vast.
That is, with a 64-bit process, it is more likely that heap fragmentation stops your new implementation from reusing some memory, which leads to an out-of-memory condition since it needs to ask the kernel for new memory for the entire D array instead of half of it. Also, such an OOM-condition will more likely cause your process to get shot by the OOM-killer sometime when you try to access a location in D, rather than new throwing an exception, because the kernel won't realize that it has overcommitted its memory before it's too late. For more information, google "memory overcommitment".
The program will automatically reallocate array B to the beginning of the heap and join together the rest of the unused memory.
No, it can't. You are in C++, and your runtime does not know where you have possibly stored pointers to B, so it would either run the danger of missing a pointer that needs to be modified, or run the danger of modifying something that's not a pointer to B but happens to have the same bit pattern.
The array D will be split into 2 parts and stored not contiguously causing not constant access time for n-th element of the array (if there are much more than 2 splits it starts to resemble a linked list rather than an array).
This is also not possible because C++ guarantees contiguous storage of arrays (to allow array accesses to be implemented via pointer arithmetic).

C++ allocates more bytes than asked?

int main(int argc, const char* argv[])
{
for(unsigned int i = 0; i < 10000000; i++)
char *c = new char;
cin.get();
}
In the above code, why does my program use 471MB memory instead of 10MB as one would expect?
Allocation of RAM comes from a combined effort of the runtime library and the operating system. In order to identify the one byte your example code requests, there is some structure which identifies this memory to the runtime. It is sometimes a double linked list, but it's defined by the operating system and runtime implementation.
You can analogize it this way: If you have a linked list container, what you're interested in is simply what you've placed inside each link, but the container must have pointers to the other links in the containers in order to maintain the linked list.
If you use a debugger, or some other debugging tool to track memory, these structures can be even larger, making each allocation more costly.
RAM isn't typically allocated out of an array, but it is possible to overload the new operator to change allocation behavior. It could be possible specifically allocate from an array (a large one in your example) so that allocations behaved as you seem to have expected, and in some applications this is a specific strategy to control memory and improve performance (though the details are usually more complex than that simple illustration).
The allocation not only contains the allocated memory itself, but at least one word telling delete how much memory it has to release; moreover that is a number that has to be correctly aligned, so there will be a certain padding after the allocated char to ensure that the next block is correctly aligned. On a 64 bit machine, that means at least 16 bytes per allocation (8 bytes to hold the size, 1 byte to hold the character, and 7 bytes padding to ensure correct alignment).
However most probably that's not the only data stored; to help the memory allocator to find free memory, additional data is likely stored; if one assumes that data to consist of three pointers, one gets to a total 40 bytes per allocation, which matches your data quite well.
Note also that the allocator will also request a bit more memory from the operating system than needed for the actual allocation, so that it won't need to do an expensive OS call for every little allocation. That is, the run time library allocates larger chunks of memory from the operating system, and then cuts those in smaller pieces for your program's allocations. Thus generally there will be some memory allocated from the operating system (and thus showing up in the task manager), but not yet allocated to a certain object in your program.

`std::string` allocations are my current bottleneck - how can I optimize with a custom allocator?

I'm writing a C++14 JSON library as an exercise and to use it in my personal projects.
By using callgrind I've discovered that the current bottleneck during a continuous value creation from string stress test is an std::string dynamic memory allocation. Precisely, the bottleneck is the call to malloc(...) made from std::string::reserve.
I've read that many existing JSON libraries such as rapidjson use custom allocators to avoid malloc(...) calls during string memory allocations.
I tried to analyze rapidjson's source code but the large amount of additional code and comments, plus the fact that I'm not really sure what I'm looking for, didn't help me much.
How do custom allocators help in this situation?
Is a memory buffer preallocated somewhere (where? statically?) and std::strings take available memory from it?
Are strings using custom allocators "compatible" with normal strings?
They have different types. Do they have to be "converted"? (And does that result in a performance hit?)
Code notes:
Str is an alias for std::string.
By default, std::string allocates memory as needed from the same heap as anything that you allocate with malloc or new. To get a performance gain from providing your own custom allocator, you will need to be managing your own "chunk" of memory in such a way that your allocator can deal out the amounts of memory that your strings ask for faster than malloc does. Your memory manager will make relatively few calls to malloc, (or new, depending on your approach) under the hood, requesting "large" amounts of memory at once, then deal out sections of this (these) memory block(s) through the custom allocator. To actually achieve better performance than malloc, your memory manager will usually have to be tuned based on known allocation patterns of your use cases.
This kind of thing often comes down to the age-old trade off of memory use versus execution speed. For example: if you have a known upper bound on your string sizes in practice, you can pull tricks with over-allocating to always accommodate the largest case. While this is wasteful of your memory resources, it can alleviate the performance overhead that more generalized allocation runs into with memory fragmentation. As well as making any calls to realloc essentially constant time for your purposes.
#sehe is exactly right. There are many ways.
EDIT:
To finally address your second question, strings using different allocators can play nicely together, and usage should be transparent.
For example:
class myalloc : public std::allocator<char>{};
myalloc customAllocator;
int main(void)
{
std::string mystring(customAllocator);
std::string regularString = "test string";
mystring = regularString;
std::cout << mystring;
return 0;
}
This is a fairly silly example and, of course, uses the same workhorse code under the hood. However, it shows assignment between strings using allocator classes of "different types". Implementing a useful allocator that supplies the full interface required by the STL without just disguising the default std::allocator is not as trivial. This seems to be a decent write up covering the concepts involved. The key to why this works, in the context of your question at least, is that using different allocators doesn't cause the strings to be of different type. Notice that the custom allocator is given as an argument to the constructor not a template parameter. The STL still does fun things with templates (such as rebind and Traits) to homogenize allocator interfaces and tracking.
What often helps is the creation of a GlobalStringTable.
See if you can find portions of the old NiMain library from the now defunct NetImmerse software stack. It contains an example implementation.
Lifetime
What is important to note is that this string table needs to be accessible between different DLL spaces, and that it is not a static object. R. Martinho Fernandes already warned that the object needs to be created when the application or DLL thread is created / attached, and disposed when the thread is destroyed or the dll is detached, and preferrably before any string object is actually used. This sounds easier than it actually is.
Memory allocation
Once you have a single point of access that exports correctly, you can have it allocate a memory buffer up-front. If the memory is not enough, you have to resize it and move the existing strings over. Strings essentially become handles to regions of memory in this buffer.
Placement new
Something that often works well is called the placement new() operator, where you can actually specify where in memory your new string object needs to be allocated. However, instead of allocating, the operator can simply grab the memory location that is passed in as an argument, zero the memory at that location, and return it. You can also keep track of the allocation, the actual size of the string etc.. in the Globalstringtable object.
SOA
Handling the actual memory scheduling is something that is up to you, but there are many possible ways to approach this. Often, the allocated space is partitioned in several regions so that you have several blocks per possible string size. A block for strings <= 4 bytes, one for <= 8 bytes, and so on. This is called a Small Object Allocator, and can be implemented for any type and buffer.
If you expect many string operations where small strings are incremented repeatedly, you may change your strategy and allocate larger buffers from the start, so that the number of memmove operations are reduced. Or you can opt for a different approach and use string streams for those.
String operations
It is not a bad idea to derive from std::basic_str, so that most of the operations still work but the internal storage is actually in the GlobalStringTable, so that you can keep using the same stl conventions. This way, you also make sure that all the allocations are within a single DLL, so that there can be no heap corruption by linking different kinds of strings between different libraries, since all the allocation operations are essentially in your DLL (and are rerouted to the GlobalStringTable object)
Custom allocators can help because most malloc()/new implementations are designed for maximum flexibility, thread-safety and bullet-proof workings. For instance, they must gracefully handle the case that one thread keeps allocating memory, sending the pointers to another thread that deallocates them. Things like these are difficult to handle in a performant way and drive the cost of malloc() calls.
However, if you know that some things cannot happen in your application (like one thread deallocating stuff another thread allocated, etc.), you can optimize your allocator further than the standard implementation. This can yield significant results, especially when you don't need thread safety.
Also, the standard implementation is not necessarily well optimized: Implementing void* operator new(size_t size) and void operator delete(void* pointer) by simply calling through to malloc() and free() gives an average performance gain of 100 CPU cycles on my machine, which proves that the default implementation is suboptimal.
I think you'd be best served by reading up on the EASTL
It has a section on allocators and you might find fixed_string useful.
The best way to avoid a memory allocation is don't do it!
BUT if I remember JSON correctly all the readStr values either gets used as keys or as identifiers so you will have to allocate them eventually, std::strings move semantics should insure that the allocated array are not copied around but reused until its final use. The default NRVO/RVO/Move should reduce any copying of the data if not of the string header itself.
Method 1:
Pass result as a ref from the caller which has reserved SomeResonableLargeValue chars, then clear it at the start of readStr. This is only usable if the caller actually can reuse the string.
Method 2:
Use the stack.
// Reserve memory for the string (BOTTLENECK)
if (end - idx < SomeReasonableValue) { // 32?
char result[SomeReasonableValue] = {0}; // feel free to use std::array if you want bounds checking, but the preceding "if" should insure its not a problem.
int ridx = 0;
for(; idx < end; ++idx) {
// Not an escape sequence
if(!isC('\\')) { result[ridx++] = getC(); continue; }
// Escape sequence: skip '\'
++idx;
// Convert escape sequence
result[ridx++] = getEscapeSequence(getC());
}
// Skip closing '"'
++idx;
result[ridx] = 0; // 0-terminated.
// optional assert here to insure nothing went wrong.
return result; // the bottleneck might now move here as the data is copied to the receiving string.
}
// fallback code only if the string is long.
// Your original code here
Method 3:
If your string by default can allocate some size to fill its 32/64 byte boundary, you might want to try to use that, construct result like this instead in case the constructor can optimize it.
Str result(end - idx, 0);
Method 4:
Most systems already has some optimized allocator that like specific block sizes, 16,32,64 etc.
siz = ((end - idx)&~0xf)+16; // if the allocator has chunks of 16 bytes already.
Str result(siz);
Method 5:
Use either the allocator made by google or facebooks as global new/delete replacement.
To understand how a custom allocator can help you, you need to understand what malloc and the heap does and why it is quite slow in comparison to the stack.
The Stack
The stack is a large block of memory allocated for your current scope. You can think of it as this
([] means a byte of memory)
[P][][][][][][][][][][][][][][][]
(P is a pointer that points to a specific byte of memory, in this case its pointing at the first byte)
So the stack is a block with only 1 pointer. When you allocate memory, what it does is it performs a pointer arithmetic on P, which takes constant time.
So declaring int i = 0; would mean this,
P + sizeof(int).
[i][i][i][i][P][][][][][][][][][][][],
(i in [] is a block of memory occupied by an integer)
This is blazing fast and as soon as you go out of scope, the entire chunk of memory is emptied simply by moving P back to the first position.
The Heap
The heap allocates memory from a reserved pool of bytes reserved by the c++ compiler at runtime, when you call malloc, the heap finds a length of contiguous memory that fits your malloc requirements, marks it as used so nothing else can use it, and returns that to you as a void*.
So, a theoretical heap with little optimization calling new(sizeof(int)), would do this.
Heap chunk
At first : [][][][][][][][][][][][][][][][][][][][][][][][][]
Allocate 4 bytes (sizeof(int)):
A pointer goes though every byte of memory, finds one that is of correct length, and returns to you a pointer.
After : [i][i][i][i][][][]][][][][][][][][][]][][][][][][][]
This is not an accurate representation of the heap, but from this you can already see numerous reasons for being slow relative to the stack.
The heap is required to keep track of all already allocated memory and their respective lengths. In our test case above, the heap was already empty and did not require much, but in worst case scenarios, the heap will be populated with multiple objects with gaps in between (heap fragmentation), and this will be much slower.
The heap is required to cycle though all the bytes to find one that fits your length.
The heap can suffer from fragmentation since it will never completely clean itself unless you specify it. So if you allocated an int, a char, and another int, your heap would look like this
[i][i][i][i][c][i2][i2][i2][i2]
(i stands for bytes occupied by int and c stands for bytes occupied by a char. When you de-allocate the char, it will look like this.
[i][i][i][i][empty][i2][i2][i2][i2]
So when you want to allocate another object into the heap,
[i][i][i][i][empty][i2][i2][i2][i2][i3][i3][i3][i3]
unless an object is the size of 1 char, the overall heap size for that allocation is reduced by 1 byte. In more complex programs with millions of allocations and deallocations, the fragmentation issue becomes severe and the program will become unstable.
Worry about cases like thread safety (Someone else said this already).
Custom Heap/Allocator
So, a custom allocator usually needs to address these problems while providing the benefits of the heap, such as personalized memory management and object permanence.
These are usually accomplished with specialized allocators. If you know you dont need to worry about thread safety or you know exactly how long your string will be or a predictable usage pattern you can make your allocator fast than malloc and new by quite a lot.
For example, if your program requires a lot of allocations as fast as possible without lots of deallocations, you could implement a stack allocator, in which you allocate a huge chunk of memory with malloc at startup,
e.g
typedef char* buffer;
//Super simple example that probably doesnt work.
struct StackAllocator:public Allocator{
buffer stack;
char* pointer;
StackAllocator(int expectedSize){ stack = new char[expectedSize];pointer = stack;}
allocate(int size){ char* returnedPointer = pointer; pointer += size; return returnedPointer}
empty() {pointer = stack;}
};
Get expected size, get a chunk of memory from the heap.
Assign a pointer to the beginning.
[P][][][][][][][][][] ..... [].
then have one pointer that moves for each allocation. When you no longer need the memory, you simply move the pointer to the beginning of your buffer. This gives your the advantage of O(1) speed allocations and deallocations as well as object permanence for the lack of flexible deallocation and large initial memory requirements.
For strings, you could try a chunk allocator. For every allocation, the allocator gives a set chunk of memory.
Compatibility
Compatibility with other strings is almost guaranteed. As long as you are allocating a contiguous chunk of memory and preventing anything else from using that block of memory, it will work.

Memory Demands: Heap vs Stack in C++

So I had a strange experience this evening.
I was working on a program in C++ that required some way of reading a long list of simple data objects from file and storing them in the main memory, approximately 400,000 entries. The object itself is something like:
class Entry
{
public:
Entry(int x, int y, int type);
Entry(); ~Entry();
// some other basic functions
private:
int m_X, m_Y;
int m_Type;
};
Simple, right? Well, since I needed to read them from file, I had some loop like
Entry** globalEntries;
globalEntries = new Entry*[totalEntries];
entries = new Entry[totalEntries];// totalEntries read from file, about 400,000
for (int i=0;i<totalEntries;i++)
{
globalEntries[i] = new Entry(.......);
}
That addition to the program added about 25 to 35 megabytes to the program when I tracked it on the task manager. A simple change to stack allocation:
Entry* globalEntries;
globalEntries = new Entry[totalEntries];
for (int i=0;i<totalEntries;i++)
{
globalEntries[i] = Entry(.......);
}
and suddenly it only required 3 megabytes. Why is that happening? I know pointer objects have a little bit of extra overhead to them (4 bytes for the pointer address), but it shouldn't be enough to make THAT much of a difference. Could it be because the program is allocating memory inefficiently, and ending up with chunks of unallocated memory in between allocated memory?
Your code is wrong, or I don't see how this worked. With new Entry [count] you create a new array of Entry (type is Entry*), yet you assign it to Entry**, so I presume you used new Entry*[count].
What you did next was to create another new Entry object on the heap, and storing it in the globalEntries array. So you need memory for 400.000 pointers + 400.000 elements. 400.000 pointers take 3 MiB of memory on a 64-bit machine. Additionally, you have 400.000 single Entry allocations, which will all require sizeof (Entry) plus potentially some more memory (for the memory manager -- it might have to store the size of allocation, the associated pool, alignment/padding, etc.) These additional book-keeping memory can quickly add up.
If you change your second example to:
Entry* globalEntries;
globalEntries = new Entry[count];
for (...) {
globalEntries [i] = Entry (...);
}
memory usage should be equal to the stack approach.
Of course, ideally you'll use a std::vector<Entry>.
First of all, without specifying which column exactly you were watching, the number in task manager means nothing. On a modern operating system it's difficult even to define what you mean with "used memory" - are we talking about private pages? The working set? Only the stuff that stays in RAM? does reserved but not committed memory count? Who pays for memory shared between processes? Are memory mapped file included?
If you are watching some meaningful metric, it's impossible to see 3 MB of memory used - your object is at least 12 bytes (assuming 32 bit integers and no padding), so 400000 elements will need about 4.58 MB. Also, I'd be surprised if it worked with stack allocation - the default stack size in VC++ is 1 MB, you should already have had a stack overflow.
Anyhow, it is reasonable to expect a different memory usage:
the stack is (mostly) allocated right from the beginning, so that's memory you nominally consume even without really using it for anything (actually virtual memory and automatic stack expansion makes this a bit more complicated, but it's "true enough");
the CRT heap is opaque to the task manager: all it sees is the memory given by the operating system to the process, not what the C heap has "really" in use; the heap grows (requesting memory to the OS) more than strictly necessary to be ready for further memory requests - so what you see is how much memory it is ready to give away without further syscalls;
your "separate allocations" method has a significant overhead. The all-contiguous array you'd get with new Entry[size] costs size*sizeof(Entry) bytes, plus the heap bookkeeping data (typically a few integer-sized fields); the separated allocations method costs at least size*sizeof(Entry) (size of all the "bare elements") plus size*sizeof(Entry *) (size of the pointer array) plus size+1 multiplied by the cost of each allocation. If we assume a 32 bit architecture with a cost of 2 ints per allocation, you quickly see that this costs size*24+8 bytes of memory, instead of size*12+8 for the contiguous array in the heap;
the heap normally really gives away blocks that aren't really the size you asked for, because it manages blocks of fixed size; so, if you allocate single objects like that you are probably paying also for some extra padding - supposing it has 16 bytes blocks, you are paying 4 bytes extra per element by allocating them separately; this moves out memory estimation to size*28+8, i.e. an overhead of 16 bytes per each 12-byte element.

Why is the heap after array allocation so large

I've got a very basic application that boils down to the following code:
char* gBigArray[200][200][200];
unsigned int Initialise(){
for(int ta=0;ta<200;ta++)
for(int tb=0;tb<200;tb++)
for(int tc=0;tc<200;tc++)
gBigArray[ta][tb][tc]=new char;
return sizeof(gBigArray);
}
The function returns the expected value of 32000000 bytes, which is approximately 30MB, yet in the Windows Task Manager (and granted it's not 100% accurate) gives a Memory (Private Working Set) value of around 157MB. I've loaded the application into VMMap by SysInternals and have the following values:
I'm unsure what Image means (listed under Type), although irrelevant of that its value is around what I'm expecting. What is really throwing things out for me is the Heap value, which is where the apparent enormous size is coming from.
What I don't understand is why this is? According to this answer if I've understood it correctly, gBigArray would be placed in the data or bss segment - however I'm guessing as each element is an uninitialised pointer it would be placed in the bss segment. Why then would the heap value be larger by a silly amount than what is required?
It doesn't sound silly if you know how memory allocators work. They keep track of the allocated blocks so there's a field storing the size and also a pointer to the next block, perhaps even some padding. Some compilers place guarding space around the allocated area in debug builds so if you write beyond or before the allocated area the program can detect it at runtime when you try to free the allocated space.
you are allocating one char at a time. There is typically a space overhead per allocation
Allocate the memory on one big chunk (or at least in a few chunks)
Do not forget that char* gBigArray[200][200][200]; allocates space for 200*200*200=8000000 pointers, each word size. That is 32 MB on a 32 bit system.
Add another 8000000 char's to that for another 8MB. Since you are allocating them one by one it probably can't allocate them at one byte per item so they'll probably also take the word size per item resulting in another 32MB (32 bit system).
The rest is probably overhead, which is also significant because the C++ system must remember how many elements an array allocated with new contains for delete [].
Owww! My embedded systems stuff would roll over and die if faced with that code. Each allocation has quite a bit of extra info associated with it and either is spaced to a fixed size, or is managed via a linked list type object. On my system, that 1 char new would become a 64 byte allocation out of a small object allocator such that management would be in O(1) time. But in other systems, this could easily fragment your memory horribly, make subsequent new and deletes run extremely slowly O(n) where n is number of things it tracks, and in general bring doom upon an app over time as each char would become at least a 32 byte allocation and be placed in all sorts of cubby holes in memory, thus pushing your allocation heap out much further than you might expect.
Do a single large allocation and map your 3D array over it if you need to with a placement new or other pointer trickery.
Allocating 1 char at a time is probably more expensive. There are metadata headers per allocation so 1 byte for a character is smaller than the header metadata so you might actually save space by doing one large allocation (if possible) that way you mitigate the overhead of each individual allocation having its own metadata.
Perhaps this is an issue of memory stride? What size of gaps are between values?
30 MB is for the pointers. The rest is for the storage you allocated with the new call that the pointers are pointing to. Compilers are allowed to allocate more than one byte for various reasons, like to align on word boundaries, or give some growing room in case you want it later. If you want 8 MB worth of characters, leave the * off your declaration for gBigArray.
Edited out of the above post into a community wiki post:
As the answers below say, the issue here is I am creating a new char 200^3 times, and although each char is only 1 byte, there is overhead for every object on the heap. It seems creating a char array for all chars knocks the memory down to a more believable level:
char* gBigArray[200][200][200];
char* gCharBlock=new char[200*200*200];
unsigned int Initialise(){
unsigned int mIndex=0;
for(int ta=0;ta<200;ta++)
for(int tb=0;tb<200;tb++)
for(int tc=0;tc<200;tc++)
gBigArray[ta][tb][tc]=&gCharBlock[mIndex++];
return sizeof(gBigArray);
}