When running this code:

#include <iostream>
#include <vector>
#include <deque>

template< typename C >
void fillToMax( C & collection, typename C::value_type value )
{
    try
    {
        while( true )
            collection.push_back( value );
    }
    catch( std::bad_alloc const& )
    {
        std::cout << "bad alloc with size " << collection.size() << std::endl;
    }
    return;
}

void fillVector()
{
    std::vector<long> vecL;
    fillToMax( vecL, 123 );
}

void fillDeque()
{
    std::deque<long> deqL;
    fillToMax( deqL, 123 );
}

int main()
{
    fillVector();
    fillDeque();
}
I get the expected bad_alloc, which is easy to try/catch.
The problem is when I substitute vector with deque: in that case my machine simply crashes... black screen, it reboots, and once it is up again it claims: you had an unexpected problem!
I would like to use deque instead of vector to store a larger number of items without the issue of contiguous space. This should let me store more data, but I cannot afford to have my application crash, and I would like to know how I can get a bad_alloc here instead.
Is this possible?
My tests use MinGW-W64 - gcc version 4.8.2 (x86_64-posix-seh-rev4) on win8.1
You don't say what system you're using, so it's hard to say, but
some systems "overcommit", which basically makes a conforming
implementation of C++ (or even C) impossible; the system will
say that there is memory available when there isn't, and crash
when you try to use it. Linux is the most widely documented
culprit here, but you can reconfigure it to work correctly.
The reason you get bad_alloc with vector is because vector
allocates much larger chunks. And even with overcommit, the
system will refuse to allocate memory if the chunk is too big.
Also, many mallocs will use a different allocation strategy for
very large chunks; IIRC, the malloc in Linux switches to using
mmap beyond a certain size, and the system may refuse a mmap
even when an sbrk would have succeeded.
The quick answer to why vector crashes with bad_alloc and deque doesn't is that vector uses a single contiguous buffer, so it hits bad_alloc "sooner", and on a request that asks for one large chunk.
Why? Because it is less likely that you will be able to allocate one large contiguous buffer than many smaller ones.
vector will allocate a certain amount and then try a big "realloc" for a bigger buffer. It might be possible to extend the current memory block in place, but it might not, and then a whole new chunk of memory must be found.
Let's say it grows by a factor of 1.5. The memory in use by your vector is currently 40% of what is available, and it needs to find a new block amounting to 60% of what is available, but it cannot do so at the current location. Since old and new buffers must coexist during the copy, that takes you to the limit, so it fails with bad_alloc even though in reality you are only using 40% of the memory.
So there really is memory left, and those operating systems that use "optimistic" memory allocation will not have accidentally over-allocated for you: you asked for a lot in one go, and the system refused. (They are not always totally optimistic.)
deque, on the other hand, asks for one chunk at a time. You will really use up your memory, and as a result it is better for large collections; however, it has the downside that when you run out of memory, you really do run out. And your lovely optimistic memory allocator cannot handle it, and your process dies. (It kills something to make more memory available. Sadly, it was yours.)
Now, for your solution of how to avoid this happening: your answer might be a custom allocator, i.e. the 2nd template parameter of deque, which could check the real system memory available and refuse to allocate once you have hit a certain threshold.
Of course it is system dependent but you could have different versions for different machines.
You could also set your own arbitrary "limit", of course.
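To illustrate the idea, here is a minimal sketch (assuming a C++11 standard library; the 512 MB budget is entirely arbitrary, and this only counts the allocator's own allocations, so treat it as a starting point rather than an implementation):

#include <cstddef>
#include <deque>
#include <new>

static std::size_t g_used = 0;
static const std::size_t g_budget = 512u * 1024u * 1024u; // arbitrary cap

template< typename T >
struct LimitedAllocator
{
    typedef T value_type;

    LimitedAllocator() {}
    template< typename U >
    LimitedAllocator( const LimitedAllocator<U>& ) {}

    T* allocate( std::size_t n )
    {
        if( g_used + n * sizeof(T) > g_budget )
            throw std::bad_alloc();     // refuse past the threshold, before the OS gets hurt
        T* p = static_cast<T*>( ::operator new( n * sizeof(T) ) );
        g_used += n * sizeof(T);
        return p;
    }
    void deallocate( T* p, std::size_t n )
    {
        g_used -= n * sizeof(T);
        ::operator delete( p );
    }
};

template< typename T, typename U >
bool operator==( const LimitedAllocator<T>&, const LimitedAllocator<U>& ) { return true; }
template< typename T, typename U >
bool operator!=( const LimitedAllocator<T>&, const LimitedAllocator<U>& ) { return false; }

With that in place, std::deque< long, LimitedAllocator<long> > gives fillToMax a clean bad_alloc at the budget instead of taking the machine down.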
Assuming your system is Linux, you might be able to turn overcommit off with
'echo 2 > /proc/sys/vm/overcommit_memory'
You would need root (admin) permissions to do that. (Or get someone who has it to configure it that way).
Otherwise, other ways to examine the memory usage are available in the Linux manuals, usually referred to in /proc.
If your system isn't Linux but another that over-commits, you'll have to look up how you can by-pass it by writing your own memory manager. Otherwise take the simpler option of an arbitrary configurable maximum size.
Remember that with deque your allocator will only be invoked when you need to allocate a new "chunk" and not for every push_back.
Xarylem, just attempting to answer "how can I prevent this" here...
you know something that throws bad_alloc - std::vector.
you know something that crashes... std::deque.
So one way would be to create a new vector of size X, if that succeeds, clear the vector and push back X more into the deque. If it doesn't, you know you're walking into a quagmire. Something like:
std::vector<int> testVector;
testVector.reserve(1);
std::deque<int> actualDequeToFill;
for(size_t i = 0; ; ++i)
{
    //test first
    bool haveSpace = true;
    try { testVector.reserve(2); } catch(...) { haveSpace = false; }
    std::vector<int>().swap(testVector); // reserve(1) would not shrink it back
    if (!haveSpace) throw std::bad_alloc(); // bad_alloc takes no message string
    actualDequeToFill.push_back(something); // "something" is whatever you are storing
}
This isn't anywhere close to foolproof... so please use it as a possible idea for a workaround rather than as an implementation.
Now that that is aside, my best guess would be that your compiler is not compliant... as I've mentioned in a comment, C++ requires deque::push_back to throw bad_alloc on allocation failure. If you can, move away from that compiler (this is basic stuff to get right).
Related
Following up on this question Vector push_back only if enough memory is available, I tried
to rephrase the question in a more general sense.
Consider this fragment :
vector<double> v1;
cout << "pushing back ..." << endl;
while (true) {
try {
v1.push_back(0.0);
} catch (bad_alloc& ba){
cout << "bad_alloc caught: " << ba.what() << endl;
break;
}
}
Which of the following statements regarding the above code fragment are true?
1) Eventually, the catch block will be reached
2) You can not determine beforehand if there is enough memory for push_back to not throw bad_alloc
3) Every action in the catch block that involves memory allocation could fail, because there is no memory left
The first thing I did was to run this program on Windows, which led to the observation that before any paging happened, bad_alloc was thrown, because obviously the per-process amount of memory had been exceeded. This observation led to the next statement:
4) On most Operating Systems bad_alloc will be thrown before paging happens, but there is no certain way to tell beforehand.
After some research I came up with the following thoughts on the above statements:
A1) True, the catch block will be reached but maybe not before the OS has performed intensive I/O operations due to paging.
A2) True, at least not in an OS independent way
A3) True, you have to preallocate memory in order to do something useful with data in the vector gathered so far (e.g. do some paging on your own, if you find this useful)
A4) True, this is dependent on multiple OS-specific parameters like max amount of RAM per process, process priority, strategy of the OS process scheduler etc ...
I am not sure if A1-A4 are correct, hence my question, but if so, here is the next statement:
5) If you need to write some algorithm and be sure that there will be no paging, do not use dynamic data structures like std::vector. Instead use an Array and make sure it will stay in memory using OS-specific functions like for example mlockall (Unix)
If 5) is true it leads to the last statement:
6) There is no OS-independent way to write a program that will not cause paging.
Thanks everybody in advance for sharing your thoughts on the above statements.
If your program must run on Windows/Unix/OS X, make wrapper functions:
#ifdef WIN32
#include <windows.h>   // VirtualLock / VirtualUnlock
#else
#include <sys/mman.h>  // mlock / munlock
#endif

bool lockMemoryRegion( void *addr, size_t size )
{
#ifdef WIN32
    return VirtualLock( addr, size ) != 0;
#else
    return mlock( addr, size ) == 0;
#endif
}

bool unlockMemoryRegion( void *addr, size_t size )
{
#ifdef WIN32
    return VirtualUnlock( addr, size ) != 0;
#else
    return munlock( addr, size ) == 0;
#endif
}
Then if you need to lock memory used by std::vector:
std::vector<int> v( 1000 );
lockMemoryRegion( v.data(), v.capacity() * sizeof (int) );
Use memory locks only if you really ought to. Locking pages into memory may degrade the performance of the system by reducing the available RAM and forcing the system to swap out other critical pages to the paging file.
What a rambling mess of a question. You still need to get your head around modern memory allocation on the operating systems you're actually interested in. I'd recommend a bit of systematic background reading, as answers to your hodge-podge of questions won't necessarily give you the proper big picture.
1) Eventually, the catch block will be reached
2) You can not determine beforehand if there is enough memory for push_back to not throw bad_alloc
3) Every action in the catch block that involves memory allocation could fail, because there is no memory left
None of these are necessarily true... the OS may allocate the virtual address space then terminate the process when it's accessed and the OS can't find physical memory to back it. Further, a low-memory process killer may decide you've pushed too far and terminate you or any other non-critical process.
For 3) specifically, the Standard explicitly says an implementation may use a separate memory area to convey the thrown object towards the catch statement that will handle it - after all, it doesn't make sense to put it on the same stack you're unwinding during exception processing. So, that memory allocation has far fewer issues than dynamic memory allocation (with new or malloc), but may still page and therefore precipitate process termination in very rare cases. It's still dangerous if the object being thrown internally does dynamic memory allocation (e.g. stores a description in a string or istringstream data member). Similarly, the catch statement may allocate stack space for variables, expression evaluations, function calls etc. - these could also precipitate failure, but are less dangerous than new/malloc.
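As a hypothetical illustration of the thrown-object danger (the function name and message here are invented): throwing an exception type that builds a dynamically allocated description is exactly the risky case -

#include <stdexcept>
#include <string>

void report( bool outOfMemory )
{
    // The std::string below may itself allocate - risky if you are
    // throwing *because* memory just ran out.
    if( outOfMemory )
        throw std::runtime_error( std::string( "allocation failed" ) );
}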
4) On most Operating Systems bad_alloc will be thrown before paging happens, but there is no certain way to tell beforehand.
Certainly not - what would be the point of paging then?
A1) True, the catch block will be reached but maybe not before the OS has performed intensive I/O operations due to paging.
If there happens to be swap disk in use, then yes you should get paging happening before an out of memory condition, but again that may not manifest as an exception.
A2) True, at least not in an OS independent way
Nope... it wasn't true to begin with.
A3) True, you have to preallocate memory in order to do something useful with data in the vector gathered so far (e.g. do some paging on your own, if you find this useful)
You don't have to preallocate anything... which would be done with a constructor parameter or resize... that's optional, but may allow you to process more data without hitting an out of memory condition simply because there's less need for momentarily increased memory usage as the data is moved to a larger memory block. All that has nothing to do with whether you "do something useful", and I have no idea what you imagine by "do some paging on your own". If you access vector elements they may have to be paged in. If you haven't used them for a while they may be paged out. The OS caching algorithms decide this. You may want to at least understand a simple algorithm of this type, such as Least Recently Used (LRU).
A4) True, this is dependent on multiple OS-specific parameters like max amount of RAM per process, process priority, strategy of the OS process scheduler etc ...
You can have a per-process memory allocation limit, but your conception that paging won't happen until you exceed that limit is wrong. Paging can happen to any part of your process - dynamically allocated, stack, executable image, static data, thread-specific data etc. - whenever the OS sees it hasn't been used for a while and wants the physical memory for some other more pressing purpose.
Your question makes it clear the following suppositions are conditional on the truth of the earlier ones, but I'll address them quickly as they have elements of truth and/or relevance anyway....
5) If you need to write some algorithm and be sure that there will be no paging, do not use dynamic data structures like std::vector. Instead use an Array and make sure it will stay in memory using OS-specific functions like for example mlockall (Unix)
Which type of data type/container you use is irrelevant - the OS doesn't even know or care to what use you're putting different parts of the memory it has granted your process. So, functions like that can be applied to arrays or dynamically allocated memory - for example, if you've populated a vector, you can use .data() to get a pointer to the actual memory region storing the data, then lock it into physical RAM. Of course, if you do something to force the vector to find a different memory region (e.g. adding elements beyond capacity()), then it will still look for more memory, and having the now-abandoned memory region locked in physical RAM may adversely affect your process and system performance.
If 5) is true it leads to the last statement:
6) There is no OS-independent way to write a program that will not cause paging.
No, there's not. Paging is meant to be transparent to the processes undergoing it, and processes rarely need to avoid it.
1, 2, and 3 are all correct, assuming that 2 refers to portable ways. You can make a decent guess based on OS-specific process memory usage reporting functions. They're not that accurate and they're not portable, but they do offer a fairly good guess.
As for 4, that's just not true. It is a function of the amount of physical memory compared to the virtual address space size of the process. x64 has a way larger address space than there is physical memory. x86 is substantially smaller now but go back a few years to older machines with 2GB or 1GB of RAM and it would be bigger.
If you need to write some algorithm and be sure that there will be no
paging, do not use dynamic data structures like std::vector. Instead
use an Array and make sure it will stay in memory using OS-specific
functions like for example mlockall (Unix)
Bullshit. You can reserve the vector to allocate all the memory you need, then call mlock anyway.
But there is most certainly no OS-independent way to write a program that will not cause paging. Paging is an implementation detail of the flat memory model used by C++ and there is certainly no Standard functionality relating to this implementation detail, nor will there ever be.
1) Eventually, the catch block will be reached
This "eventually" doesn't mean "when you allocate up to bytes" but a lot more (virtual memory mapping - if present - would have to be exhausted as well).
I've seen a linux process scheduler about ten years ago had a habbit of killing applications that misbehaved. I think this application would qualify (i.e. it may be terminated by the OS before the catch block is reached).
3) Every action in the catch block that involves memory allocation could fail, because there is no memory left
Theoretically true; practically, probably false. The vector will keep allocating larger and larger contiguous blocks. As it does, it is possible that it will no longer be able to allocate a LARGE block, while the previous smaller allocations have been released. It is therefore possible that you will have some free memory available in the catch block.
4) On most Operating Systems bad_alloc will be thrown before paging happens, but there is no certain way to tell beforehand.
Since there is no way to tell beforehand, the only realistic way to find out is to measure it.
5) If you need to write some algorithm and be sure that there will be no paging, do not use dynamic data structures like std::vector. Instead use an Array and make sure it will stay in memory using OS-specific functions like for example mlockall (Unix)
This is incorrect. A vector is a safe wrapper on an allocated contiguous memory block. You can just as well work with a vector and memory locking functions.
For (6): Paging is HW, OS and application dependent (you can run the same application on two different systems and have it paged differently).
This code snippet will allocate 2Gb every time it reads the letter 'u' from stdin, and will initialize all the allocated chars once it reads 'a'.
#include <iostream>
#include <cstdlib>
#include <vector>

#define bytes 2147483648ULL // 2 GB

using namespace std;

int main()
{
    char input = 0; // single command character; the original gets() into char[1] overflowed
    vector<char *> activate;
    while( input != 'q' )
    {
        cin >> input;
        if( input == 'u' )
        {
            char *m = (char*) malloc( bytes );
            if( m == NULL ) cout << "cant allocate mem" << endl;
            else
            {
                cout << "ok" << endl;
                activate.push_back( m );
            }
        }
        else if( input == 'a' )
        {
            for( size_t i = 0; i < activate.size(); i++ )
            {
                char *m = activate[i];
                for( unsigned long long j = 0; j < bytes; j++ )
                {
                    m[j] = 'a';
                }
            }
        }
    }
    return 0;
}
I am running this code on a Linux virtual machine that has 3 GB of RAM. While monitoring the system resource usage with the htop tool, I have realized that the malloc operation is not reflected in the resource usage.
For example, when I input 'u' only once (i.e. allocate 2 GB of heap memory), I don't see the memory usage increasing by 2 GB in htop. It is only when I input 'a' (i.e. initialize) that I see the memory usage increasing.
As a consequence, I am able to "malloc" more heap memory than there exists. For example, I can malloc 6 GB (which is more than my RAM plus swap) and malloc allows it (i.e. NULL is not returned). But when I try to initialize the allocated memory, I can see the memory and swap filling up until the process is killed.
My questions:
1. Is this a kernel bug?
2. Can someone explain to me why this behavior is allowed?
It is called memory overcommit. You can disable it by running as root:
echo 2 > /proc/sys/vm/overcommit_memory
and it is not a kernel feature that I like (so I always disable it). See malloc(3), mmap(2) and proc(5).
NB: echo 0 instead of echo 2 often - but not always - works also. Read the docs (in particular the proc(5) man page mentioned above).
From man malloc:
By default, Linux follows an optimistic memory allocation strategy.
This means that when malloc() returns non-NULL there is no guarantee
that the memory really is available.
So when you merely allocate too much, malloc "lies" to you; when you actually want to use the allocated memory, the kernel tries to find enough physical memory for you, and your process may crash if it can't.
No, this is not a kernel bug. You have discovered something known as late paging (or overcommit).
Until you write a byte to the address allocated with malloc (...) the kernel does little more than "reserve" the address range. This really depends on the implementation of your memory allocator and operating system of course, but most good ones do not incur the majority of kernel overhead until the memory is first used.
The hoard allocator is one big offender that comes to mind immediately; through extensive testing I have found it almost never takes advantage of a kernel that supports late paging. You can always mitigate the effects of late paging in any allocator if you zero-fill the entire memory range immediately after allocation.
Real-time operating systems like VxWorks will never allow this behavior because late paging introduces serious latency. Technically, all it does is put the latency off until a later indeterminate time.
For a more detailed discussion, you may be interested to see how IBM's AIX operating system handles page allocation and overcommitment.
This is a result of what Basile mentioned, memory overcommit. The explanation, however, is kind of interesting.
Basically when you attempt to map additional memory in Linux (POSIX?), the kernel will just reserve it, and will only actually end up using it if your application accesses one of the reserved pages. This allows multiple applications to reserve more than the actual total amount of ram / swap.
This is desirable behavior on most Linux environments unless you've got a real-time OS or something where you know exactly who will need what resources, when and why.
Otherwise somebody could come along, malloc up all the RAM (without actually doing anything with it) and OOM your apps.
Another example of this lazy allocation is mmap(), where you have a virtual map that the file you're mapping can fit inside - but only a small amount of real memory is dedicated to the effort. This allows you to mmap() huge files (larger than your available RAM) and use them like normal file handles (which is nifty).
Initializing / working with the memory should work:
memset(m, 0, bytes);
Also, you could use calloc, which not only allocates memory but also fills it with zeros for you:
char* m = (char*) calloc(1, bytes);
1. Is this a kernel bug?
No.
2. Can someone explain to me why this behavior is allowed?
There are a few reasons:
Mitigate need to know eventual memory requirement - it's often convenient to have an application be able to allocate an amount of memory that it considers an upper limit on what it might actually need. For example, if it's preparing some kind of report, either an initial pass just to calculate the eventual size of the report or realloc()s of successively larger areas (with the risk of having to copy) may significantly complicate the code and hurt performance, whereas multiplying some maximum length of each entry by the number of entries can be very quick and easy. If you know virtual memory is relatively plentiful as far as your application's needs are concerned, then making a larger allocation of virtual address space is very cheap.
Sparse data - if you have the virtual address space spare, being able to have a sparse array and use direct indexing, or allocate a hash table with generous capacity() to size() ratio, can lead to a very high performance system. Both work best (in the sense of having low overheads/waste and efficient use of memory caches) when the data element size is a multiple of the memory paging size, or failing that much larger or a small integral fraction thereof.
Resource sharing - consider an ISP offering a "1 giga-bit per second" connection to 1000 consumers in a building - they know that if all the consumers use it simultaneously they'll get about 1 mega-bit, but rely on their real-world experience that, though people ask for 1 giga-bit and want a good fraction of it at specific times, there's inevitably some lower maximum and much lower average for concurrent usage. The same insight applied to memory allows operating systems to support more applications than they otherwise would, with reasonable average success at satisfying expectations. Much as the shared Internet connection degrades in speed as more users make simultaneous demands, paging from swap memory on disk may kick in and reduce performance. But unlike an Internet connection, there's a limit to the swap memory, and if all the apps really do try to use the memory concurrently such that that limit's exceeded, some will start getting signals/interrupts/traps reporting memory exhaustion. In summary, with this memory overcommit behaviour enabled, simply checking that malloc()/new returned a non-NULL pointer is not sufficient to guarantee the physical memory is actually available, and the program may still receive a signal later as it attempts to use the memory.
I have noticed some interesting behavior in Linux with regard to the Memory Usage (RES) reported by top. I have attached the following program which allocates a couple million objects on the heap, each of which has a buffer that is around 1 kilobyte. The pointers to those objects are tracked by either a std::list, or a std::vector. The interesting behavior I have noticed is that if I use a std::list, the Memory Usage reported by top never changes during the sleep periods. However if I use std::vector, the memory usage will drop to near 0 during those sleeps.
My test configuration is:
Fedora Core 16
Kernel 3.6.7-4
g++ version 4.6.3
What I already know:
1. std::vector will re-allocate (doubling its size) as needed.
2. std::list (I believe) is allocating its elements one at a time
3. both std::vector and std::list are using std::allocator by default to get their actual memory
4. The program is not leaking; valgrind has declared that no leaks are possible.
What I'm confused by:
1. Both std::vector and std::list are using std::allocator. Even if std::vector is doing batch re-allocations, wouldn't std::allocator be handing out memory in almost the same arrangement to std::list and std::vector? This program is single threaded after all.
2. Where can I learn about the behavior of Linux's memory allocation. I have heard statements about Linux keeping RAM assigned to a process even after it frees it, but I don't know if that behavior is guaranteed. Why does using std::vector impact that behavior so much?
Many thanks for reading this; I know this is a pretty fuzzy problem. The 'answer' I'm looking for here is if this behavior is 'defined' and where I can find its documentation.
#include <string.h>
#include <unistd.h>
#include <iostream>
#include <vector>
#include <list>
#include <memory>
class Foo{
public:
Foo()
{
data = new char[999];
memset(data, 'x', 999);
}
~Foo()
{
delete[] data;
}
private:
char* data;
};
int main(int argc, char** argv)
{
for(int x=0; x<10; ++x)
{
sleep(1);
//std::auto_ptr<std::list<Foo*> > foos(new std::list<Foo*>);
std::auto_ptr<std::vector<Foo*> > foos(new std::vector<Foo*>);
for(int i=0; i<2000000; ++i)
{
foos->push_back(new Foo());
}
std::cout << "Sleeping before de-alloc\n";
sleep(5);
while(false == foos->empty())
{
delete foos->back();
foos->pop_back();
}
}
std::cout << "Sleeping after final de-alloc\n";
sleep(5);
}
The freeing of memory is done on a "chunk" basis. It's quite possible that when you use list, the memory gets fragmented into little tiny bits.
When you allocate using a vector, all elements are stored in one big chunk, so it's easy for the memory-freeing code to say "golly, I've got a very large free region here, I'm going to release it back to the OS". It's also entirely possible that when growing the vector, the memory allocator goes into "large chunk mode", which uses a different allocation method than "small chunk mode" - say, for example, once you allocate more than 1 MB, the memory allocation code may see that as a good time to start using a different strategy and just ask the OS for a "perfect fit" piece of memory. This large block is very easy to release back to the OS when it's freed.
On the other hand, if you are adding to a list, you are constantly asking for little bits, so the allocator uses a different strategy of asking for a large block and then giving out small portions of it. It's both difficult and time-consuming to ensure that ALL blocks within a chunk have been freed, so the allocator may well "not bother" - because chances are that there are some regions in there "still in use", and then the chunk can't be freed at all anyway.
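If you are on glibc, you can even tune where that "large chunk mode" kicks in; a small sketch (glibc-specific, and the 64 KB threshold is an arbitrary choice):

#include <malloc.h>   // glibc-specific: mallopt, M_MMAP_THRESHOLD
#include <cstdlib>

int main()
{
    // Requests above the threshold are served by mmap and handed straight
    // back to the OS on free; smaller ones stay in the brk-managed heap.
    mallopt( M_MMAP_THRESHOLD, 64 * 1024 );

    char *big = static_cast<char*>( std::malloc( 1024 * 1024 ) ); // served via mmap
    std::free( big ); // returned to the kernel immediately
}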
I would also add that using "top" as a memory measure isn't a particularly accurate method, and is very unreliable, as it very much depends on what the OS and the runtime library does. Memory belonging to a process may not be "resident", but the process still hasn't freed it - it's just not "present in actual memory" (out in the swap partition instead!)
And to your question "is this defined somewhere": I think it is, in the sense that the C/C++ library source code defines it. But it's not defined in the sense that somewhere it's written "this is how it's meant to work, and we promise never to change it". The libraries supplied as glibc and libstdc++ are not going to say that; they will change the internals of malloc, free, new and delete as new technologies and ideas are invented - some may make things better, others may make them worse, for a given scenario.
As has been pointed out in the comments, the memory is not locked to the process. If the kernel feels that the memory is better used for something else [and the kernel is omnipotent here], then it will "steal" the memory from one running process and give it to another. Particularly memory that hasn't been "touched" for a long time.
1. Both std::vector and std::list are using std::allocator. Even if std::vector is doing batch re-allocations, wouldn't std::allocator be handing out memory in almost the same arrangement to std::list and std::vector? This program is single threaded after all.
Well, what are the differences?
std::list allocates nodes one-by-one (each node needs two pointers in addition to your Foo *). Also, it never re-allocates these nodes (this is guaranteed by the iterator invalidation requirements for list). So, the std::allocator will request a sequence of fixed-size chunks from the underlying mechanism (probably malloc which will in turn use the sbrk or mmap system calls). These fixed-size chunks may well be larger than a list node, but if so they'll all be the same default chunk size used by std::allocator.
std::vector allocates a contiguous block of pointers with no book-keeping overhead (that's all in the vector parent object). Every time a push_back would overflow the current allocation, the vector will allocate a new, larger chunk, move everything across to the new chunk, and release the old one. Now, the new chunk will be something like double (or 1.6 times, or whatever) the size of the old one, as is required to keep the amortized constant time guarantee for push_back. So, pretty quickly, I'd expect the sizes it requests to exceed any sensible default chunk size for std::allocator.
So the interesting interactions are different: one between std::vector and the allocator's underlying mechanism, and one between the std::allocator itself and that underlying mechanism.
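You can watch the vector's growth pattern directly; a small demo (the exact capacities printed depend on your implementation):

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    std::size_t last = 0;
    for( int i = 0; i < 1000000; ++i )
    {
        v.push_back( i );
        if( v.capacity() != last )   // a re-allocation just happened
        {
            last = v.capacity();
            std::cout << "size " << v.size() << " -> capacity " << last << '\n';
        }
    }
}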
2. Where can I learn about the behavior of Linux's memory allocation? I have heard statements about Linux keeping RAM assigned to a process even after it frees it, but I don't know if that behavior is guaranteed. Why does using std::vector impact that behavior so much?
There are several levels you might care about:
1. The container's own allocation pattern, which is hopefully described above (note that in real-world applications, the way a container is used is just as important).
2. std::allocator itself, which may provide a layer of buffering for small allocations (I don't think this is required by the standard, so it's specific to your implementation).
3. The underlying allocator, which depends on your std::allocator implementation (it could for example be malloc, however that is implemented by your libc).
4. The VM scheme used by the kernel, and its interactions with whatever syscall the allocator in (3) ultimately uses.
In your particular case, I can think of a possible explanation for the vector apparently releasing more memory than the list.
Consider that the vector ends up with a single contiguous allocation, and lots of the Foos will also be allocated contiguously. This means that when you release all this memory, it's pretty easy to figure out that most of the underlying pages are genuinely free.
Now consider that the list node allocations are interleaved 1:1 with the Foo instances. Even if the allocator did some batching, it seems likely that the heap is much more fragmented than in the std::vector case. Therefore, when you release the allocated records, some work would be required to figure out whether an underlying page is now free, and there's no particular reason to expect this will happen (unless a subsequent large allocation encouraged coalescing of heap records).
The answer is the malloc "fastbins" optimization.
std::list creates tiny (less than 64 byte) allocations, and when it frees them up they are not actually freed - they go to the fastbin pool instead.
This behavior means that the heap stays fragmented even AFTER the list is cleared, and therefore the memory does not return to the system.
You can either use malloc_trim(128*1024) in order to forcibly clear them.
Or use mallopt(M_MXFAST, 0) in order to disable fastbins altogether.
I find the first solution to be more correct if you call it when you really don't need the memory anymore.
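A sketch of the first option (glibc-specific; the list size is arbitrary and the 128 KB pad mirrors the value above):

#include <malloc.h>   // glibc-specific: malloc_trim
#include <list>

int main()
{
    {
        std::list<int> l( 5000000, 42 );   // lots of tiny node allocations
    }   // list destroyed here; its freed nodes linger in malloc's bins

    // Ask glibc to give free heap memory back to the OS, keeping
    // 128 KB of padding around for future requests.
    malloc_trim( 128 * 1024 );
}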
Smaller chunks go through brk, adjusting the data segment, with constant splitting and fusion; bigger chunks are mmap'ed, so the process is a little less disturbed. See the ptmalloc source code for more detail.
I have a std::vector of a class called OGLSHAPE.
each shape has a vector of SHAPECONTOUR struct which has a vector of float and a vector of vector of double. it also has a vector of an outline struct which has a vector of float in it.
Initially, my program starts up using 8.7 MB of RAM. I noticed that when I started filling these up, e.g. adding doubles and floats, the memory usage got fairly high quickly, then leveled off. When I clear the OGLSHAPE vector, about 19 MB is still used. Then if I push about 150 more shapes and clear those, I'm now using around 19.3 MB of RAM. I would have thought that logically, if the first time it went from 8.7 to 19, the next time it would go up to around 30. I'm not sure what it is. I thought it was a memory leak, but now I'm not sure. All I do is push numbers into std::vectors, nothing else. So I'd expect to get all my memory back. What could cause this?
Thanks
Edit: okay, it's memory fragmentation from allocating lots of small things - how can that be solved?
Calling std::vector<>::clear() does not necessarily free all allocated memory (it depends on the implementation of the std::vector<>). This is often done as an optimization to avoid unnecessary memory allocations.
In order to really free the memory held by an instance just do:
template <typename T>
inline void really_free_all_memory(std::vector<T>& to_clear)
{
std::vector<T> v;
v.swap(to_clear);
}
// ...
std::vector<foo> objs;
// ...
// really free instance 'objs'
really_free_all_memory(objs);
which creates a new (empty) instance and swaps it with your vector instance you would like to clear.
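If you can use C++11, shrink_to_fit expresses the same intent, though the standard makes it a non-binding request:

std::vector<foo> objs;
// ...
objs.clear();
objs.shrink_to_fit(); // non-binding request to release unused capacity (C++11)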
Use the correct tools to observe your memory usage, e.g. (on Windows) use Process Explorer and observe Private Bytes. Don't look at Virtual Address Space since that shows the highest memory address in use. Fragmentation is the cause of a big difference between both values.
Also realize that there are a lot of layers in between your application and the operating system:
the std::vector does not necessarily free all memory immediately (see tip of hkaiser)
the C Run Time does not always return all memory to the operating system
the Operating System's heap routines may not be able to free all memory because they can only free full pages (of 4 KB). If 1 byte of a 4 KB page is still used, the page cannot be freed.
There are a few possible things at play here.
First, the way memory works in most common C and C++ runtime libraries is that once it is allocated to the application by the operating system, it is rarely ever given back to the OS. When you free it in your program, the runtime's memory manager keeps it around in case you ask for more memory again. If you do, it hands it back for re-use.
The other reason is that vectors themselves typically don't reduce their size, even if you clear() them. They keep the "capacity" that they had at their highest so that it is faster to re-fill them. But if the vector is ever destroyed, that memory will then go back to the runtime library to be allocated again.
So, if you are not destroying your vectors, they may be keeping the memory internally for you. If you are using something in the operating system to view memory usage, it is probably not aware of how much "free" memory is waiting around in the runtime libraries to be used, rather than being given back to the operating system.
The reason your memory usage increases slightly (instead of not at all) is probably because of fragmentation. This is a sort of complicated tangent, but suffice it to say that allocating a lot of small objects can make it harder for the runtime library to find a big chunk when it needs it. In that case, it can't reuse some of the memory it has laying around that you already freed, because it is in lots of small pieces. So it has to go to the OS and request a big piece.
I have a strongly recursive function that creates a (very small) std::multimap locally for each function instance using new (which bottoms out in malloc/calloc in the runtime library). After some hundred recursions, new fails, although I am using a native 64-bit application on Windows XP x64. The machine has 10 GB RAM and the application only uses about 1 GB. No other big apps are running.
This happens a few minutes after starting the program and starting the recursive function. The recursive function has been called about 150,000 times at this point, with a probable maximum recursion depth of a few hundred. The problem occurring is not a stack overflow.
I am using Visual Studio 2005 and the dinkumware STL. The fault occurs in a release build.
EDIT:
Ok, here is some code.
I rearranged the code now and put the map on the stack, but it uses new to initialize - and there it fails. I also tried with a std::multimap instead of hash_multimap. None of this changed the behavior.
int TraceBackSource( CalcParams *CalcData, CKnoObj *theKno, int qualNo,
                     double maschFak, double partAmount, int MaschLevel, char *MaschID,
                     double *totalStrFlow, int passNo,
                     CTraceBackData *ResultData )
{
    typedef std::hash_multimap<double, CStrObj *> StrFMap;
    StrFMap thePipes;
    for( ... )
    {
        ...
        thePipes.insert( std::make_pair( thisFlow, theStr ) );
    }
    // max. 5 elements in "thePipes"
    for( StrFMap::iterator it = thePipes.begin(); it != thePipes.end(); it++ )
    {
        ...
        try
        {
            TraceBackSource( CalcData, otherKno, qualNo, maschFak * nodeFak, nodeAmount,
                             SubMaschlevel, newMaschID, totalStrFlow, passNo, ResultData );
        }
        catch( std::exception &ex ) // renamed from "it", which shadowed the iterator
        {
            Trace( 0, "*** Exception, %s", ex.what() );
            return 0;
        }
        return 0;
    }
}
Interestingly, the first failure runs into the catch handler; quite a bit later on I end up with an ACCESS VIOLATION and a corrupted stack.
The amount of RAM on your machine and the other processes running are irrelevant for this particular scenario. Every process has the same amount of virtual address space assigned to it, and the size of this space is independent of the amount of RAM in your machine or of what other processes are running.
What's happening here is likely one of the following
You've simply allocated too much memory. Hard to do in 64-bit, yes, but possible.
There is no contiguous block of memory available which has the requested size.
Your numbers suggest the easily exceeded default 1 MB stack size (c. 150K x 8). So from a quick look at your code (that map::insert especially, and the for(...) code you haven't shown), you are running into an interaction with stackoverflow.com :)
You are probably hitting it for the OS you're running on. On Windows, use the VS linker settings or editbin.exe or some exotic unportable API, triple your stack size, and see whether it significantly changes the observed recursion count at the time of the exception.
Your application is probably suffering from memory fragmentation. There might be plenty of memory available, but it may be fragmented into much smaller contiguous blocks than your application asks for.
As Majkara mentions, the thread stack space is a fixed size and you are running out of it - it doesn't matter how much memory you have free. You need to rewrite your algorithm to be iterative, using a std::stack allocated on the heap (or some other data structure) to keep track of the depth.
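A minimal sketch of that rewrite (the Frame contents are hypothetical placeholders for whatever per-call state TraceBackSource carries):

#include <stack>

struct Frame
{
    // hypothetical per-call state, e.g.:
    // CKnoObj *kno; double maschFak; int maschLevel; ...
};

void traceBackIterative( Frame initial )
{
    std::stack<Frame> work;     // node storage lives on the heap, not the call stack
    work.push( initial );
    while( !work.empty() )
    {
        Frame f = work.top();
        work.pop();
        // ... process f; instead of recursing, push child frames:
        // work.push( childFrame );
    }
}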