all!
I am trying to load in memory a set of files. for each file, around 10000 entries are loaded.
it should be totaly possible to hold the whole information is memory (i calculated the size in Mb, should fit), however, at some point I always get bad_alloc exception from the vector where i try to store the entries.
First question is, what is the limit of memory that can be allocated using vector? the number of elements that are allocated before the exception is not even close to the max_size()
Second question is, which kind of structure in stl or boost can I use to load the whole set in memory?
I am greatfull for any help!
Regardless of what your code actually does and what environment you're running this on, one thing is certain: std::vector allocates continuous storage. This means that due to address space (memory?) fragmentation you will get this result, because there is just no room to allocate everything continuously.
If you see that this is happening, either use a non-continuous container (like std::list) or make sure you only load chunks into memory at a time, not the whole thing.
Related
The problem is following: I have certain amount of words (let's say 20M), each containing some bits used as flags; all stored in single continuous binary file.
What I would like to do is to get access to those words in container like style, so container_instance[i] allows me to access i-th word. To get things more complicated, I cannot store all words in memory at one time, they have to be stored back to file and memory freed for those not used for long period. To simplify things the whole sequence is partitioned to 1K fragments, so we need to free and allocate such 1K blocks. Memory should be freed after some time or after certain number of times container have been accessed.
Thread safety in nice to have. But I can protect externally.
The implementation I have currently only allocates blocks on demand (empty or read from file if they are available; file is not sparse, so everything after the last byte in file is allocated empty) and it is not nicely done. Not frees at all, so unused blocks remain in memory forever.
I started to think about nice looking solution and I would like to know whether any elements from STL or Boosts can help me build such container not by engraving it step by step from scratch?
I am not expecting full solutions, rather pointing "you can use that for that".
You can use mmap system call to map your file into memory. You can use pointer arithmetic with that buffer, so access by index is not a trouble.
Mapped pages are virutual and managed by the kernel, allowing to save unused memory blocks and load/flush them at transparently to you. Also, using madvise probably can enable some optimisations.
I ran into a funny issue today working with large data structures. I initially was using a vector to store upwards of 1000000 ints but later decided I didn't actually need the dynamic functionality of the vector (I was reserving 1000000 spots as soon as it was declared anyway) and it would be beneficial to, instead, be able to add values any place in the data structure. So I switched it to an array and BAM stack overflow. I'm guessing this is because declaring the size of the array at compile time puts it in the stack and making use of a dynamic vector instead placed it on the heap (which I'm guessing is larger?).
So what's the right answer here? Move back to a dynamic memory system just so it gets put on the heap? Increase the size of the stack? Or am I way off base on the whole thing here...?
Thanks!
I initially was using a vector to store upwards of 1000000 ints
Good idea.
but later decided I didn't actually need the dynamic functionality of the vector (I was reserving 1000000 spots as soon as it was declared anyway)
Not such a good idea. You did need it.
and it would be beneficial to, instead, be able to add values any place in the data structure.
I don't follow.
I'm guessing this is because declaring the size of the array at compile time puts it in the stack and making use of a dynamic vector instead placed it on the heap (which I'm guessing is larger?).
Much. The call stack is typically of the order of 1MB-2MB in size by default. Your "heap" (free store) is only really bounded by your available RAM.
So what's the right answer here? Move back to a dynamic memory system just so it gets put on the heap?
Yes.
[edit: Joachim's right — static is another possible answer.]
Increase the size of the stack?
You could but even if you could stretch 4MB out of it, you've left yourself no wiggle room for other local data variables. Best use dynamic memory — that's the appropriate thing to do.
Or am I way off base on the whole thing here...?
No.
SUMMARY:
I have an application which consumes way more memory that it should (roughly about 250% of the expected amount) but I can't seem to find any memory leaks. Calling the same function (which does a lot of allocations) will keep increasing memory usage to some point and then it will not change and stay there.
PROGRAM DETAILS:
The application uses a quadtree data structure to store 'Points'. It is possible to specify the maximum number of points to be stored in memory (cache size). The 'Points' are stored in 'PointBuckets' (arrays of points linked to the leaf nodes of the quadtree) which, if the maximum total number of points in the quadtree is reached, are serialized and saved to temporary files, to be retrieved when needed. This all seems to work fine.
Now when a file is loaded a new Quadtree is created and the old one is deleted if it exists, then points are read from the file and inserted into the quadtree one by one. A lot of memory allocations take place as buckets are being created and deleted during node splitting etc.
SYMPTOMS:
If I load a file that is expected to use 300MB of memory once, I get the expected amount of memory consumed. All good. If I keep loading the same file over and over again the memory usage keeps growing (I'm looking at the RES column in top, Linux) till about 700MB. That could indicate a memory leak. However if I then keep loading the files still, memory consumption just stays at 700MB.
Another thing: When I use valgrind massif and look at the memory usage it always stays within expected limit. For example if I specify cache size to be 1.5 GB and run my program alone, it will eventually consume 4GB of memory. If I run it in massif, it will stay below 2GB for all the time and then in the produced graphs I'll be able to see that it in fact never allocated more then the expected 1.5GB. My naive assumption is that this happens because massif uses a custom memory pool which somehow prevents fragmentation.
So what do you think is going on here? What kind of solution should I look for to solve this issue, if it is memory fragmentation?
I'd put it more at simple allocator and OS caching behaviours. They retain memory you allocated instead of freeing it so that it can be returned to you in a more prompt fashion the next time you request it. However, 250% does sound like a lot for this kind of effect- you could be looking at fragmentation problems.
Try swapping your allocator for a fragmentation-free allocator like object pool or memory arena.
For the program I'm working on, I frequently need to read input from a text file which contains hundreds of thousands of integers. For the time being, I'm reading a handful of values and storing them in a vector. Whenever a value I need is not in the vector, I read from the input file again and flush out the old values to make room for the values I'm currently reading in.
I'd like to avoid a situation where I constantly need to read from the input file and I'm wondering how many values I can store in my vector before there will be a problem. max_size() returns 1073741823, so I'm thinking that I can store that many elements but I'm wondering where that memory is being used and if it's a good idea to have a vector that large.
When you create a vector as so:
int main(){
std::vector<int> vec;
vec.push_back(3);
vec.push_back(4);
return 0;
}
Is that vector now using stack memory? Since your vector contains 2 ints, does that mean that 8 bytes of stack memory is being used?
According to MSDN docs:
For x86 and x64 machines, the default stack size is 1 MB.
That does not seem like a lot of memory. What is an example of a situation where you would want to increase the stack memory? Is there any way in Visual Studio to monitor exactly how much stack and heap memory are currently being used?
Is there anything I can do to prevent constant reading from the input file in a situation like this?
Is that vector now using stack memory?
The vec object is on the stack, but it internally allocates its memory on the heap as it grows
EDIT
Also, instead of reading all the file and storing it in a vector, you could try using a memory mapped file. From what I understand (not having used them myself), you would benefit from page caching and file reading in kernel mode (as the OS will manage the loading of the file on demand).
Note that this is merely a suggestion on where to pursue your investigation (I think that it might be appropriate, but I am not familiar enough with memory mapped files to tell you more)
vector stores elements in the heap, not the stack. Whether you should really allocate that much heap memory is a different matter, but you won't blow your stack.
Developing a 32-bit C++/carbon app under OS X Snow Leopard, ran into a problem where an stl vector of approximately 20,000 small objects (72 bytes each) was failing during a reallocation. Seems the vector, which was several megabytes in size, couldn't expand to a contiguous piece of memory, which at the point of failure was only 1.2 MB in size.
GuardMalloc[Appname-33692]: *** mmap(size=2097152) failed (error code=12)
*** error: can't allocate region
GuardMalloc[Appname-35026]: Failed to VM allocate 894752 bytes
GuardMalloc[ Appname-35026]: Explicitly trapping into debugger!!!
#0 0x00a30da8 in GMmalloc_zone_malloc_internal
#1 0x00a31710 in GMmalloc
#2 0x94a54617 in operator new
#3 0x0026f1d3 in __gnu_cxx::new_allocator<DataRecord>::allocate at new_allocator.h:88
#4 0x0026f1f8 in std::_Vector_base<DataRecord, std::allocator<DataRecord> >::_M_allocate at stl_vector.h:117
#5 0x0026f373 in std::vector<DataRecord, std::allocator<DataRecord> >::_M_insert_aux at vector.tcc:275
#6 0x0026f5a6 in std::vector<DataRecord, std::allocator<DataRecord> >::push_back at stl_vector.h:610
I can think of several strategies:
1) Reserve() a really, really big vector as soon as the app launches. However, this assumes the user might not load additional files that contribute to this vector, pushing it beyond the pre-allocated limit and possibly getting back into the same situation.
2) Change the vector of objects/memory allocations into a vector of pointers to objects/memory allocations. Clearly makes the vector itself a more manageable size, but then creates 20,000 small objects (which could eventually become like 50,000 objects, depending on what additional files the user loads). Does this create a gigantic overhead problem?
3) Change from a vector to a list, which may have its own overhead issues.
The vector is being constantly iterated through, and generally only appended to.
Any sage thoughts on these issues?
===============
ADDITIONAL NOTE: this particular vector just holds all imported record, so they can be indexed and sorted by ANOTHER vector that contains a sort order. Once an item is put into this vector, it stays there for the lifetime of the app (also helps support undo operations by making sure the index into the vector always remains the same for that particular object).
I think a std::deque would be more suitable than a std::list or a std::vector in your case. std::list is not efficient in iteration or random indexing, while std::vector is slow to resize (as you have observed). A std::deque does not need large amounts of memory when resizing, at the cost of slightly slower random indexing than a vector.
Don't use continuous storage like a vector. Go for a deque or list and the reallocations will not fail anymore.
If you really need high performance, consider writing your own container (ie ArrayList).
Out of your three options, 1 doesn't seem like a guaranteed solution, while 2 adds complexity and the vector still has to grow.
Option 3 seems somewhat reasonable (or possibly use a deque as mentioned in another answer) because while it's semantically similar to option 2, it provides a more normalized method of allocating a new data object. Of course this assumes that you only append data and don't need random access.
However all that said I find it incredible that your program has fragmented memory so badly that on reasonably modern hardware it can't allocate 1.2MB of memory. Far more likely is that there's some undefined behavior lurking (or possibly a memory leak) in your program causing it to behave in this way, failing to allocate the memory. You could use valgrind to help hunt down what may be going on. Does the same problem happen when you use the builtin new and delete rather than GMalloc?
Is your program being ulimited to only have access to a small amount of memory?
Finally, if valgrind finds nothing and your program really is fragmenting memory horribly, I would consider stepping back and reconsidering your approach. You may want to evaluate an alternate approach that doesn't rely on millions(?) of allocations (I just can't see a small number of allocations fragmenting the heap that much).
if even in the heap, there is not enough contigous space, use deque;
deque allocate not contigous space when it is needed. so it could handle the limit of 1.2 MB
deque is made of some number of blocks of memory not only one contigous space. that s why it could work. but it is not sure(/totaly safe) because you dont control the behaviour of the deque.
see thisabout memory fragmentation (following is copy/paste from the web page):
http://www.design-reuse.com/articles/25090/dynamic-memory-allocation-fragmentation-c.html :
Memory Fragmentation
The best way to understand memory fragmentation is to look at an example. For this example, it is assumed hat there is a 10K heap. First, an area of 3K is requested, thus:
#define K (1024)
char *p1;
p1 = malloc(3*K);
Then, a further 4K is requested:
p2 = malloc(4*K);
3K of memory is now free.
Some time later, the first memory allocation, pointed to by p1, is de-allocated:
free(p1);
This leaves 6K of memory free in two 3K chunks. A further request for a 4K allocation is issued:
p1 = malloc(4*K);
This results in a failure – NULL is returned into p1 – because, even though 6K of memory is available, there is not a 4K contiguous block available. This is memory fragmentation.
this is an issue even for kernels using virtual memory such as osx.