C++ array size on x86 and x64 - c++

Simple question: I'm writing a program that needs to open huge image files (8k x 8k), but I'm a little confused about how to initialize the huge arrays that hold the images in C++.
I've been trying something like this:
long long SIZE = 8092*8092; ///8096*8096
double* array;
array = (double*) malloc(sizeof(double) * SIZE);
if (array == NULL)
{
    fprintf(stderr, "Could not allocate that much memory");
}
But sometimes my NULL check does not catch that the array was not allocated; any idea why?
Also, I can't initialize more than 2 or 3 such arrays, even when running on an x64 machine with 12 GB of RAM; any idea why?
I would really prefer not to have to work with sections of the array instead. Any help is welcome.
Thanks.

You're not running into an array size problem. 8K*8K is merely 64M. Even 64M doubles (sizeof == 8) are not an issue; that requires a mere 512 MB. A 32-bit application (no matter where it's running) should be able to allocate a few of them: not 8, because the OS typically reserves part of the address space for itself (often slightly over 2 GB), and sometimes not even 3 when memory is fragmented.
The behavior of "malloc failed but didn't return NULL" comes from Linux's default memory overcommit; it can be disabled with # echo 2 > /proc/sys/vm/overcommit_memory

malloc() does not initialize memory, it just reserves it. You will have to initialize it explicitly, e.g. via memset() from string.h:
array = (double*) malloc(SIZE * sizeof(double));
if (array) memset(array, 0, SIZE * sizeof(double));
However, in C++ you should use new instead of malloc:
double* array = new (std::nothrow) double[SIZE]; // plain new throws std::bad_alloc instead of returning NULL; std::nothrow is in <new>
if (!array) {
    cerr << "Could not allocate that much memory" << endl;
}
for (long long i = 0; i < SIZE; i++) array[i] = 0.0;
Regarding size: each such array is 512 MB. Are you positively sure you need double precision (which means the image has 64-bit pixel depth)? Maybe a float would suffice? That would halve the memory footprint.
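As an aside, here is a minimal sketch (not part of the original answer) of the same allocation using std::vector, which allocates and zero-initializes in one step and reports failure via std::bad_alloc; the 8092*8092 element count is taken from the question:
#include <iostream>
#include <new>
#include <vector>

int main()
{
    const long long SIZE = 8092LL * 8092LL;
    try {
        // std::vector value-initializes its elements, so the buffer starts out zeroed.
        std::vector<double> image(SIZE, 0.0);
        std::cout << "allocated " << image.size() << " doubles" << std::endl;
    } catch (const std::bad_alloc&) {
        std::cerr << "Could not allocate that much memory" << std::endl;
        return 1;
    }
    return 0;
}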

You might be running into the 2 GB per-process address space limit if you are running a 32-bit operating system. With a few hundred MB of system libraries and other data, plus 2 or 3 arrays of 512 MB each, that reaches 2 GB easily. A 64-bit OS would help you there.

Are you compiling your application as a 32-bit application (the default in Visual Studio, if that's what you're using), or as a 64-bit application? You shouldn't have troubles if you build it as a 64-bit app.
malloc only allocates (reserves memory and returns a pointer); calloc allocates and initializes (writes zeros to all of that memory).
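A minimal sketch of that difference (not from the original answer); the element count is borrowed from the question above:
#include <cstdio>
#include <cstdlib>

int main()
{
    const size_t SIZE = 8092UL * 8092UL;
    // malloc: reserves the memory, contents are indeterminate
    double* a = (double*) std::malloc(SIZE * sizeof(double));
    // calloc: reserves the memory and writes zeros into all of it
    double* b = (double*) std::calloc(SIZE, sizeof(double));
    if (a == NULL || b == NULL) {
        std::fprintf(stderr, "allocation failed\n");
        return 1;
    }
    std::printf("b[0] == %f (calloc zero-fills); a's contents are indeterminate\n", b[0]);
    std::free(a);
    std::free(b);
    return 0;
}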

It seems that there is no contiguous memory block of that size (~500 MB) left in the C runtime heap. Instead of copying the file into memory, try mapping the image into the process's address space; you can map only the necessary parts of the file.
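A minimal POSIX sketch of that idea (not from the original answer); the file name "image.raw", the 512 MB window and the zero offset are placeholders:
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    int fd = open("image.raw", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    // Map only a 512 MB window of the file; the offset must be page-aligned.
    const size_t window = 512UL * 1024 * 1024;
    const off_t offset = 0;
    void* p = mmap(NULL, window, PROT_READ, MAP_PRIVATE, fd, offset);
    if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    const double* pixels = static_cast<const double*>(p);
    (void) pixels;   // work with pixels[0 .. window/sizeof(double)) here

    munmap(p, window);
    close(fd);
    return 0;
}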

Just as a side note: even though you would rather not deal with the image not being in memory all at once, there are good reasons to avoid holding the whole thing in memory. Maybe think about an abstraction that keeps only the currently needed chunk in memory; the rest of the program code can then be written as though it were ignorant of the memory issues.

I would really prefer not to have to work with sections of the array instead. Any help is welcome.
Have you looked into memory-mapped files?

Yep, sounds a lot like heap fragmentation, as Kirill pointed out. See also: How to avoid heap fragmentation?

I suggest using compression: decompress only the part you need to process in your code, and compress it again once you're done with that part.
2nd proposal: write code to overload pointer arithmetic ("operator+" and "operator-") so you can use non-contiguous memory buffers; several smaller buffers make your code more stable than one large contiguous one (a sketch of this idea follows below). I have done this myself and written some operator overloading; see http://code.google.com/p/effoaddon/source/browse/trunk/devel/effo/codebase/addons/mem/include/mcur_i.h for an example. When I tested 47 GB of malloc()ed system memory on an x86_64, I allocated just 1 GB per malloc() call, so 47 memory blocks in total. EDIT: when I instead tried to allocate as much as possible with a single malloc(), I only got 30 GB on a 48 GB system, i.e. less than 70%, because the larger the buffer requested per malloc(), the more management memory is consumed by the system/libc itself (I called mlock() to prevent the allocated memory from being swapped out to disk).
3rd one: try POSIX file mapping, and map each image into memory.
Btw: calling malloc() is more robust than new even when writing C++, because under memory pressure new is prone to throw exceptions instead of returning NULL.
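A minimal sketch of the 2nd proposal, using operator[] rather than overloaded pointer arithmetic and not based on the linked library: one logical array backed by many smaller blocks, so no single huge contiguous allocation is needed. The 64 MB chunk size is an arbitrary choice.
#include <cstddef>
#include <vector>

class ChunkedBuffer {
public:
    explicit ChunkedBuffer(std::size_t count) : count_(count) {
        const std::size_t chunks = (count + kChunkElems - 1) / kChunkElems;
        blocks_.resize(chunks);
        for (std::size_t c = 0; c < chunks; ++c) {
            std::size_t n = count - c * kChunkElems;   // last chunk may be smaller
            if (n > kChunkElems) n = kChunkElems;
            blocks_[c].resize(n);
        }
    }

    // Index into the logical array; each access resolves to (block, offset).
    double& operator[](std::size_t i) {
        return blocks_[i / kChunkElems][i % kChunkElems];
    }

    std::size_t size() const { return count_; }

private:
    static const std::size_t kChunkElems = (64 * 1024 * 1024) / sizeof(double);  // ~64 MB per block
    std::size_t count_;
    std::vector< std::vector<double> > blocks_;
};

int main() {
    ChunkedBuffer image(8092UL * 8092UL);   // the 8k x 8k image from the question
    image[12345678] = 1.0;
    return image.size() > 0 ? 0 : 1;
}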

Related

Memory use keeps increasing in for loop while using dynamic array (C++)

The following is my C++ code.
I found that the memory use keeps increasing if I try to use the test1 array to calculate anything.
double **test1;
test1 = new double *[1000];
for (int i = 0; i < 1000; i++) {
    test1[i] = new double[100000000];
    test1[i][0] = rand() / (double)RAND_MAX * 100;
}
for (int j = 1; j < 100000000; j++) {
    for (int i = 0; i < 1000; i++) {
        test1[i][j] = test1[i][j-1]; // this causes the memory use to increase
    }
}
If I delete the line.
test1[i][j]=test1[i][j-1];
The memory use will become a small constant value.
I thought that since I had already declared the dynamic arrays in the first part, the memory use should stay constant if I don't new any more arrays.
What causes the memory use to increase? And how do I avoid it?
(I use the Linux command "top" to monitor the memory use.)
In the first loop you create 1000 arrays of 100,000,000 doubles each; each one is an 800 MB allocation, and you write only to its first element.
Later you write to the rest. When you do this, the operating system needs to actually give you the memory to write into, whereas initially it just gave you a mapping which would page fault later (when you write to it).
So basically, since each allocation is so large, the memory required to back it is not physically allocated until it is used.
The code is nonsensical, because eventually you try to write to 800 GB of memory. I doubt that will ever complete on a typical computer.
On a virtual memory system, the Linux kernel will (by default) not actually allocate any physical memory when your program does an allocation. Instead it will just adjust your virtual address space size.
Think of it like the kernel going "hmm, yeah, you say you want this much memory. I'll remember I promised you that, but let's see if you are really going to use it before I go fetch it for you".
If you then actually go and write to the memory, the kernel will get a page fault for a virtual address that is not actually backed by real memory, and at that point it will go and allocate some real memory to back the page you wrote to.
Many programs never write to all the memory they allocate, so by only fulfilling the promise when it really has to, the kernel saves huge amounts of memory.
You can see the difference between the amount you have allocated and the amount that actually occupies real memory by looking at the VSZ (Virtual Size) and RSS (Resident Set Size) columns in the output of ps aux.
If you want all allocations to be backed by physical memory all the time (you probably do not), then you can change the kernel's overcommit policy via the vm.overcommit_memory sysctl switch.
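To make the difference visible, here is a minimal Linux sketch (not from the original answer): pause at each getchar() and compare the VSZ and RSS columns of ps aux for this process.
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

int main()
{
    const size_t size = 1UL << 30;            // ask for 1 GB
    char* p = (char*) std::malloc(size);
    if (p == NULL) return 1;

    std::printf("pid %d: malloc done - VSZ has grown, RSS has not\n", (int) getpid());
    std::getchar();                           // inspect `ps aux` now

    std::memset(p, 1, size);                  // touch every page
    std::printf("pages touched - RSS has grown by ~1 GB too\n");
    std::getchar();                           // inspect `ps aux` again

    std::free(p);
    return 0;
}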

Allocate several GBs of memory for std::vector

I need to acquire several GB of data from a sensor. When I tried to allocate a big array with malloc (10 or more GB; my system has 32 GB), it returned NULL. So I thought the problem could be solved with a linked list of iterators to vectors.
However, I don't know how to set this up. I tried declaring "list< vector::iterator >", but I can't allocate the memory for each vector (each one should have 1000~2000 elements). Do you know any way to do this, or maybe a better solution for this big memory allocation?
If you are using a 64-bit operating system, then malloc should be able to allocate the large size with no problem.
For example, this code runs on my windows machine (64-bit windows) and allocates 10GB of ram flawlessly:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    long long size = 10LL * 1024 * 1024 * 1024;  // use long long: plain long is only 32 bits on Windows
    printf("size = %lld\n", size);
    char *x = (char *)malloc(size);
    printf("x = %p\n", (void *)x);
    long long i;
    for (i = 0; i < size; i += 1024 * 1024) {
        x[i] = 'h';                              // touch one byte per MB so the pages actually get committed
    }
    printf("Done1\n");
    return 0;
}
However, if you have a 32-bit operating system, you'll be in trouble and can't allocate beyond some limit (maybe 3 GB, but probably system dependent).
In that case, you'll need to write your data to a file instead.
However, if you're using a FAT filesystem, then you can't write a file that big either. In that case, you'd have to split the data among several files under 2 GB in size.
You'd also want to actually check the malloc result for NULL to make sure the allocation worked and the memory could be grabbed.
You will need to allocate this space under a 64-bit Windows OS. You will ALSO have to set the "large address aware" flag, otherwise you can only get 2 GB of RAM due to how the virtual memory system works on Windows.
You may want to look into using a memory-mapped file, as suggested by sehe in his answer, if you do not absolutely have to have one large 10 GB chunk of contiguous memory. If you have to build your application for 32-bit Windows, then that will be the only answer, as 32-bit Windows normally only allows 2 GB of memory per process, or 3 GB when the "large address aware" flag is set.
When you have to deal with large blocks of memory, you are better off skipping malloc altogether and going directly to the operating system calls for memory allocation.
I usually move to memory mapped files or shared memory maps for this kind of data volumes.
This way, you're not bound to the amount of physical (process) memory available at all. You can let the OS page in and out as required. Fragmentation becomes much less of an issue (unless you actually fragment the logical address space, which is quite hard to achieve on 64 bit architectures).
More information
I have quite a number of answers on SO that show examples of storing vectors and all manner of more complicated data structures in shared memory/mapped files. You might want to look for mapped_file_device (from Boost Iostreams) or managed_shared_memory and managed_mapped_file (from Boost Interprocess)
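As an illustration only (not taken from those answers), a minimal sketch using managed_mapped_file from Boost.Interprocess; the file name "sensor.bin" and the 10 GB size are placeholders:
#include <boost/interprocess/managed_mapped_file.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/interprocess/containers/vector.hpp>

namespace bip = boost::interprocess;

typedef bip::allocator<double, bip::managed_mapped_file::segment_manager> FileAlloc;
typedef bip::vector<double, FileAlloc> FileVector;

int main()
{
    // Create (or reopen) a 10 GB file-backed segment; the OS pages it in and out on demand.
    bip::managed_mapped_file file(bip::open_or_create, "sensor.bin", 10ULL << 30);

    // A vector whose storage lives inside the mapped file, not in the process heap.
    FileVector* data = file.find_or_construct<FileVector>("data")(file.get_segment_manager());
    data->push_back(3.14);
    return 0;
}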

Allocating more memory than there exists using malloc

This code snippet allocates 2 GB every time it reads the letter 'u' from stdin, and initializes all of the allocated chars once it reads 'a'.
#include <iostream>
#include <stdlib.h>
#include <stdio.h>
#include <vector>
#define bytes 2147483648ULL
using namespace std;
int main()
{
    char input = 0;
    vector<char *> activate;
    while (input != 'q')
    {
        cin >> input;                          // gets() on a one-char buffer was undefined behaviour
        if (input == 'u')
        {
            char *m = (char*)malloc(bytes);
            if (m == NULL) cout << "cant allocate mem" << endl;
            else
            {
                cout << "ok" << endl;
                activate.push_back(m);         // only keep pointers that were actually allocated
            }
        }
        else if (input == 'a')
        {
            for (size_t x = 0; x < activate.size(); x++)
            {
                char *m = activate[x];
                for (unsigned long long i = 0; i < bytes; i++)
                {
                    m[i] = 'a';                // touching every byte forces the kernel to commit the pages
                }
            }
        }
    }
    return 0;
}
I am running this code on a Linux virtual machine that has 3 GB of RAM. While monitoring the system resource usage using the htop tool, I have realized that the malloc operation is not reflected in the resources.
For example, when I input 'u' only once (i.e. allocate 2 GB of heap memory), I don't see the memory usage increasing by 2 GB in htop. It is only when I input 'a' (i.e. initialize) that I see the memory usage increasing.
As a consequence, I am able to "malloc" more heap memory than there exists. For example, I can malloc 6 GB (which is more than my RAM and swap memory combined) and malloc allows it (i.e. NULL is not returned). But when I try to initialize the allocated memory, I can see the memory and swap filling up until the process is killed.
My questions:
1. Is this a kernel bug?
2. Can someone explain to me why this behavior is allowed?
It is called memory overcommit. You can disable it by running as root:
echo 2 > /proc/sys/vm/overcommit_memory
and it is not a kernel feature that I like (so I always disable it). See malloc(3), mmap(2), and proc(5).
NB: echo 0 instead of echo 2 often - but not always - works as well. Read the docs (in particular the proc man page I just linked to).
from man malloc (online here):
By default, Linux follows an optimistic memory allocation strategy. This means that when malloc() returns non-NULL there is no guarantee that the memory really is available.
So when you merely ask for too much, it "lies" to you; when you actually use the allocated memory, the kernel tries to find enough physical memory for you, and your process may crash (or be killed by the OOM killer) if it can't.
No, this is not a kernel bug. You have discovered something known as late paging (or overcommit).
Until you write a byte to the address allocated with malloc (...) the kernel does little more than "reserve" the address range. This really depends on the implementation of your memory allocator and operating system of course, but most good ones do not incur the majority of kernel overhead until the memory is first used.
The hoard allocator is one big offender that comes to mind immediately; through extensive testing I have found it almost never takes advantage of a kernel that supports late paging. You can always mitigate the effects of late paging in any allocator if you zero-fill the entire memory range immediately after allocation.
Real-time operating systems like VxWorks will never allow this behavior because late paging introduces serious latency. Technically, all it does is put the latency off until a later indeterminate time.
For a more detailed discussion, you may be interested to see how IBM's AIX operating system handles page allocation and overcommitment.
This is a result of what Basile mentioned: memory overcommit. However, the explanation is kind of interesting.
Basically when you attempt to map additional memory in Linux (POSIX?), the kernel will just reserve it, and will only actually end up using it if your application accesses one of the reserved pages. This allows multiple applications to reserve more than the actual total amount of ram / swap.
This is desirable behavior on most Linux environments unless you've got a real-time OS or something where you know exactly who will need what resources, when and why.
Otherwise somebody could come along, malloc up all the ram (without actually doing anything with it) and OOM your apps.
Another example of this lazy allocation is mmap(), where you have a virtual mapping that the file you're mapping can fit inside - but you only have a small amount of real memory dedicated to the effort. This allows you to mmap() huge files (larger than your available RAM) and use them like normal file handles, which is nifty.
Initializing / working with the memory should work:
memset(m, 0, bytes);
Also you could use calloc that not only allocates memory but also fills it with zeros for you:
char* m = (char*) calloc(1, bytes);
1.Is this a kernel bug?
No.
2.Can someone explain to me why this behavior is allowed?
There are a few reasons:
Mitigate the need to know the eventual memory requirement - it's often convenient for an application to be able to ask for an amount of memory it considers an upper limit on what it might actually need. For example, if it's preparing some kind of report, either an initial pass just to calculate the eventual size of the report or realloc()ing successively larger areas (with the risk of having to copy) may significantly complicate the code and hurt performance, whereas multiplying some maximum length of each entry by the number of entries can be very quick and easy. If you know virtual memory is relatively plentiful as far as your application's needs are concerned, then making a larger allocation of virtual address space is very cheap.
Sparse data - if you have the virtual address space spare, being able to have a sparse array and use direct indexing, or allocate a hash table with generous capacity() to size() ratio, can lead to a very high performance system. Both work best (in the sense of having low overheads/waste and efficient use of memory caches) when the data element size is a multiple of the memory paging size, or failing that much larger or a small integral fraction thereof.
Resource sharing - consider an ISP offering a "1 giga-bit per second" connection to 1000 consumers in a building - they know that if all the consumers use it simultaneously they'll get about 1 mega-bit, but rely on their real-world experience that, though people ask for 1 giga-bit and want a good fraction of it at specific times, there's inevitably some lower maximum and much lower average for concurrent usage. The same insight applied to memory allows operating systems to support more applications than they otherwise would, with reasonable average success at satisfying expectations. Much as the shared Internet connection degrades in speed as more users make simultaneous demands, paging from swap memory on disk may kick in and reduce performance. But unlike an internet connection, there's a limit to the swap memory, and if all the apps really do try to use the memory concurrently such that that limit's exceeded, some will start getting signals/interrupts/traps reporting memory exhaustion. Summarily, with this memory overcommit behaviour enabled, simply checking malloc()/new returned a non-NULL pointer is not sufficient to guarantee the physical memory is actually available, and the program may still receive a signal later as it attempts to use the memory.

Create too large array in C++, how to solve?

Recently, I have been working in C++ and I have to create an array[60.000][60.000]. However, I cannot create this array because it is too large. I tried float **array and even a static float array, but nothing worked. Does anyone have an idea?
Thanks for your help!
A matrix of size 60,000 x 60,000 has 3,600,000,000 elements.
You're using type float so it becomes:
60,000 x 60,000 * 4 bytes = 14,400,000,000 bytes ~= 13.4 GB
Do you even have that much memory in your machine?
Note that the issue of stack vs heap doesn't even matter unless you have enough memory to begin with.
Here's a list of possible problems:
You don't have enough memory.
If the matrix is declared globally, you'll exceed the maximum size of the binary.
If the matrix is declared as a local array, then you will blow your stack.
If you're compiling for 32-bit, you have far exceeded the 2GB/4GB addressing limit.
Does "60.000" actually mean "60000"? If so, the size of the required memory is 60000 * 60000 * sizeof(float), which is roughly 13.4 GB. A typical 32-bit process is limited to only 2 GB, so it is clear why it doesn't fit.
On the other hand, I don't see why you shouldn't be able to fit that into a 64-bit process, assuming your machine has enough RAM.
Allocate the memory at runtime -- consider using a memory mapped file as the backing. Like everyone says, 14 gigs is a lot of memory. But it's not unreasonable to find a computer with 14GB of memory, nor is it unreasonable to page the memory as necessary.
With a matrix of this size, you will likely become very curious about memory access performance. Remember to consider the cache grain of your target architecture and if your target has a TLB you may be able to use larger pages to relieve some TLB pressure. Then again, if you don't have enough memory you'll likely care only about how fast your storage I/O is.
If it's not already obvious, you'll need an architecture that supports a 64-bit address space in order to access this memory directly/conveniently.
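As an illustration (not from the answer above), a minimal sketch of plain runtime allocation without a memory-mapped backing: one flat heap block indexed as row * N + col, which needs a 64-bit build and roughly 13.4 GB of RAM or swap.
#include <cstddef>
#include <iostream>
#include <new>
#include <vector>

int main()
{
    const std::size_t N = 60000;
    try {
        std::vector<float> matrix(N * N);         // ~13.4 GB allocated at runtime, not on the stack
        matrix[12345 * N + 678] = 1.0f;           // element (12345, 678)
        std::cout << "allocated " << matrix.size() << " floats" << std::endl;
    } catch (const std::bad_alloc&) {
        std::cerr << "not enough memory for the full 60000 x 60000 matrix" << std::endl;
        return 1;
    }
    return 0;
}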
To initialise the 2D array of floats that you want, you will need:
60000 * 60000 * 4 bytes = 14400000000 bytes
Which is approximately 14 GB of memory. That's a LOT of memory. To even address that much, you will need to be running a 64-bit machine, not to mention one with quite a bit of RAM installed.
Furthermore, allocating this much memory is almost never necessary in most situations; are you sure no optimisations could be made here?
EDIT:
In light of new information from your comments on other answers: you only have 4 GB of memory (RAM). Your operating system is hence going to have to page at least 9 GB out to the hard drive, in reality probably more. But you also only have 20 GB of hard drive space, which is barely enough to page all that data, especially if the disk is fragmented. Finally (I could be wrong because you haven't stated it explicitly), it is quite possible that you're running a 32-bit machine, which isn't really capable of addressing more than 4 GB of memory at a time.
I had this problem too. I did a workaround where I chopped the array into sections (my biggest allowed array was float A_sub_matrix_20[62944560]). When I declared just one of these inside main(), it seems to be put on the stack, as I got a runtime exception as soon as main() started. I was able to declare 20 buffers of that size as global variables, which works (looks like in global form they are stored on the HDD - when I added A_sub_matrix_20[n] to the watch list in Visual Studio it gave the message "reading from file").

Why is deleted memory unable to be reused

I am using C++ on Windows 7 with MSVC 9.0, and have also been able to test and reproduce on Windows XP SP3 with MSVC 9.0.
If I allocate 1 GB of 0.5 MB sized objects, when I delete them, everything is ok and behaves as expected. However if I allocate 1 GB of 0.25 MB sized objects when I delete them, the memory remains reserved (yellow in Address Space Monitor) and from then on will only be able to be used for allocations smaller than 0.25 MB.
This simple code will let you test both scenarios by changing which struct is typedef'd. After it has allocated and deleted the structs it will then allocate 1 GB of 1 MB char buffers to see if the char buffers will use the memory that the structs once occupied.
struct HalfMegStruct
{
    HalfMegStruct() : m_Next(0) {}
    /* return the number of objects needed to allocate one gig */
    static int getIterations() { return 2048; }
    int m_Data[131071];
    HalfMegStruct* m_Next;
};

struct QuarterMegStruct
{
    QuarterMegStruct() : m_Next(0) {}
    /* return the number of objects needed to allocate one gig */
    static int getIterations() { return 4096; }
    int m_Data[65535];
    QuarterMegStruct* m_Next;
};

// which struct to use
typedef QuarterMegStruct UseType;

int main()
{
    UseType* first = new UseType;
    UseType* current = first;
    for ( int i = 0; i < UseType::getIterations(); ++i )
        current = current->m_Next = new UseType;

    while ( first->m_Next )
    {
        UseType* temp = first->m_Next;
        delete first;
        first = temp;
    }
    delete first;

    for ( unsigned int i = 0; i < 1024; ++i )
        // one meg buffer, i'm aware this is a leak but its for illustrative purposes.
        new char[ 1048576 ];

    return 0;
}
Below you can see my results from within Address Space Monitor. Let me stress that the only difference between these two end results is the size of the structs being allocated up to the 1 GB marker.
This seems like quite a serious problem to me, and one that many people could be suffering from and not even know it.
So is this by design or should this be considered a bug?
Can I make small deleted objects actually be free for use by larger allocations?
And more out of curiosity, does a Mac or a Linux machine suffer from the same problem?
I cannot positively state this is the case, but this does look like memory fragmentation (in one of its many forms). The allocator (malloc) might be keeping buckets of different sizes to enable fast allocation; after you release the memory, instead of directly giving it back to the OS, it keeps the buckets so that later allocations of the same size can be served from the same memory. If this is the case, the memory would be available for further allocations of the same size.
This type of optimization is usually disabled for big objects, as it requires reserving memory even if it is not in use. If the threshold is somewhere between your two sizes, that would explain the behavior.
Note that while you might see this as weird, in most programs (not test, but real life) the memory usage patterns are repeated: if you asked for 100k blocks once, it more often than not is the case that you will do it again. And keeping the memory reserved can improve performance and actually reduce fragmentation that would come from all requests being granted from the same bucket.
You can, if you want to invest some time, learn how your allocator works by analyzing the behavior. Write some tests, that will acquire size X, release it, then acquire size Y and then show the memory usage. Fix the value of X and play with Y. If the requests for both sizes are granted from the same buckets, you will not have reserved/unused memory (image on the left), while when the sizes are granted from different buckets you will see the effect on the image on the right.
I don't usually code for windows, and I don't even have Windows 7, so I cannot positively state that this is the case, but it does look like it.
I can confirm the same behaviour with g++ 4.4.0 under Windows 7, so it's not in the compiler. In fact, the program fails when getIterations() returns 3590 or more -- do you get the same cutoff? This looks like a bug in Windows system memory allocation. It's all very well for knowledgeable souls to talk about memory fragmentation, but everything got deleted here, so the observed behaviour definitely shouldn't happen.
Using your code I performed your test and got the same result. I suspect that David Rodríguez is right in this case.
I ran the test and had the same result as you. It seems there might be this "bucket" behaviour going on.
I tried two different tests too. Instead of allocating 1 GB of data using 1 MB buffers, I allocated it the same way the memory was first allocated after deleting. In the second test I allocated the half-meg buffers, cleaned up, then allocated the quarter-meg buffers, adding up to 512 MB for each. Both tests had the same memory result in the end: only 512 MB is allocated and there is no large chunk of reserved memory.
As David mentions, most applications tend to make allocation of the same size. One can see quite clearly why this could be a problem though.
Perhaps the solution to this is that if you are allocating many smaller objects in this way you would be better to allocate a large block of memory and manage it yourself. Then when you're done free the large block.
I spoke with some authorities on the subject (Greg, if you're out there, say hi ;D) and can confirm that what David is saying is basically right.
As the heap grows in the first pass of allocating ~0.25MB objects, the heap is reserving and committing memory. As the heap shrinks in the delete pass, it decommits at some pace but does not necessarily release the virtual address ranges it reserved in the allocation pass. In the last allocation pass, the 1MB allocations are bypassing the heap due to their size and thus begin to compete with the heap for VA.
Note that the heap is reserving the VA, not keeping it committed. VirtualAlloc and VirtualFree can help explain the difference if you're curious. This fact doesn't solve the problem you ran into, which is that the process ran out of virtual address space.
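For the curious, a minimal sketch (not part of the original discussion) of allocating a large buffer with VirtualAlloc/VirtualFree directly, bypassing the CRT heap's reserve/decommit bookkeeping entirely; the 512 MB size is arbitrary:
#include <windows.h>
#include <cstdio>

int main()
{
    const SIZE_T size = 512ULL * 1024 * 1024;   // 512 MB
    // Reserve and commit the pages in one call, outside any CRT heap.
    void* p = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (p == NULL) {
        std::printf("VirtualAlloc failed: %lu\n", GetLastError());
        return 1;
    }
    // ... use the buffer ...
    // MEM_RELEASE returns both the committed pages and the reserved address range to the OS.
    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}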
This is a side-effect of the Low-Fragmentation Heap.
http://msdn.microsoft.com/en-us/library/aa366750(v=vs.85).aspx
You should try disabling it to see if that helps. Run against both GetProcessHeap and the CRT heap (and any other heaps you may have created).