Some problem about the initialization of the vector - c++

I can initialize the vector with 10^8,but I can't initialize it with 10^9.Why?
vector<int> bucket;
bucket.resize(100000000); √
bucket.resize(1000000000); ×

It's because resize function will apply memory from heap. As you can figure that, the size will be 4000000000 bytes in your second resize operation, which is larger than the space your system could allocate(may be your computer couldn't find a piece of continuous space for you), and will cause exception and failure.
The maximum memory you can apply for depends on many reasons as follow:
the hardware limitation of physical memory.
the os bit(32 or 64)
memory left for user. Operating system should meet the kernel's need first. Generally speaking, windows kernel needs more memory than linux or unix.
..
In a word, it is hard to know accurate memory size you can use, because it's a dynamic value. But you can make a rough estimation by new operator, and here is a good reference.

C++ vectors allocate memory in a contiguous block and it is likely that the operating system cannot find such a block when the block size gets too large.
Would the error message you are getting indicate that you are running out of memory?
The point is: Even if you think that you have enough memory left on your system, if your program's address space can not accommodate the large block in one chunk then you cannot construct the large vector (the maximum address space size may differ for 32-bit and 64-bit programs).

Related

Allocating aligned memory for larger arrays

In my program I want to allocate 32 byte aligned memory to use SSE/AVX. The amount I want to allocate is somewhere around 2000*1300*17*17*4(large data set). I tried using functions _aligned_malloc() and _mm_malloc but for larger sizes it doesn't allocate memory and results in a access violation exception. If the amount allocated is small like around 512*320*4*17*17(small data set) then the code work fine.
Here these functions return a null pointer when allocation is done for large data set.But works fine when input data size is small. Also here if I just use unaligned memory allocation using new then code works fine for large data set too.
Finally Can someone tell me Is there any significant performance gains in using aligned memory for AVX.
Edit: After some research according to this post it says that new allocate memory from free store and malloc() allocate memory from heap. Here I am exceeding maximum heap size as _aligned_malloc() return errno 12 which means ENOMEM in that case Can someone tell me a work around for this.
On memory allocation:
I seems you are actually trying to alocate 2000*1300*17*17*4 32 bytes elements. This is means you are trying to allocate 96 GB while your system has only 12 GB memory.
Since new is working but malloc not it seems your local implementation of new seems to be able to allocate huge amounts of virtual memory. Malloc allocates from the heap which means it is usally limited to the physical amount of memory you've got. That's the reason it fails.
As the dataset is bigger than your main memory you might want to allocate the memory using mmap which maps a file into virtual memory making it accessable as if it was in physical memory (but it will only partially be cached in memory). I'm not sure if it's guaranteed but mmap usally aligns on optimal page size boundary (almost always 4096 byte).
Anyway you will have a huge performance loss due to the fact that your disk is way slower than your RAM. This is so serious that using AVX will probably not speed up anything at all.
On the performance loss of using unaligned memory:
On modern hardware (say Intel's Haswell onwards I think) this depends on your access patterns. Unaligned access should have almost no performance overhead on iterating over the array in memory order (each cache line will still be loaded only once). If you access it in random order than you will often cross the 64 byte cache line boundry. This means your processor will have to load 2 lines into cache and remove 2 lines from the cache instead of only one. While this might be a serious problem for some situations in your case the disk will slows things down so much that you will barely notice this.
Addtional tips (or a shot in the dark):
The way you gave the size of the array (2000*1300*17*17*4) suggests that you are using a multidimensional array (e.g. auto x = new __m256[2000][1300][17][17][4]). So some tipps on that:
Iterate through it mostly sequential
Check if it is sparse (meaning some of the memory will never be accessed) and shrink it if possible.
You could try to flatten the array and do more complex index calculation yourself in order to reduce the amount of memory need. If you get it to fit completely into your RAM you can start to optimise your code (using AVX and/or aligned memory).
"Total paging file size for all drives is 15247MB" suggests that you actually using only parts of that 96 GB so there might be a way to further reduce your usage.
In that case you might also want to ask another question on how to reduce the memory usage with more info on what you are doing.

Why can't we declare an array, say of int data type, of any size within the memory limit?

int A[10000000]; //This gives a segmentation fault
int *A = (int*)malloc(10000000*sizeof(int));//goes without any set fault.
Now my question is, just out of curiosity, that if ultimately we are able to allocate higher space for our data structures, say for example, BSTs and linked lists created using the pointers approach in C have no as such memory limit(unless the total size exceeds the size of RAM for our machine) and for example, in the second statement above of declaring a pointer type, why is that we can't have an array declared of higher size(until it reaches the memory limit!!)...Is this because the space allocated is contiguous in a static sized array?.But then from where do we get the guarantee that in the next 1000000 words in RAM no other piece of code would be running...??
PS: I may be wrong in some of the statements i made..please correct in that case.
Firstly, in a typical modern OS with virtual memory (Linux, Windows etc.) the amount of RAM makes no difference whatsoever. Your program is working with virtual memory, not with RAM. RAM is just a cache for virtual memory access. The absolute limiting factor for maximum array size is not RAM, it is the size of the available address space. Address space is the resource you have to worry about in OSes with virtual memory. In 32-bit OSes you have 4 gigabytes of address space, part of which is taken up for various household needs and the rest is available to you. In 64-bit OSes you theoretically have 16 exabytes of address space (less than that in practical implementations, since CPUs usually use less than 64 bits to represent the address), which can be perceived as practically unlimited.
Secondly, the amount of available address space in a typical C/C++ implementation depends on the memory type. There's static memory, there's automatic memory, there's dynamic memory. The address space limits for each memory type are pre-set in advance by the compiler. Which raises the question: where are you declaring your large array? Which memory type? Automatic? Static? You provided no information, but this is absolutely necessary. If you are attempting to declare it as a local variable (automatic memory), then no wonder it doesn't work, since automatic memory (aka "stack memory") has very limited address space assigned to it. Your array simply does not fit. Meanwhile, malloc allocates dynamic memory, which normally has the largest amount of address space available.
Thirdly, many compilers provide you with options that control the initial distribution of address space between different kinds of memory. You can request a much larger stack size for your program by manipulating such options. Quite possibly you can request a stack so large, than your local array will fit in it without any problems. But in practice, for obvious reasons, it makes very little sense to declare huge arrays as local variables.
Assuming local variables, this is because on modern implementations automatic variables will be allocated on the stack which is very limited in space. This link gives some of the common stack sizes:
platform default size
=====================================
SunOS/Solaris 8172K bytes
Linux 8172K bytes
Windows 1024K bytes
cygwin 2048K bytes
The linked article also notes that the stack size can be changed for example in Linux, one possible way from the shell before running your process would be:
ulimit -s 32768 # sets the stack size to 32M bytes
While malloc on modern implementations will come from the heap, which is only limited to the memory you have available to the process and in many cases you can even allocate more than is available due to overcommit.
I THINK you're missing the difference between total memory, and your programs memory space. Your program runs in an environment created by your operating system. It grants it a specific memory range to the program, and the program has to try to deal with that.
The catch: Your compiler can't 100% know the size of this range.
That means your compiler will successfully build, and it will REQUEST that much room in memory when the time comes to make the call to malloc (or move the stack pointer when the function is called). When the function is called (creating a stack frame) you'll get a segmentation fault, caused by the stack overflow. When the malloc is called, you won't get a segfault unless you try USING the memory. (If you look at the manpage for malloc() you'll see it returns NULL when there's not enough memory.)
To explain the two failures, your program is granted two memory spaces. The stack, and the heap. Memory allocated using malloc() is done using a system call, and is created on the heap of your program. This dynamically accepts or rejects the request and returns either the start address, or NULL, depending on a success or fail. The stack is used when you call a new function. Room for all the local variables is made on the stack, this is done by program instructions. Calling a function can't just FAIL, as that would break program flow completely. That causes the system to say "You're now overstepping" and segfault, stopping the execution.

Will new automatically use swap if memory is limited?

If I try to allocate memory:
int ramSize = magicLibrary.getRamSize();
assert(ramSize*2 <= pow(2,64-8));//Don't overflow the 64 bit limit (32 on some systems).
int * mainData new int[ramSize*2/sizeof(int)]; //2 times too big.
Will I get disk swap to fill in the space? If not, how can I use swap?
As long as neither RAM nor Swap-space is completely exhausted, and address space limitations (that is, 2-3GB in 32-bit, more than your ram in a 64-bit system), the system will allow you to allocate memory. If RAM is exhausted, the system will use swap-space.
Sometimes, the OS will also allow "overcommit", which is basically the same as airlines do when they "expect some passengers to not show up", so they sell a few extra tickets for that flight, and worry about "all seats full" when they get to that point. In terms of computers, that means that the OS may well allow you to allocate more memory than there really is available, and the error of "there isn't enough" is solved later by some means (typically by freeing up some memory that is "spare" or "killing some random application"). The reason the OS allows this is that applications quite often allocate large areas of memory that isn't fully used. Memory areas that are filled with zero may also be "merged" as "copy-on-write" memory, such that if you later write to it, it makes a new copy of that memory. This also may mean that if you allocate a large amount of memory, you have to actually write something (other than zero?) to it to make it "in use". Again, this reflects applications typical behaviour of "allocate a large amount of memory, fill it with zero, and only use some of it later" - so the OS tries to "save space" by not having a huge amount of "pages with just zeros in them".
Note that whether the memory is in RAM or Swap is a dynamic criteria - memory is swapped in and out all the time, and both program code and data may be swapped out at any time, and then be brought back when it's needed. It doesn't matter if this is "heap" or some other memory, all memory is pretty much equal in this respect.
It isn't entirely clear what you actually want to achieve, but hopefully this explains at least what happens.
Oh, and assuming ramSize is number of bytes, this is almost certainly always false:
assert(ramSize*2) <= 2^(64-8)
since ramSize * 2 > 0, and (2 XOR 56) = 0.
First of all, new knows nothing about swap. That's the job of the operating system (at least for general purpose operating systems, like the ones you are working with) and it will decide when to swap and what to swap out.
With that said, your attempt to allocate that much memory will almost certainly fail. Even if sufficient swap memory is available, the operating system reserves large chunks of virtual address space for itself which means what you can allocate is limited. For example, on 32-bit Windows 2GB of addresses are reserved for the kernel, and you coudln't allocate 4GB of virtual memory if you tried, because there aren't enough addresses available to represent that much memory.
What are you, actually, trying to do?
In short Swap space is the portion of virtual memory that is on the hard disk, used when RAM is full. And this is the job of OS to take care of not the new operator!
If you use new, the operating system receives a request for memory.
It then decides where to get that memory.
It is the same as with opening a file or echoing with cout. Those things are builtin to c++.
So your programms might already use swap, but you did not know about it.
If you want to know more about this subject, you might want to look up the inner workings of unix (mother and father of every operating system currently in use).

why does dynamic memory allocation fail after 600MB?

i implemented a bloom filter(bit table) using three dimension char array it works well until it reaches at a point where it can no more allocate memory and gives a bad_alloc message. It gives me this error on the next expand request after allocating 600MB.
The bloom filter(the array) is expected to grow as big as 8 to 10GB.
Here is the code i used to allocate(expand) the bit table.
unsigned char ***bit_table_=0;
unsigned int ROWS_old=5;
unsigned int EXPND_SIZE=5;
void expand_bit_table()
{
FILE *temp;
temp=fopen("chunk_temp","w+b");
//copy old content
for(int i=0;i<ROWS_old;++i)
for(int j=0;j<ROWS;++j)
fwrite(bit_table_[i][j],COLUMNS,1,temp);
fclose(temp);
//delete old table
chunk_delete_bit_table();
//create expanded bit table ==> add EXP_SIZE more rows
bit_table_=new unsigned char**[ROWS_old+EXPND_SIZE];
for(int i=0;i<ROWS_old+EXPND_SIZE;++i)
{
bit_table_[i]=new unsigned char*[ROWS];
for(int k=0;k<ROWS;++k)
bit_table_[i][k]=new unsigned char[COLUMNS];
}
//copy back old content
temp=fopen("chunk_temp","r+b");
for(int i=0;i<ROWS_old;++i)
{
fread(bit_table_[i],COLUMNS*ROWS,1,temp);
}
fclose(temp);
//set remaining content of bit_table_to 0
for(int i=ROWS_old;i<ROWS_old+EXPND_SIZE;++i)
for(int j=0;j<ROWS;++j)
for(int k=0;k<COLUMNS;++k)
bit_table_[i][j][k]=0;
ROWS_old+=EXPND_SIZE;
}
What is the maximum allowable size for an array and if this is not the issue what can i do about it.
EDIT:
It is developed using a 32 bit platform.
It is run on 64 bit platform(server) with 8GB RAM.
A 32-bit program must allocate memory from the virtual memory address space. Which stores chunks of code and data, memory is allocated from the holes between them. Yes, the maximum you can hope for is around 650 megabytes, the largest available hole. That goes rapidly down from there. You can solve it by making your data structure smarter, like a tree or list instead of one giant array.
You can get more insight in the virtual memory map of your process with the SysInternals' VMMap utility. You might be able to change the base address of a DLL so it doesn't sit plumb in the middle of an otherwise empty region of the address space. Odds that you'll get much beyond 650 MB are however poor.
There's a lot more breathing room on a 64-bit operating system, a 32-bit process has a 4 gigabyte address space since the operating system components run in 64-bit mode. You have to use the /LARGEADDRESSAWARE linker option to allow the process to use it all. Still, that only works on a 64-bit OS, your program is still likely to bomb on a 32-bit OS. When you really need that much VM, the simplest approach is to just make a 64-bit OS a prerequisite and build your program targeting x64.
A 32-bit machine gives you a 4GB address space.
The OS reserves some of this (half of it by default on Windows, giving you 2GB to yourself. I'm not sure about Linux, but I believe it reserves 1GB)
This means you have 2-3 GB to your own process.
Into this space, several things need to fit:
your executable (as well as all dynamically linked libraries) are memory-mapped into it
each thread needs a stack
the heap
and quite a few other nitty gritty bits.
The point is that it doesn't really matter how much memory you end up actually using. But a lot of different pieces have to fit into this memory space. And since they're not packed tightly into one end of it, they fragment the memory space. Imagine, for simplicity, that your executable is mapped into the middle of this memory space. That splits your 3GB into two 1.5GB chunks. Now say you load two dynamic libraries, and they subdivide those two chunks into four 750MB ones. Then you have a couple of threads, each needing further chunks of memory, splitting up the remaining areas further. Of course, in reality each of these won't be placed at the exact center of each contiguous block (that'd be a pretty stupid allocation strategy), but nevertheless, all these chunks of memory subdivide the available memory space, cutting it up into many smaller pieces.
You might have 600MB memory free, but you very likely won't have 600MB of contiguous memory available. So where a single 600MB allocation would almost certainly fail, six 100MB allocations may succeed.
There's no fixed limit on how big a chunk of memory you can allocate. The answer is "it depends". It depends on the precise layout of your process' memory space. But on a 32-bit machine, you're unlikely to be able to allocate 500MB or more in a single allocation.
The maximum in-memory data a 32-bit process can access is 4GB in theory (in practice it will be somewhat smaller). So you cannot have 10GB data in memory at once (even with the OS supporting more). Also, even though you are allocating the memory dynamically, the free store available is further limited by the stack size.
The actual memory available to the process depends on the compiler settings that generates the executable.
If you really do need that much, consider persisting (parts of) the data in the file system.

Memory calculation of objects inaccurate?

I'm creating a small cache daemon, and I want to limit its memory usage to approximately a specified amount. However, there seems to be an issue just trying to calculate how much memory is in use.
Every time a CacheEntry object is created, it adds the size of a CacheEntry object (apparently 64 bytes) plus the number of bytes used in internal arrays to the counter for how many bytes are in use. When the CacheEntry object is deleted, it subtracts that amount. I can confirm that the math, at least, is correct.
However, when run inside NetBeans, the memory profiler reports vastly different numbers. Roughly twice as high, to be specific. It is not a memory leak, and it is specifically related to the amount of CacheEntry objects currently in existence. Increasing the amount of data stored in the internal arrays actually brings the numbers closer together (as opposed to further apart, if that were being improperly calculated); from this, I have concluded that the overhead of having a CacheEntry object in memory is almost twice what sizeof() is reporting. It does not rise in steps or "chunks".
Is there some common reason why this might happen?
UPDATE: Just to check, I ran my tests without a profiler in place. Linux reports the same VmHWM/VmRSS either way, so the memory profiler is definitely not affecting the calculations.
Perhaps the profiler is adding reference objects to track the objects? Do you see the same results when you run the application in release vs Debug?
Is there some common reason why this might happen?
Yeah, that could be internal fragmentation and overhead of the memory manager. If your data type is small (eg. sizeof(CacheEntry) is 8 bytes), newing such data type might produce a bigger chunk of memory. It is partly used for malloc's internal bookkeeping (it usually stores the size of the block somewhere), partly for padding needed to align your data type on its natural boundary (eg. 8 bytes data + 4 bytes bookkeeping + 4 bytes padding needed to align the whole thing on 8-byte boundary).
You can solve it by allocating from a single continuous array of CacheEntry (eg. CacheEntry array[1000] takes exactly 1000*sizeof(CacheEntry) bytes). You'd have to track the usage of the individual elements in the array, but that should be doable without additional memory. (eg. by running a free-list of entries in the place of the free entries).
This memory bloat is caused by use of new, specifically on relatively small objects. On Windows, dynamically allocated memory incurs a 16- or 24-byte overhead each time; I haven't found the exact numbers for Linux, but it's roughly the same. This is because each allocated chunk needs to record its location and size (possibly more than once) so that it can be accurately freed later.
As far as I'm aware, the running program also does not know exactly how much overhead is involved in this, at least in any way accessible to the programmer.
Generally speaking, large quantities of small objects should use a memory pool, both for speed and memory conservation.