Allocate Array of bitset<N> on heap - c++

I'm creating an array of bitsets on the stack using this code:
int Rows = 800000;
int Columns = 2048;
bitset<Columns> data[Rows];
If I don't raise the stack size to hundreds of megabytes, I get a stack overflow error.
Is there any way to allocate this on the heap instead? For example with code like this (I'm not even sure if this code is right):
bitset<Columns>* data[Rows] = new bitset<Columns>();
Edit: And more importantly, does this help memory usage or speed? Does it make any difference whether I use the stack or the heap for this? I'd also really rather not use any other libraries such as Boost.
I come from a Java background and some C++ syntax is new to me, so sorry if the question seems kind of wrong.

#include <bitset>
#include <vector>

constexpr int Rows = 800000;
constexpr int Columns = 2048;

void your_function() {
    std::vector<std::bitset<Columns>> data(Rows);
    // do something with data
}
This will allocate the memory on the heap and it will still take whatever amount of memory it took before (plus a few bytes for bookkeeping). The heap however is not limited by a fixed size like the stack, but mainly limited by how much memory the system has, so on a reasonably modern PC you should be fine with a few hundred megabytes.
I am not sure if that was your concern, but bitset's memory usage is not inefficient: sizeof(std::bitset<2048>) == 256 with gcc, so you do not waste a single bit there.

Storing this the other way around:
int Rows = 2048;
int Columns = 800000;
bitset<Columns> data[Rows];
will save you almost 18 MB, and it is better for data locality.
In the first layout, if each row is stored as a vector, the memory you are using is:
A = (24 * 800000) + 800000 * (2048 / 8) = 224x10^6 bytes
On the other hand, if you swap the row size with the column size, the memory you will be using is:
B = (24 * 2048) + 2048 * (800000 / 8) = 204.849x10^6 bytes
where 24 is the typical fixed size in bytes of a vector object in C++ on most systems.
So in the second case you reduce memory usage by using fewer vectors:
A - B = 19,150,848 bytes ≈ 18.26 MB
This is not the answer, but it can indirectly help you solve your problem.

Related

How does memory on the heap get exhausted?

I have been testing some of my own code to see how much allocated memory it takes to exhaust the heap, or free store. However, unless my test code is wrong, I am getting completely different results for how much memory can be put on the heap.
I am testing two different programs. The first program creates vector objects on the heap. The second program creates integer objects on the heap.
Here is my code:
#include <vector>
#include <stdio.h>

int main()
{
    long long unsigned bytes = 0;
    unsigned megabytes = 0;
    for (long long unsigned i = 0; ; i++) {
        std::vector<int>* pt1 = new std::vector<int>(100000, 10);
        bytes += sizeof(*pt1);
        bytes += pt1->size() * sizeof(pt1->at(0));
        megabytes = bytes / 1000000;
        if (i >= 1000 && i % 1000 == 0) {
            printf("There are %u megabytes on the heap\n", megabytes);
        }
    }
}
The final output of this code before getting a bad_alloc error is: "There are 2000 megabytes on the heap"
In the second program:
#include <stdio.h>

int main()
{
    long long unsigned bytes = 0;
    unsigned megabytes = 0;
    for (long long unsigned i = 0; ; i++) {
        int* pt1 = new int(10);
        bytes += sizeof(*pt1);
        megabytes = bytes / 1000000;
        if (i >= 100000 && i % 100000 == 0) {
            printf("There are %u megabytes on the heap\n", megabytes);
        }
    }
}
The final output of this code before getting a bad_alloc error is: "There are 511 megabytes on the heap"
The final output in both programs is vastly different. Am I misunderstanding something about the free store? I thought that both results would be about the same.
It is very likely that pointers returned by new on your platform are 16-byte aligned.
If int is 4 bytes, this means that for every new int(10) you're getting four bytes and making 12 bytes unusable.
This alone would explain the difference between getting 500MB of usable space from small allocations and 2000MB from large ones.
On top of that, there's overhead of keeping track of allocated blocks (at a minimum, of their size and whether they're free or in use). That is very much specific to your system's memory allocator but also incurs per-allocation overhead. See "What is a Chunk" in https://sourceware.org/glibc/wiki/MallocInternals for an explanation of glibc's allocator.
First of all, you have to understand that the operating system assigns memory to a process in fairly large chunks called pages (a hardware property; page size is typically 4-16 kB).
The standard library tries to use this memory efficiently, so it has to find a way to chop pages into smaller pieces and manage them. To do that, some extra information about the heap structure has to be maintained.
There is a good Andrei Alexandrescu CppCon talk on more or less how this works (it omits page management).
So when you allocate lots of small objects, the information about the heap structure becomes quite large. On the other hand, allocating a smaller number of larger objects is more efficient: less memory is wasted on tracking the heap structure.
Note also that, depending on the heap strategy, when a small piece of memory is requested it is sometimes more efficient to waste some memory and return a larger block than was requested.

c++ memory leak with std::vectors

I'm reading a 400mb file into a c++ vector with the following code:
#define RAMALLOC 20000000

struct worddata {
    std::string name;
    double ussage;
};

// ...
int counter = 0;
std::string dName;
double dUssage;
std::vector<worddata> primDataBank;
primDataBank.resize(RAMALLOC);

std::ifstream fIn(PATH + "output.dat");
while (fIn >> dName >> dUssage) {
    primDataBank[counter].name = dName;
    primDataBank[counter].ussage = dUssage;
    counter++;
}
I have resized the vector to 20,000,000 items, so as I assign to its elements in the loop, the RAM usage shouldn't be increasing. However, when I run it, the RAM usage increases rapidly.
In the Visual Studio debugger heap snapshot, it shows me that the ram is being occupied by processFrequencyData.exe!std::_Container_proxy. The "allocation call stack" looks like so:
This appears to have its roots in the vector.
How can I stop my ram usage from increasing?
Thanks.
Update:
My RAM usage still increases rapidly even when I comment out the lines in the while loop that assign values:
while (fIn >> dName >> dUssage) {
    //primDataBank[counter].name = dName;
    //primDataBank[counter].ussage = dUssage;
    counter++;
}
However ram usage doesn't increase when I also comment out the vector code:
//std::vector<worddata> primDataBank;
//primDataBank.resize(RAMALLOC);
The vector you're creating uses approximately
20000000 * 32 bytes = 640,000,000 bytes, i.e. 640 MB // who said 640K would be enough?
The 32 bytes per worddata come from std::string (around 24 bytes) plus 8 for the double.
Then you start reading strings. If they are sufficiently small, each string may use the small-string optimization, i.e. keep the chars in its internal buffer.
But if they are larger than the small-string buffer (around 15 chars on common implementations), the string allocates a separate array to hold the chars.
The update requires more investigation.
Your memory use increases because you are creating and storing all those strings you read from the file.
A string is not a fixed-size object, so the only way to pre-allocate space for the string data itself is to use a custom allocator.
You should prefer reserve and emplace_back over resize followed by assigning fields, as this avoids constructing 20 million empty elements you don't need.
I find your update hard to believe.

Read from 5x10^8 different array elements, 4 bytes each time

So I'm taking an assembly course and have been tasked with making a benchmark program for my computer - needless to say, I'm a bit stuck on this particular piece.
As the title says, we're supposed to create a function that reads from 5x10^8 different array elements, 4 bytes each time. My only problem is, I don't even think it's possible for me to create an array of 500 million elements? So what exactly should I be doing? (For the record, I'm trying to code this in C++)
//Benchmark Program in C++
#include <iostream>
#include <time.h>
using namespace std;

int main() {
    clock_t t1, t2;
    int readTemp;
    int* arr = new int[5*100000000];
    t1 = clock();
    cout << "Memory Test" << endl;
    for (long long int j = 0; j < 500000000; j += 1)
    {
        readTemp = arr[j];
    }
    t2 = clock();
    float diff((float)t2 - (float)t1);
    float seconds = diff / CLOCKS_PER_SEC;
    cout << "Time Taken: " << seconds << " seconds" << endl;
}
Your program tries to allocate 2 billion bytes (1907 MiB), while the maximum address space available to a 32-bit application on Windows is 2 gigabytes (2048 MiB). These numbers are very close. It's likely your process has used the remaining 141 MiB for other stuff. Even though your code is very small, the OS is pretty liberal in allocating the 2048 MiB address space, wasting large chunks on e.g. the following:
C++ runtime (standard library and other libraries)
Stack: OS allocates a lot of memory to support recursive functions; it doesn't matter that you don't have any
Paddings between virtual memory pages
Paddings used just to make specific sections of data appear at specific addresses (e.g. 0x00400000 for lowest code address, or something like that, is used in Windows)
Padding used to randomize the values of pointers
There's a Windows application that shows a memory map of a running process. You can use it by adding a delay (e.g. getchar()) before the allocation and looking at the largest contiguous free block of memory at that point, and which allocations prevent it from being large enough.
The size is possible:
5 * 10^8 * 4 bytes = ~1.9 GB
First you will need to allocate the array, and dynamically only: no stack could hold that much.
For your task the 4 bytes is the size of an integer, so you can do
int* arr = new int[5*100000000];
Alternatively, if you want to be more explicit about the bytes, you can allocate it as chars:
char* arr = new char[5*4*100000000];
Next, you need to make the memory dirty (meaning write something into it) so it is actually committed:
memset(arr, 0, 5*100000000*sizeof(int)); // needs <cstring>
Now you can benchmark cache misses (I'm guessing that's what is intended with such a huge array):
int randomIndex = GetRandomNumberBetween(0, 5*100000000 - 1); // make your own random implementation
int bytes = arr[randomIndex]; // access 4 bytes through an integer
If you want 5x10^8 random accesses, you can do a Knuth shuffle inside your GetRandomNumberBetween instead of using pure random.

C++ Program Crashes Due To Large Array Even Though It's On The Heap

I'm allocating memory for three very large arrays (N = 990000001). I know you have to allocate this on the heap because it's so large, but even when I do that, the program keeps crashing. Am I allocating it incorrectly, or does my computer simply not have enough memory (I should have plenty)? The way I'm allocating memory right now works perfectly fine when N is small. Any help is appreciated.
int main()
{
    double *Ue = new double[N];
    double *U = new double[N];
    double *X = new double[N];

    for (int i = 0; i < N; i++)
    {
        X[i] = X0 + dx*i;
        Ue[i] = U0/pow((X0*X[i]),alpha);
    }

    //Declare Variables
    double K1; double K2; double K3; double K4;

    //Set Initial Condition
    U[0] = U0;

    for (int i = 0; i < N-1; i++)
    {
        K1 = deriv(U[i],X[i]);
        K2 = deriv(U[i]+0.5*dx*K1,X[i]+0.5*dx);
        K3 = deriv(U[i]+0.5*dx*K2,X[i]+0.5*dx);
        K4 = deriv(U[i]+dx*K3,X[i+1]);
        U[i+1] = U[i] + dx/6*(K1 + 2*K2 + 2*K3 + K4);
    }
    return 0;
}
Your program allocates and uses about 24 GB of memory.
If you run the program as a 32-bit process, this will throw std::bad_alloc, and your program will exit gracefully. (Theoretically there could be an overflow bug in your toolchain, but I think this is unlikely.)
If you run the program as a 64-bit process, you might get snagged by the OOM killer and your program will exit ungracefully. If you do have 24 GB of combined RAM + swap, you might instead churn through at the speed of your disk. (If you actually had 24 GB of RAM alone, it probably wouldn't crash, so we can rule this out.) If overcommit is disabled, you will get std::bad_alloc instead of the OOM killer. (This paragraph is rather Linux-specific, though other kernels are similar.)
Solution: use less memory or buy more RAM.
If on Windows, you may find this information useful: Memory Limits for Applications on Windows -
Note that the limit on static and stack data is the same in both
32-bit and 64-bit variants. This is due to the format of the Windows
Portable Executable (PE) file type, which is used to describe EXEs and
DLLs as laid out by the linker. It has 32-bit fields for image section
offsets and lengths and was not extended for 64-bit variants of
Windows. As on 32-bit Windows, static data and stack share the same
first 2GB of address space.
Then the only real improvement is for dynamic data:

Dynamic data - this is memory that is allocated during program
execution. In C or C++ this is usually done with malloc or new.

64-bit:
Static data: 2 GB
Dynamic data: 8 TB
Stack data: 1 GB

(the stack size is set by the linker; the default is 1 MB. This can be
increased using the Linker property System > Stack Reserve Size)
Allocation of single array "should be able to allocate as large as the OS is willing to handle" (i.e. limited by RAM and fragmentation).

High number causes seg fault

This bit of code is from a program I am writing to take in x col and x rows to run a matrix multiplication on CUDA, parallel processing. The larger the sample size, the better.
I have a function that auto generates x amount of random numbers.
I know the answer is probably simple, but I just wanted to know exactly why. When I run it with, say, 625000000 elements in the array, it seg faults. I think it is because I have gone over the size allowed in memory for an int.
What data type should I use in place of int for a larger number?
This is how the data is being allocated, then passed into the function.
a.elements = (float*) malloc(mem_size_A);
where
int mem_size_A = sizeof(float) * size_A; //for the example let size_A be 625,000,000
Passed:
randomInit(a.elements, a.rowSize,a.colSize, oRowA, oColA);
What the randomInit is doing is say I enter a 2x2 but I am padding it up to a multiple of 16. So it takes the 2x2 and pads the matrix to a 16x16 of zeros and the 2x2 is still there.
void randomInit(float* data, int newRowSize, int newColSize, int oldRowSize, int oldColSize)
{
    printf("Initializing random function. The new sized row is %d\n", newRowSize);
    for (int i = 0; i < newRowSize; i++) // go per row of new sized row.
    {
        for (int j = 0; j < newColSize; j++)
        {
            printf("This loop\n");
            if (i < oldRowSize && j < oldColSize)
            {
                data[newRowSize*i+j] = rand() / (float)RAND_MAX; //brandom();
            }
            else
                data[newRowSize*i+j] = 0;
        }
    }
}
I've even ran it with the printf in the loop. This is the result I get:
Creating the random numbers now
Initializing random function. The new sized row is 25000
This loop
Segmentation fault
Your memory allocation for data is probably failing.
Fortunately, you almost certainly don't need to store a large collection of random numbers.
Instead of storing:
data[n]=rand() / (float)RAND_MAX
for some huge collection of n, you can run:
srand(n);
value = rand() / (float)RAND_MAX;
when you need a particular number and you'll get the same value every time, as if they were all calculated in advance.
I think you're going past the end of the memory you allocated for data: when newRowSize is too large, you're accessing unallocated memory.
Remember, data isn't infinitely big.
Well, the real problem is that if the issue really were the integer type used for your array access, you would not be able to fix it this way: more likely you simply don't have enough memory to store that huge amount of data.
If you want to go beyond that, define a custom structure or class (if you are in C++), but you will lose the O(1) access time that a plain array gives you.