C++ memory leak with std::vectors

I'm reading a 400 MB file into a C++ vector with the following code:
#define RAMALLOC 20000000
struct worddata {
    std::string name;
    double ussage;
};
// ...
int counter = 0;
std::string dName;
double dUssage;
std::vector<worddata> primDataBank;
primDataBank.resize(RAMALLOC);
std::ifstream fIn(PATH + "output.dat");
while (fIn >> dName >> dUssage) {
    primDataBank[counter].name = dName;
    primDataBank[counter].ussage = dUssage;
    counter++;
}
I have resized the vector to 20,000,000 items, so as I assign to it in the loop, the RAM usage shouldn't be increasing. However, when I run it, the RAM usage increases rapidly.
In the Visual Studio debugger heap snapshot, it shows me that the RAM is being occupied by processFrequencyData.exe!std::_Container_proxy. The "allocation call stack" looks like this:
This appears to have its roots in the vector.
How can I stop my RAM usage from increasing?
Thanks.
Update:
My RAM usage still increases rapidly when I comment out the lines of code in the while loop that assign values:
while (fIn >> dName >> dUssage) {
    //primDataBank[counter].name = dName;
    //primDataBank[counter].ussage = dUssage;
    counter++;
}
However, RAM usage doesn't increase when I also comment out the vector code:
//std::vector<worddata> primDataBank;
//primDataBank.resize(RAMALLOC);

The vector you're creating uses approximately
20000000 * 32 bytes = 640 000 000, i.e. 640 MB // who said 640K would be enough?
The size of worddata comes mostly from std::string, which is around 24 bytes, plus 8 bytes for the double.
Then you start reading strings. If they are sufficiently small, each string may use the small-string optimization, i.e. store the characters in its internal buffer instead of allocating.
But if they are longer than roughly 15 characters (the exact limit is implementation-dependent), the string allocates a separate heap array to hold the characters.
The update requires more investigation.
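If you want to check what your standard library actually does, here is a minimal sketch; the values in the comments are typical 64-bit figures and are implementation-specific:

#include <iostream>
#include <string>

struct worddata {
    std::string name;
    double ussage;
};

int main() {
    // Typical 64-bit values: std::string is 24 or 32 bytes, worddata 32 or 40 bytes.
    std::cout << "sizeof(std::string): " << sizeof(std::string) << '\n';
    std::cout << "sizeof(worddata):    " << sizeof(worddata) << '\n';

    // capacity() of an empty string reveals the small-string buffer
    // (commonly 15 characters, 22 with libc++).
    std::cout << "SSO capacity: " << std::string().capacity() << '\n';
}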

Your memory use increases because you are creating and storing all those strings you read from the file.
A string is not a fixed-size object, so the only way you can pre-allocate space for the strings is to use a custom allocator.
You should prefer reserve and emplace_back rather than resize followed by assigning the fields, as this will avoid creating the 20,000,000 zero-length strings you don't need.
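A minimal sketch of that suggestion, reusing the names from the question (not the author's exact code):

std::vector<worddata> primDataBank;
primDataBank.reserve(RAMALLOC);   // reserves capacity only; no 20,000,000 empty strings are constructed

std::ifstream fIn(PATH + "output.dat");
std::string dName;
double dUssage;
while (fIn >> dName >> dUssage) {
    // build the element in place; std::move hands the string's buffer over instead of copying it
    primDataBank.push_back(worddata{std::move(dName), dUssage});
    // with C++20 you can write: primDataBank.emplace_back(std::move(dName), dUssage);
}

dName is left in a valid but unspecified state after the move; the next >> overwrites it anyway.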
I find your update hard to believe.

Related

Memory usage of large 2D static array and vector of vectors

I need to use a large matrix, 20000 * 20000, for a machine learning project. When it is implemented as a static array, it uses approximately 1.57 GB of memory. If it is implemented with a vector of vectors, it uses much more memory than the static array (approximately 3.06 GB). I could not figure out the reason behind it.
Array version:
static double distanceMatrix[20000][20000] = {0};
Vector of vectors:
vector<vector<double>> distanceMatrix(20000, vector<double> (20000));
I used them to store the distances between points.
for (int i = 0; i < 20000; i++){
    for (int j = i+1; j < 20000; j++)
        distanceMatrix[i][j] = euclid_dist(pointVec[i], pointVec[j]);
}
I also observed that when I used the array version, memory usage increases gradually during the nested loop. However, while using the vector of vectors, memory usage reaches 3.06 GB before the nested loop starts.
I checked the memory usage with Xcode debug navigator and Activity Monitor. Thanks in advance!
That's because of the vector's memory allocation strategy, which is probably newsize = oldsize * const when it reaches its limit (implementation-dependent); see also: vector memory allocation strategy.
First of all, the array doesn't take 1.57 GB of memory, so there is an issue with the measurement.
Experiment with the static array
When running the following code in Xcode, you'll find out that the array is exactly 3.2 GB in size:
const size_t matsize=20000;
static double mat2D[matsize][matsize] = {0};
cout<<"Double: " << sizeof (double) <<endl;
cout<<"Array: " << sizeof mat2D <<endl;
cout<<" " << sizeof(double)*matsize*matsize<<endl;
// ... then your loop
When the program starts, its reported memory consumption is only 5.3 MB before entering the loop, although the static array is already there. Once the loop has finished, the reported memory consumption is 1.57 GB, as you explained, but still not the 3.2 GB that we could expect.
The memory consumption figure that you read is the physical memory used by your process. The remaining memory of the process is in virtual memory, which is much larger (7 GB during my experiment).
Experiment with the vector
First, let's look at the approximate memory size of the vector, knowing that each vector has a fixed size plus a dynamically allocated variable part (based on the capacity, which can be equal to or greater than the number of elements actually stored in the vector). The following code can give you some estimates:
vector<vector<double>> mat2D(matsize, vector<double> (matsize));
cout<<"Vector (fixed part):" << sizeof (vector<double>)<<endl;
cout<<"Vector capacity: " << mat2D[0].capacity()<<endl;
cout<<" (variable): " << sizeof(double)*mat2D[0].capacity()<<endl;
cout<<" (total): " << sizeof(double)*mat2D[0].capacity() + sizeof(mat2D[0])<<endl;
cout<<"Vector of vector: " << (sizeof(double)*mat2D[0].capacity() + sizeof(mat2D[0]))*matsize+sizeof(mat2D)<<endl;
// then your loop
Running this code will show that the memory required to store your vector of vectors is about 3.2 GB plus 480 KB of overhead (24 bytes per vector * 20,001 vectors).
Before entering the loop, we can already see that 3 GB of physical memory is used. So macOS uses twice the physical memory for this dynamic version compared to the array version. This is certainly because the allocation process has to actually touch more memory upfront: each of the 20,000 rows requires a separate allocation and initialization.
Conclusion
The overall virtual memory used by the two approaches is not that different. I could measure around 7 GB running it in debug mode with Xcode. The vector variant uses a little more than before due to the extra 480 KB of overhead for the vectors.
The strategy used by macOS for using more or less physical memory is complex and may depend on many factors (including the access patterns), the goal being to find the best tradeoff between physical memory available and cost of swapping.
But the physical memory usage is not representative of the memory consumption by the process.
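Not part of the original answer, but if the per-row allocations are a concern, a common alternative is a single flat vector with index arithmetic. A hedged sketch (needs a 64-bit build with roughly 3.2 GB of memory available):

#include <cstddef>
#include <vector>

int main() {
    const std::size_t matsize = 20000;

    // One contiguous allocation of matsize * matsize doubles (~3.2 GB),
    // instead of 20000 separately allocated and initialized rows.
    std::vector<double> mat(matsize * matsize, 0.0);

    // element (i, j) lives at index i * matsize + j
    auto at = [&](std::size_t i, std::size_t j) -> double& {
        return mat[i * matsize + j];
    };

    at(3, 5) = 1.0;   // example write
}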

Allocate Array of bitset<N> on heap

I'm creating an array of bitsets on the stack using this code:
const int Rows = 800000;
const int Columns = 2048;
bitset<Columns> data[Rows];
If I don't raise the stack size to hundreds of megabytes, I get a stack overflow error.
Is there any way to allocate this on the heap instead? For example with code like this (I'm not even sure if this code is right):
bitset<Columns>* data[Rows] = new bitset<Columns>();
Edit: And more importantly, does this help memory usage or speed? Does it make any difference whether I use the stack or the heap for this? I really don't want to use any other libraries such as Boost, either...
I come from a Java background and some C++ syntax is new to me, so sorry if the question seems kind of wrong.
#include <bitset>
#include <vector>

constexpr int Rows = 800000;
constexpr int Columns = 2048;

void your_function() {
    std::vector<std::bitset<Columns>> data(Rows);
    // do something with data
}
This will allocate the memory on the heap and it will still take whatever amount of memory it took before (plus a few bytes for bookkeeping). The heap however is not limited by a fixed size like the stack, but mainly limited by how much memory the system has, so on a reasonably modern PC you should be fine with a few hundred megabytes.
I am not sure if that was your concern, but bitset's memory usage is not inefficient: sizeof(std::bitset<2048>) == 256 on GCC, so you do not waste a single bit there.
Storing this in the inverse way:
const int Rows = 2048;
const int Columns = 800000;
bitset<Columns> data[Rows];
will save you almost 18 MB, and it is better for data locality.
In the first method, if you calculate how much memory you are using:
A = (24 * 800000) + 800000 * (2048/8) = 224x10^6 bytes
On the other hand, if you swap the row size with the column size, the memory you will be using is:
B = (24 * 2048) + 2048 * (800000/8) = 204.849x10^6 bytes
where 24 is the fixed size in bytes of a vector object in C++ on most systems.
So in the second case you are reducing the memory usage by using fewer vectors.
A - B = 19150848 bytes = 18.26 MB
This is not the answer, but it can indirectly help you solve your problem.
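The per-row sizes can be checked with sizeof. A small sketch (this uses a single vector of bitsets, so there is only one 24-byte vector header rather than one per row as in the estimate above):

#include <bitset>
#include <iostream>
#include <vector>

int main() {
    constexpr int Rows = 800000;
    constexpr int Columns = 2048;

    // Row sizes for the two layouts: 2048 bits = 256 bytes, 800000 bits = 100000 bytes.
    std::cout << sizeof(std::bitset<Columns>) << " bytes x " << Rows << " rows\n";
    std::cout << sizeof(std::bitset<Rows>) << " bytes x " << Columns << " rows\n";

    // Transposed layout from the answer, allocated on the heap (~205 MB).
    std::vector<std::bitset<Rows>> data(Columns);
    std::cout << "heap payload: " << data.size() * sizeof(data[0]) << " bytes\n";
}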

Read from 5x10^8 different array elements, 4 bytes each time

So I'm taking an assembly course and have been tasked with making a benchmark program for my computer - needless to say, I'm a bit stuck on this particular piece.
As the title says, we're supposed to create a function to read from 5x10^8 different array elements, 4 bytes each time. My only problem is, I don't even think it's possible for me to create an array of 500 million elements? So what exactly should I be doing? (For the record, I'm trying to code this in C++)
//Benchmark Program in C++
#include <iostream>
#include <time.h>
using namespace std;

int main() {
    clock_t t1, t2;
    int readTemp;
    int* arr = new int[5*100000000];
    t1 = clock();
    cout << "Memory Test" << endl;
    for(long long int j = 0; j < 500000000; j += 1)   // < rather than <=, so the last read stays in bounds
    {
        readTemp = arr[j];
    }
    t2 = clock();
    float diff((float)t2 - (float)t1);
    float seconds = diff / CLOCKS_PER_SEC;
    cout << "Time Taken: " << seconds << " seconds" << endl;
}
Your program tries to allocate 2 billion bytes (1907 MiB), while the maximum user address space for a 32-bit process on Windows is 2 gigabytes (2048 MiB). These numbers are very close. It's likely your system has allocated the remaining 141 MiB for other stuff. Even though your code is very small, the OS is pretty liberal in how it uses the 2048 MiB address space, wasting large chunks on, for example, the following:
C++ runtime (standard library and other libraries)
Stack: the OS allocates a lot of memory to support recursive functions; it doesn't matter that you don't have any
Padding between virtual memory pages
Padding used just to make specific sections of data appear at specific addresses (e.g. 0x00400000 is typically the lowest code address on Windows)
Padding used to randomize the values of pointers
There's a Windows application that shows a memory map of a running process. You can use it by adding a delay (e.g. getchar()) before the allocation, looking at the largest contiguous free block of memory at that point, and seeing which allocations prevent it from being large enough.
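The answer doesn't name the tool, but you can also measure the largest free block from inside the process itself. A hedged sketch using the Win32 VirtualQuery call (Windows only):

#include <windows.h>
#include <iostream>

int main() {
    MEMORY_BASIC_INFORMATION mbi;
    unsigned char* addr = nullptr;
    SIZE_T largestFree = 0;

    // Walk the address space region by region and remember the largest free one.
    while (VirtualQuery(addr, &mbi, sizeof(mbi)) != 0) {
        if (mbi.State == MEM_FREE && mbi.RegionSize > largestFree)
            largestFree = mbi.RegionSize;
        addr = static_cast<unsigned char*>(mbi.BaseAddress) + mbi.RegionSize;
    }

    std::cout << "Largest contiguous free block: "
              << largestFree / (1024 * 1024) << " MiB\n";
}

Call this right before the new int[...] to see whether a hole of 1907 MiB actually exists at that point.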
The size is possible:
5 * 10^8 * 4 = ~1.9 GB.
First you will need to allocate your array (dynamically only! There is no stack that large).
For your task, the 4 bytes is the size of an integer, so you can do it like this:
int* arr = new int[5*100000000];
Alternatively, if you want to be more precise, you can allocate it as bytes:
char* arr = new char[5*4*100000000];
Next, you need to make the memory dirty (meaning write something into it):
memset(arr,0,5*100000000*sizeof(int));
Now you can benchmark cache misses (I'm guessing that's what's intended with such a huge array):
int randomIndex= GetRandomNumberBetween(0,5*100000000-1); // make your own random implementation
int bytes = arr[randomIndex]; // access 4 bytes through integer
If you want 5x10^8 random accesses, you can do a Knuth shuffle inside your getRandomNumber instead of using pure random numbers.
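Putting the answer's pieces together, a hedged sketch of the whole benchmark (it needs a 64-bit build or enough address space for ~1.9 GB, and the RNG cost is included in the timing):

#include <chrono>
#include <cstring>
#include <iostream>
#include <random>

int main() {
    const long long count = 500000000LL;        // 5 * 10^8 ints, ~1.9 GB

    int* arr = new int[count];
    std::memset(arr, 0, count * sizeof(int));   // make the memory dirty so the pages are really committed

    std::mt19937_64 rng(42);
    std::uniform_int_distribution<long long> pick(0, count - 1);

    volatile int sink = 0;                      // keeps the compiler from dropping the reads
    auto t1 = std::chrono::steady_clock::now();
    for (long long i = 0; i < count; ++i)
        sink = arr[pick(rng)];                  // one 4-byte read at a random element
    auto t2 = std::chrono::steady_clock::now();

    std::cout << std::chrono::duration<double>(t2 - t1).count() << " seconds\n";
    delete[] arr;
    (void)sink;
}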

Wrangling memory for a highly iterative C++ program

tl;dr: I need a way to better manage memory in C++ while retaining large datasets.
I am currently creating a program that outputs a database that I need for a later project, and I am struggling with memory control. I have the program written to a functional level that outputs the dataset I need on a small scale, but to ramp up the size to where I need it and keep it realistic, I need to increase the number of iterations. The problem is that when I do that, I end up running out of memory on my computer (4 GB) and it has to start pagefiling, which slows the processing considerably.
The basic outline is that I am creating stores, then creating a year's worth of transactional data for said store. When the store is created, a list of numbers is generated that represents the daily sales goals for the transactions, then transactions are randomly generated until that number is reached. This method gives some nicely organic results that I am quite happy with. Unfortunately all of those transactions have to be stored in memory until they are output to my file.
When the transactions are created they are temporarily stored in a vector, which I execute .clear() on after I store a copy of the vector in my permanent storage location.
I have started to try to move to unique_ptr's for my temporary storage, but I am unsure if they are even being deleted properly upon returning from the functions that are generating my data.
The code is something like this (I cut some superfluous code that wasn't pertinent to the question at hand):
void store::populateTransactions() {
    vector<transaction> tempVec;
    int iterate = 0, month = 0;
    double dayTotal = 0;
    double dayCost = 0;
    int day = 0;
    for(int i = 0; i < 365; i++) {
        if(i == dsf[month]) {
            month++;
            day = 0;
        }
        while(dayTotal < dailySalesTargets[i]) {
            tempVec.push_back(transaction(2013, month+1, day+1, 1.25, 1.1));
            dayTotal += tempVec[iterate].returnTotal();
            dayCost += tempVec[iterate].returnCost();
            iterate++;
        }
        day++;
        dailyTransactions.push_back(tempVec);
        dailyCost.push_back(dayCost);
        dailySales.push_back(dayTotal);
        tempVec.clear();
        dayTotal = 0;
        dayCost = 0;
        iterate = 0;
    }
}

transaction::transaction(int year, int month, int day, double avg, double dev) {
    rng random;
    transTime = &testing;
    testing = random.newTime(year, month, day);
    itemCount = round(random.newNum('l', avg, dev, 0));
    if(itemCount <= 0) {
        itemCount = 1;
    }
    for(int i = 0; i < itemCount; i++) {
        int select = random.newNum(0, libs::products.products.size());
        items.push_back(libs::products.products[select]);
        transTotal += items[i].returnPrice();
        transCost += items[i].returnCost();
    }
}
The reason you are running into memory issues is that as you add elements to the vector, it eventually has to resize its internal buffer. This entails allocating a new block of memory, copying the existing data to the new buffer, and then deleting the old buffer.
Since you know the number of elements the vector will hold beforehand, you can call the vector's reserve() member function to allocate the memory ahead of time. This will eliminate the constant resizing that you are no doubt encountering now.
For instance, in the constructor for transaction, you would do the following before the loop that adds data to the vector:
items.reserve(itemCount);
In store::populateTransactions() you should calculate the total number of elements the vector will hold and call tempVec.reserve() in the same way as described above. Also keep in mind that if you are using a local variable to populate the vector, you will eventually need to copy it. This will cause the same issues, as the destination vector will need to allocate memory before the contents can be copied (unless you use the move semantics available in C++11). If the data needs to be returned to the calling function (as opposed to being a member variable of store), you should take it by reference as a parameter.
void store::populateTransactions(vector<transaction>& tempVec)
{
//....
}
If it is not practical to determine the number of elements ahead of time, you should consider using std::deque instead. From cppreference.com:
As opposed to std::vector, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays.
The storage of a deque is automatically expanded and contracted as needed. Expansion of a deque is cheaper than the expansion of a std::vector because it does not involve copying of the existing elements to a new memory location.
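A minimal, self-contained sketch of what that swap could look like here (the transaction struct below is a hypothetical stand-in for the real class):

#include <deque>

// Hypothetical stand-in; the real transaction class holds more state.
struct transaction {
    int year, month, day;
    double total;
};

int main() {
    std::deque<transaction> tempVec;   // drop-in for the std::vector in populateTransactions
    for (int i = 0; i < 1000000; ++i)
        tempVec.push_back({2013, 1, i % 28 + 1, 9.99});   // grows chunk by chunk; existing elements are never copied
}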
In regard to the comment by Rafael Baptista about how the resize operation allocates memory, the following example should give you a better idea of what is going on. The amount of memory listed is the amount required during the resize.
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> data;
    for(int i = 0; i < 10000001; i++)
    {
        size_t oldCap = data.capacity();
        data.push_back(1);
        size_t newCap = data.capacity();
        if(oldCap != newCap)
        {
            std::cout
                << "resized capacity from "
                << oldCap
                << " to "
                << newCap
                << " requiring " << (oldCap + newCap) * sizeof(int)
                << " total bytes of memory"
                << std::endl;
        }
    }
    return 0;
}
When compiled with VC++10, the following results are generated when adding 10,000,001 elements to a vector. These results are specific to VC++10 and can vary between implementations of std::vector.
resized capacity from 0 to 1 requiring 4 total bytes of memory
resized capacity from 1 to 2 requiring 12 total bytes of memory
resized capacity from 2 to 3 requiring 20 total bytes of memory
resized capacity from 3 to 4 requiring 28 total bytes of memory
resized capacity from 4 to 6 requiring 40 total bytes of memory
resized capacity from 6 to 9 requiring 60 total bytes of memory
resized capacity from 9 to 13 requiring 88 total bytes of memory
resized capacity from 13 to 19 requiring 128 total bytes of memory
...snip...
resized capacity from 2362204 to 3543306 requiring 23622040 total bytes of memory
resized capacity from 3543306 to 5314959 requiring 35433060 total bytes of memory
resized capacity from 5314959 to 7972438 requiring 53149588 total bytes of memory
resized capacity from 7972438 to 11958657 requiring 79724380 total bytes of memory
This is fun! Some quick comments I can think of.
a. STL clear() does not actually free the vector's memory; it only destroys the elements. Instead you can use std::vector<transaction>().swap(tempVec);.
b. If you are using a compiler which has C++11 vector::emplace_back, then you should replace the push_back with it. It should be a big boost in both memory and speed. With push_back you basically have two copies of the same data floating around, and you are at the mercy of the allocator to return it to the OS.
c. Any reason you cannot flush dailyTransactions to disk every once in a while? You can always serialize the vector, write it out to disk, and clear the memory, and you should be good again (see the sketch after this list).
d. As pointed out by others, reserve should also help a lot.
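For point c., a hedged sketch of periodically flushing a day's transactions to disk and releasing the memory. The record type here is a simplified, trivially copyable stand-in; the real transaction class, which holds an items vector, would need proper serialization:

#include <fstream>
#include <string>
#include <vector>

// Simplified stand-in record; a raw binary dump only works for trivially copyable types.
struct transactionRecord {
    int year, month, day;
    double total, cost;
};

// Append one day's records to a file, then free the vector's memory.
void flushDay(std::vector<transactionRecord>& dayVec, const std::string& path) {
    std::ofstream out(path, std::ios::binary | std::ios::app);
    out.write(reinterpret_cast<const char*>(dayVec.data()),
              static_cast<std::streamsize>(dayVec.size() * sizeof(transactionRecord)));

    std::vector<transactionRecord>().swap(dayVec);   // releases the capacity, as in point a.
}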

maximum vector size reached prematurely

I am testing how large I can make a 1D vector on my computer. For this I am using the following MWE:
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    vector<double> vec;
    const unsigned long long lim = 1E8;
    for(unsigned long long i = 0; i < lim; i++)
    {
        vec.push_back(i);
    }
    cout << vec.max_size() << endl; //outputs 536,870,911 on my 32-bit system
    return 0;
}
As shown, max_size() tells me that a std::vector can contain 536,870,911 elements on my system. However, when I run the above MWE, I get the error
terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc
My computer has 2 GB of RAM, but 1E8 integers will only take up 381 MB, so I don't see why I get a bad_alloc error?
1E8 = 100000000 and sizeof(double) = 8 [on nearly all systems], so 762 MB. Now, if we start with a vector of, say, 16 elements, and it doubles each time it "outgrows" the current size, then to get space for 1E8 elements we get the following sequence:
16, 32, 64, 128, 256, ... 67108864 (64M entries); the next one is 134217728, taking up 8 * 128M = 1 GB, and you ALSO have to have space for a 64M * 8 = 512 MB chunk at the same time, to copy the old data from. Given that there isn't a full 2 GB of space available in a 32-bit process, because some memory is used up for the stack, program code, DLLs, and other such things, finding a 1 GB contiguous region of space may be hard when there is (more than) 512 MB already occupied.
The problem of "I can't fit as much as I thought in my memory" is not an unusual problem.
One solution would be to use std::vector::reserve() to pre-allocate enough space. That is much more likely to work, since you only need a single large allocation, not two - and it won't require much more than the 762 MB either, since it's allocated to the right size, not some arbitrary "double what it currently is".
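A hedged sketch of the MWE with that change applied:

#include <iostream>
#include <vector>
using namespace std;

int main()
{
    vector<double> vec;
    const unsigned long long lim = 1E8;

    vec.reserve(lim);   // one ~762 MB allocation up front; no doubling and no second copy

    for(unsigned long long i = 0; i < lim; i++)
    {
        vec.push_back(i);
    }

    cout << vec.size() << " elements, capacity " << vec.capacity() << endl;
    return 0;
}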