maximum vector size reached prematurely - c++

I am testing how large I can make a 1D vector on my computer. For this I am using the following MWE:
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    vector<double> vec;
    const unsigned long long lim = 1E8;
    for (unsigned long long i = 0; i < lim; i++)
    {
        vec.push_back(i);
    }
    cout << vec.max_size() << endl; // outputs 536,870,911 on my 32-bit system
    return 0;
}
As shown, max_size() reports that a std::vector can hold 536,870,911 elements on my system. However, when I run the above MWE, I get the error
terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc
My computer has 2GB of RAM, and 1E8 integers would only take up 381MB, so why do I get a bad_alloc error?

1E8 = 100000000 and sizeof(double) = 8 [on nearly all systems], so that is 762MB of element data. Now, if we start with a vector of, say, 16 elements, and it doubles each time it "outgrows" the current size, then to get space for 1E8 elements we go through the following sequence:
16, 32, 64, 128, 256, ..., 67108864 (64M entries); the next step is 134217728 entries, taking up 8 * 128M = 1GB, and you ALSO need space for the old 64M * 8 = 512MB block at the same time, so the old data can be copied over. Given that there isn't a full 2GB of address space available to a 32-bit process, because some of it is used for the stack, program code, DLLs, and other such things, finding a contiguous 1GB region may be hard when (more than) 512MB is already occupied.
The problem of "I can't fit as much as I thought into my memory" is not unusual.
One solution would be to use std::vector::reserve() to pre-allocate enough space. That is much more likely to work, since you only need a single large allocation, not two - and it won't require much more than the 762MB either, since it's allocated to the right size, not some arbitrary "double what it currently is".
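A minimal sketch of that approach, based on the MWE above (reserve() can itself still throw std::bad_alloc if no suitably large contiguous block is free, but it avoids the second, doubled allocation):
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    const unsigned long long lim = 1E8;
    vector<double> vec;
    vec.reserve(lim); // one allocation of lim * sizeof(double), roughly 762MB
    for (unsigned long long i = 0; i < lim; i++)
    {
        vec.push_back(i); // no reallocation happens inside the loop
    }
    cout << vec.size() << " elements, capacity " << vec.capacity() << endl;
    return 0;
}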

Related

Max number of elements vector

I was looking to see how many elements I can stick into a vector before the program crashes. When running the code below, the program crashed with a bad_alloc at i=90811045, i.e. when trying to add the 90811045th element. My question is: why 90811045?
it is:
not a power of two
not the value that vector.max_size() gives
the same number both in debug and release
the same number after restarting my computer
the same number regardless of what the value of the long long is
Note: I know I can fix this by using vector.reserve() or other methods; I am just interested in where 90811045 comes from.
code used:
#include <iostream>
#include <vector>

int main() {
    std::vector<long long> myLongs;
    std::cout << "Max size expected : " << myLongs.max_size() << std::endl;
    for (int i = 0; i < 160000000; i++) {
        myLongs.push_back(i);
        if (i % 10000 == 0) {
            std::cout << "Still going! : " << i << " \r";
        }
    }
    return 0;
}
extra info:
I am currently using 64 bit windows with 16 GB of ram.
Why 90811045?
It's probably just incidental.
That vector is not the only thing that uses memory in your process. There is the execution stack where local variables are stored. There is memory allocated for buffering the input and output streams. Furthermore, the global memory allocator uses some of the memory for bookkeeping.
90811044 elements were added successfully. The vector implementation (typically) has a deterministic strategy for allocating a larger internal buffer: typically, it multiplies the previous capacity by a constant factor (greater than 1). Hence, we can conclude that 90811044 * sizeof(long long) + other_usage is consistently small enough to be allocated successfully, but (90811044 * sizeof(long long)) * some_factor + other_usage is consistently too much.
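A small sketch (not from the original answer) that makes the failure point visible by catching the bad_alloc and printing the size and capacity at which growth failed; libstdc++ and libc++ typically grow by a factor of 2, MSVC by roughly 1.5:
#include <iostream>
#include <new>
#include <vector>

int main() {
    std::vector<long long> myLongs;
    try {
        for (int i = 0; i < 160000000; i++) {
            myLongs.push_back(i);
        }
    } catch (const std::bad_alloc&) {
        // The failed request was roughly capacity() * growth_factor * sizeof(long long) bytes.
        std::cout << "bad_alloc at size " << myLongs.size()
                  << ", capacity " << myLongs.capacity() << std::endl;
    }
    return 0;
}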

How does memory on the heap get exhausted?

I have been testing some of my own code to see how much allocated memory it takes to exhaust the heap, or free store. However, unless my testing code is wrong, I am getting completely different results in terms of how much memory can be put on the heap.
I am testing two different programs. The first program creates vector objects on the heap. The second program creates integer objects on the heap.
Here is my code:
#include <vector>
#include <stdio.h>

int main()
{
    long long unsigned bytes = 0;
    unsigned megabytes = 0;
    for (long long unsigned i = 0; ; i++) {
        std::vector<int>* pt1 = new std::vector<int>(100000, 10);
        bytes += sizeof(*pt1);
        bytes += pt1->size() * sizeof(pt1->at(0));
        megabytes = bytes / 1000000;
        if (i >= 1000 && i % 1000 == 0) {
            printf("There are %u megabytes on the heap\n", megabytes);
        }
    }
}
The final output of this code before getting a bad_alloc error is: "There are 2000 megabytes on the heap"
In the second program:
#include <stdio.h>

int main()
{
    long long unsigned bytes = 0;
    unsigned megabytes = 0;
    for (long long unsigned i = 0; ; i++) {
        int* pt1 = new int(10);
        bytes += sizeof(*pt1);
        megabytes = bytes / 1000000;
        if (i >= 100000 && i % 100000 == 0) {
            printf("There are %u megabytes on the heap\n", megabytes);
        }
    }
}
The final output of this code before getting a bad_alloc error is: "There are 511 megabytes on the heap"
The final output in both programs is vastly different. Am I misunderstanding something about the free store? I thought that both results would be about the same.
It is very likely that pointers returned by new on your platform are 16-byte aligned.
If int is 4 bytes, this means that for every new int(10) you're getting four bytes and making 12 bytes unusable.
This alone would explain the difference between getting 500MB of usable space from small allocations and 2000MB from large ones.
On top of that, there's overhead of keeping track of allocated blocks (at a minimum, of their size and whether they're free or in use). That is very much specific to your system's memory allocator but also incurs per-allocation overhead. See "What is a Chunk" in https://sourceware.org/glibc/wiki/MallocInternals for an explanation of glibc's allocator.
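A small sketch (not from either answer) that makes the per-allocation footprint visible: allocate a handful of ints back to back and print the spacing between the returned pointers. Subtracting pointers from different allocations is technically undefined and the spacing is allocator-specific (it may vary, or even be negative if freed blocks are reused), but on a fresh heap it usually shows how much more than 4 bytes each new int really costs.
#include <iostream>

int main()
{
    char* prev = reinterpret_cast<char*>(new int(10));
    for (int i = 0; i < 8; i++) {
        char* cur = reinterpret_cast<char*>(new int(10));
        // Spacing = 4 bytes of payload + alignment padding + allocator bookkeeping.
        std::cout << "spacing: " << (cur - prev) << " bytes\n";
        prev = cur;
    }
    return 0; // the ints are deliberately leaked; this is only a demo
}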
First of all, you have to understand that the operating system assigns memory to a process in fairly large chunks called pages (a hardware property); typical page sizes are in the 4-16 kB range.
The standard library tries to use that memory efficiently, so it has to chop pages into smaller pieces and manage them, and to do that it has to maintain some extra information about the heap structure.
There is a good CppCon talk by Andrei Alexandrescu that covers more or less how this works (it omits page management).
So when you allocate lots of small objects, the bookkeeping for the heap structure becomes quite large. Allocating a smaller number of larger objects is more efficient: less memory is wasted on tracking the heap structure.
Note also that, depending on the heap strategy, when a small piece of memory is requested it is sometimes more efficient to waste some memory and return a larger block than was requested.
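A small sketch of that last point (not from the original answer, and glibc-specific: malloc_usable_size comes from <malloc.h>; macOS has malloc_size and MSVC has _msize instead). It prints how many bytes the allocator actually reserved for each small request:
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <malloc.h> // glibc extension, not portable

int main()
{
    for (std::size_t request : {1, 4, 8, 24, 100}) {
        void* p = std::malloc(request);
        // malloc_usable_size reports the size of the block actually reserved,
        // which is usually rounded up from the requested size.
        std::printf("requested %zu bytes, usable %zu bytes\n",
                    request, malloc_usable_size(p));
        std::free(p);
    }
    return 0;
}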

Memory usage of large 2D static array and vector of vectors

I need to use a large matrix, 20000 * 20000, for a machine learning project. When it is implemented as a static array, it uses approximately 1.57 GB of memory. If it is implemented as a vector of vectors, it uses much more memory than the static array (approximately 3.06 GB). I could not figure out the reason behind this.
Array version:
static double distanceMatrix[20000][20000] = {0};
Vector of vectors:
vector<vector<double>> distanceMatrix(20000, vector<double> (20000));
I used them to store the distances between points.
for (int i = 0; i < 20000; i++) {
    for (int j = i+1; j < 20000; j++)
        distanceMatrix[i][j] = euclid_dist(pointVec[i], pointVec[j]);
}
I also observed that with the array version, memory usage increases gradually during the nested loop. With the vector of vectors, however, memory usage reaches 3.06 GB before the nested loop even starts.
I checked the memory usage with Xcode debug navigator and Activity Monitor. Thanks in advance!
That's because of the vector's memory allocation strategy, which is probably newsize = oldsize * const when it reaches its limit (it is implementation-dependent); see also: vector memory allocation strategy.
First of all, the array doesn't take 1.57 GB of memory, so there is an issue with the measurement.
Experiment with the static array
When running the following code in Xcode, you'll find that the array is exactly 3.2 GB in size:
const size_t matsize = 20000;
static double mat2D[matsize][matsize] = {0};
cout << "Double: " << sizeof(double) << endl;
cout << "Array:  " << sizeof mat2D << endl;
cout << "        " << sizeof(double) * matsize * matsize << endl;
// ... then your loop
When the program starts, its reported memory consumption is only 5.3 MB before entering the loop, although the static array is already there. Once the loop has finished, the reported memory consumption is 1.57 GB, as you explained, but still not the 3.2 GB that we could expect.
The memory consumption figure that you read is the physical memory used by your process. The rest of the process's memory is in virtual memory, which is much larger (7 GB during my experiment).
Experiment with the vector
First, let's look at the approximate memory size of the vector of vectors, knowing that each vector has a fixed size plus a dynamically allocated variable part (based on the capacity, which can be equal to or greater than the number of elements actually stored in the vector). The following code gives some estimates:
vector<vector<double>> mat2D(matsize, vector<double>(matsize));
cout << "Vector (fixed part): " << sizeof(vector<double>) << endl;
cout << "Vector capacity:     " << mat2D[0].capacity() << endl;
cout << "        (variable):  " << sizeof(double) * mat2D[0].capacity() << endl;
cout << "        (total):     " << sizeof(double) * mat2D[0].capacity() + sizeof(mat2D[0]) << endl;
cout << "Vector of vector:    " << (sizeof(double) * mat2D[0].capacity() + sizeof(mat2D[0])) * matsize + sizeof(mat2D) << endl;
// then your loop
// then your loop
Running this code shows that the memory required to store your vector of vectors is about 3.2 GB plus roughly 480 KB of overhead (24 bytes per vector * 20001 vectors).
Before entering the loop, we already see about 3 GB of physical memory in use. So macOS uses twice as much physical memory for this dynamic version as for the array version. This is almost certainly because the allocation process actually touches more memory upfront: each of the 20000 rows requires a separate allocation and initialization.
Conclusion
The overall virtual memory used by the two approaches is not that different; I measured around 7 GB running in debug mode with Xcode. The vector variant uses a little more than before, due to the extra ~480 KB of per-vector overhead.
The strategy used by macOS for using more or less physical memory is complex and may depend on many factors (including the access patterns), the goal being to find the best tradeoff between physical memory available and cost of swapping.
But the physical memory usage is not representative of the memory consumption by the process.
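If the per-row allocations themselves are a concern, a common alternative (a sketch, not from the original answers) is a single contiguous vector indexed as i * N + j. It keeps the static array's layout and needs only one allocation, while still living on the heap:
#include <cstddef>
#include <vector>

int main()
{
    const std::size_t N = 20000;
    // One contiguous allocation of N * N doubles (~3.2 GB), zero-initialized.
    std::vector<double> distanceMatrix(N * N, 0.0);

    // Element (i, j) lives at index i * N + j.
    for (std::size_t i = 0; i < N; i++) {
        for (std::size_t j = i + 1; j < N; j++) {
            distanceMatrix[i * N + j] = 0.0; // euclid_dist(pointVec[i], pointVec[j]) in the original code
        }
    }
    return 0;
}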

Read from 5x10^8 different array elements, 4 bytes each time

So I'm taking an assembly course and have been tasked with making a benchmark program for my computer - needless to say, I'm a bit stuck on this particular piece.
As the title says, we're supposed to create a function that reads from 5x10^8 different array elements, 4 bytes each time. My only problem is, I don't even think it's possible for me to create an array of 500 million elements? So what exactly should I be doing? (For the record, I'm trying to code this in C++.)
//Benchmark Program in C++
#include <iostream>
#include <time.h>
using namespace std;

int main() {
    clock_t t1, t2;
    int readTemp;
    int* arr = new int[5*100000000];
    t1 = clock();
    cout << "Memory Test" << endl;
    for (long long int j = 0; j < 500000000; j += 1) // note: < rather than <=, to stay within the array bounds
    {
        readTemp = arr[j];
    }
    t2 = clock();
    float diff((float)t2 - (float)t1);
    float seconds = diff / CLOCKS_PER_SEC;
    cout << "Time Taken: " << seconds << " seconds" << endl;
}
Your program tries to allocate 2 billion bytes (1907 MiB), while the maximum user address space for a 32-bit process on Windows is 2 gigabytes (2048 MiB). These numbers are very close, and your process has most likely used up the remaining 141 MiB for other things. Even though your code is very small, the OS is pretty liberal with the 2048 MiB of address space, spending large chunks on e.g. the following:
C++ runtime (standard library and other libraries)
Stack: OS allocates a lot of memory to support recursive functions; it doesn't matter that you don't have any
Paddings between virtual memory pages
Paddings used just to make specific sections of data appear at specific addresses (e.g. 0x00400000 for lowest code address, or something like that, is used in Windows)
Padding used to randomize the values of pointers
There's a Windows application that shows a memory map of a running process. You can use it by adding a delay (e.g. getchar()) before the allocation and looking at the largest contiguous free block of memory at that point, and which allocations prevent it from being large enough.
The size is possible:
5 * 10^8 * 4 bytes = ~1.9 GB.
First you will need to allocate the array (dynamically only; a stack allocation of that size is out of the question).
For your task, 4 bytes is the size of an integer, so you can do
int* arr = new int[5*100000000];
Alternatively, if you want to be more explicit about the bytes, you can allocate it as chars:
char* arr = new char[5*4*100000000];
Next, you need to make the memory dirty (meaning write something into it):
memset(arr, 0, 5*100000000*sizeof(int));
Now you can benchmark cache misses (I'm guessing that's what's intended with such a huge array):
int randomIndex = GetRandomNumberBetween(0, 5*100000000 - 1); // make your own random implementation
int bytes = arr[randomIndex]; // access 4 bytes through an integer
If you want 5x10^8 random accesses, you can use a Knuth shuffle inside your random number generator instead of pure random numbers.
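A minimal sketch of such a benchmark (not from the original answer): it uses <chrono> and <random> rather than clock(), and a volatile sink so the compiler cannot optimize the reads away. N, the seed and the access pattern are arbitrary choices:
#include <chrono>
#include <cstddef>
#include <iostream>
#include <random>
#include <vector>

int main()
{
    const std::size_t N = 500000000;   // 5x10^8 ints, roughly 1.9 GB
    std::vector<int> arr(N, 1);        // allocates and touches the memory up front

    std::mt19937_64 rng(12345);
    std::uniform_int_distribution<std::size_t> pick(0, N - 1);

    volatile int sink = 0;             // keeps the reads from being optimized out
    auto t1 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < N; i++) {
        sink = arr[pick(rng)];         // one 4-byte read from a random element
    }
    auto t2 = std::chrono::steady_clock::now();

    std::chrono::duration<double> seconds = t2 - t1;
    std::cout << "Time taken: " << seconds.count() << " seconds\n";
    return 0;
}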

Wrangling memory for a highly iterative c++ program

tl;dr: I need a way to manage memory better in C++ while still retaining large datasets.
I am currently creating a program that outputs a database that I need for a later project, and I am struggling with memory control. I have the program written to a functional level, and it outputs the dataset I need on a small scale, but to ramp the size up to where I need it and keep it realistic, I need to increase the number of iterations. The problem is that when I do that, I run out of memory on my computer (4 GB) and it has to start pagefiling, which slows the processing considerably.
The basic outline is that I am creating stores, then creating a year's worth of transactional data for said store. When the store is created, a list of numbers is generated that represents the daily sales goals for the transactions, then transactions are randomly generated until that number is reached. This method gives some nicely organic results that I am quite happy with. Unfortunately all of those transactions have to be stored in memory until they are output to my file.
When the transactions are created they are temporarily stored in a vector, which I execute .clear() on after I store a copy of the vector in my permanent storage location.
I have started to try to move to unique_ptr's for my temporary storage, but I am unsure if they are even being deleted properly upon returning from the functions that are generating my data.
the code is something like this (I cut some superfluous code that wasn't pertinent to the question at hand)
void store::populateTransactions() {
    vector<transaction> tempVec;
    int iterate = 0, month = 0;
    double dayTotal = 0;
    double dayCost = 0;
    int day = 0;
    for (int i = 0; i < 365; i++) {
        if (i == dsf[month]) {
            month++;
            day = 0;
        }
        while (dayTotal < dailySalesTargets[i]) {
            tempVec.push_back(transaction(2013, month+1, day+1, 1.25, 1.1));
            dayTotal += tempVec[iterate].returnTotal();
            dayCost += tempVec[iterate].returnCost();
            iterate++;
        }
        day++;
        dailyTransactions.push_back(tempVec);
        dailyCost.push_back(dayCost);
        dailySales.push_back(dayTotal);
        tempVec.clear();
        dayTotal = 0;
        dayCost = 0;
        iterate = 0;
    }
}

transaction::transaction(int year, int month, int day, double avg, double dev) {
    rng random;
    transTime = &testing;
    testing = random.newTime(year, month, day);
    itemCount = round(random.newNum('l', avg, dev, 0));
    if (itemCount <= 0) {
        itemCount = 1;
    }
    for (int i = 0; i < itemCount; i++) {
        int select = random.newNum(0, libs::products.products.size());
        items.push_back(libs::products.products[select]);
        transTotal += items[i].returnPrice();
        transCost += items[i].returnCost();
    }
}
The reason you are running into memory issues is that as you add elements to the vector, it eventually has to resize its internal buffer. This entails allocating a new block of memory, copying the existing data to the new buffer and then deleting the old one.
Since you know the number of elements the vector will hold beforehand, you can call the vector's reserve() member function to allocate the memory ahead of time. This will eliminate the constant resizing that you are no doubt encountering now.
For instance in the constructor for transaction you would do the following before the loop that adds data to the vector.
items.reserve(itemCount);
In store::populateTransactions() you should calculate the total number of elements the vector will hold and call tempVec.reserve() in the same way as described above. Also keep in mind that if you are using a local variable to populate the vector, you will eventually need to copy it. This causes the same issue, since the destination vector has to allocate memory before the contents can be copied (unless you use the move semantics available in C++11; see the sketch after the snippet below). If the data needs to be returned to the calling function (as opposed to being a member variable of store), you should take it by reference as a parameter.
void store::populateTransactions(vector<transaction>& tempVec)
{
//....
}
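If C++11 is available, another option (a sketch with stand-in types; dailyTransactions and tempVec from the question would play the roles of permanent and tempVec here) is to move the per-day vector into permanent storage instead of copying it, so the large buffer changes hands rather than being duplicated:
#include <iostream>
#include <utility>
#include <vector>

int main()
{
    std::vector<std::vector<int>> permanent; // stand-in for dailyTransactions
    std::vector<int> tempVec;                // stand-in for the per-day temporary

    for (int day = 0; day < 3; day++) {
        for (int i = 0; i < 1000; i++) {
            tempVec.push_back(i);
        }
        // Move instead of copy: the buffer changes hands, no per-element copies.
        permanent.push_back(std::move(tempVec));
        tempVec.clear(); // the moved-from vector is valid but unspecified; clear() makes it empty for reuse
    }

    std::cout << permanent.size() << " days stored, last day has "
              << permanent.back().size() << " transactions\n";
    return 0;
}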
If it is not practical to determine the number of elements ahead of time you should consider using std::deque instead. From cppreference.com
As opposed to std::vector, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays.
The storage of a deque is automatically expanded and contracted as needed. Expansion of a deque is cheaper than the expansion of a std::vector because it does not involve copying of the existing elements to a new memory location.
In regard to the comment by Rafael Baptista about how the resize operation allocates memory, the following example should give you a better idea of what is going on. The amount of memory listed is the amount required during the resize.
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> data;
    for (int i = 0; i < 10000001; i++)
    {
        size_t oldCap = data.capacity();
        data.push_back(1);
        size_t newCap = data.capacity();
        if (oldCap != newCap)
        {
            std::cout
                << "resized capacity from "
                << oldCap
                << " to "
                << newCap
                << " requiring " << (oldCap + newCap) * sizeof(int)
                << " total bytes of memory"
                << std::endl;
        }
    }
    return 0;
}
When compiled with VC++10, the following results are generated when adding 10,000,001 elements to a vector. These results are specific to VC++10 and can vary between implementations of std::vector.
resized capacity from 0 to 1 requiring 4 total bytes of memory
resized capacity from 1 to 2 requiring 12 total bytes of memory
resized capacity from 2 to 3 requiring 20 total bytes of memory
resized capacity from 3 to 4 requiring 28 total bytes of memory
resized capacity from 4 to 6 requiring 40 total bytes of memory
resized capacity from 6 to 9 requiring 60 total bytes of memory
resized capacity from 9 to 13 requiring 88 total bytes of memory
resized capacity from 13 to 19 requiring 128 total bytes of memory
...snip...
resized capacity from 2362204 to 3543306 requiring 23622040 total bytes of memory
resized capacity from 3543306 to 5314959 requiring 35433060 total bytes of memory
resized capacity from 5314959 to 7972438 requiring 53149588 total bytes of memory
resized capacity from 7972438 to 11958657 requiring 79724380 total bytes of memory
This is fun! Some quick comments I can think of.
a. clear() destroys the elements but does not release the vector's own buffer (the capacity stays the same). To actually free it you can use the swap idiom: std::vector<transaction>().swap(tempVec);.
b. If you are using a compiler that has C++11 vector::emplace_back, you should replace push_back with it. It should be a big boost in both memory and speed: with push_back, a temporary transaction is constructed and then copied into the vector, so you briefly hold two copies of the same data and are at the mercy of the allocator to hand the freed memory back to the OS.
c. Any reason you cannot flush dailyTransactions to disk every once in a while? You can serialize the vector, write it out to disk, clear the memory, and you should be good again (a sketch follows below).
d. As pointed out by others, reserve() should also help a lot.
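A minimal sketch of point c, with a hypothetical file name and record layout (real serialization would depend on the transaction class); it writes the accumulated values out and then releases the memory with the swap idiom from point a:
#include <fstream>
#include <vector>

// Hypothetical helper: append the values to a binary file, then free the vector's memory.
void flushToDisk(std::vector<double>& dailySales, const char* path)
{
    std::ofstream out(path, std::ios::app | std::ios::binary);
    out.write(reinterpret_cast<const char*>(dailySales.data()),
              dailySales.size() * sizeof(double));

    // clear() alone would keep the capacity; swapping with an empty vector releases it.
    std::vector<double>().swap(dailySales);
}

int main()
{
    std::vector<double> dailySales(1000000, 1.0); // stand-in for the question's member variable
    flushToDisk(dailySales, "daily_sales.bin");   // hypothetical output file
    return 0;
}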