Limits of C++ Vector

I've got an issue with the std::vector class. I created a struct:
    struct Triplet {
        int first;
        int second;
        int third;
    };
And I created a vector<Triplet> T.
My problem is that it won't hold as many elements as I need: even though T.max_size() = 357913941, I only got T.size() = 60540697 or T.size() = 40360465 using this function:
    vector<Triplet> T;
    while (true)
    {
        Triplet t;
        t.first = 1; t.second = 1; t.third = 1;
        try {
            T.push_back(t);
        } catch (...) {
            break;
        }
    }
    qDebug() << T.size();
Can anyone explain why it is doing that, please?
I'm running on Windows 10 with 16 GB of RAM, using Qt and Visual C++ 2017 in x86 mode (due to the Lemon library, which I couldn't compile for x64).

An std::vector needs a contiguous (= no holes) chunk of memory to exist in.
Furthermore, when pushing elements to a vector you can overshoot its internal capacity, meaning that it has to allocate a new, larger block (usually double the size) and copy the elements over.
Keep in mind that in 32-bit Windows programs, you only have 2 GB of available memory space to play with in a single process, regardless of how much memory your system has. Your vector with size 60540697 * 12 is taking up 700+ MB of that. There is simply no place to allocate the next size (1.4GB) because the memory space is too small.
The simplest solution is to compile in 64-bit mode, which has plenty of virtual memory. As a stop-gap solution you could try pre-allocating space in the std::vector with T.reserve(80000000) or so. This will avoid an intermediate copy but probably will not be enough. It might even fail if your memory space is fragmented in a bad way!
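For illustration, a minimal sketch of the pre-allocation idea; the 80000000 element count is just the example target mentioned above:

    #include <iostream>
    #include <vector>

    struct Triplet { int first; int second; int third; };

    int main() {
        std::vector<Triplet> T;
        std::cout << "element size: " << sizeof(Triplet) << " bytes\n";  // 12 on typical x86 builds
        try {
            T.reserve(80000000);  // one up-front allocation of ~915 MB of contiguous address space
        } catch (const std::bad_alloc&) {
            std::cout << "reserve failed: not enough contiguous address space\n";
        }
        std::cout << "capacity: " << T.capacity() << '\n';
        return 0;
    }

If the single reserve already throws in a 32-bit build, that points to the address-space explanation rather than a physical-RAM problem.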

Related

C++ pointers vs std::vector: any implication for long size variables?

I have a C background and I am recoding some old code in C++...
In the process, I am starting to use C++ Vectors, which are so easy to use!
Would vectors deal well with very long streams of data? For example, in audio apps, loading a stereo 3-minute song would need almost 16M floats:
    float *stereoSong = NULL;
    stereoSong = new float[15787800];
Not having to deal with memory management with vectors is very nice, but I was wondering if that huge amount of data would be handled well by C++ vectors
Thanks!
This is a false comparison.
For a start, vectors use pointers. They have to. Vectors are containers that use dynamic allocation to provide you with a buffer of data items. You could try to implement the same thing "with pointers", but you'd end up with something somewhere between a vector and a worse version of a vector.
So, vectors can handle as much data as you'd be able to handle with new double[] — that is, a lot.
The answer very much depends on your platform.
You're talking about roughly 120 MiB of data (a double is 8 bytes on practically all modern platforms).
This code has no trouble on a 'toy' environment (https://ideone.com/A6GmvQ):
    #include <iostream>
    #include <vector>

    void build(std::vector<double>& data, const size_t size) {
        void* ptr{data.data()};
        for (size_t i{1}; i <= size; ++i) {
            data.push_back(2 * i);
            if (data.data() != ptr) {
                ptr = data.data();
                std::cout << i << ' ' << ptr << ' ' << data.capacity() << std::endl;
            }
        }
    }

    int main() {
        size_t size{100000000L};
        std::cout << ((size * sizeof(double)) / 1024 / 1024) << " MiB" << std::endl;
        std::vector<double> data{};
        build(data, size);
        double sum{0};
        for (auto curr : data) {
            sum += curr;
        }
        std::cout << sum << std::endl;
        return 0;
    }
This code is knowingly dumb: it doesn't even try to reserve capacity for the values (which can help), because std::vector<> copes with that anyway.
Behind the scenes the vector allocates a block of capacity and then re-allocates another larger capacity when the logical size of the vector exceeds the capacity.
The code 'watches' the internal representation and outputs each re-allocation...
There are member functions to help with that capacity management if you're consuming the values as a stream (which sounds likely for audio).
The short answer is 'give it a go'. I cranked that up to 100M doubles and had no issue.
std::vector and co. were one of the reasons I changed from C to C++.
It takes all the boilerplate out of array management.
When I needed to resize an array allocation, I had to do the following (a sketch of this versus the vector approach follows this answer):
allocate new memory
copy the elements over
delete the old memory
Also, all lifetime management is handled by the std::vector: no more messing around with delete at the end of the lifetime, which makes handling multiple exit points in a function much easier.
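A rough sketch of the two approaches (manual resize versus std::vector), assuming float data and arbitrary sizes:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // manual C-style resize: allocate new memory, copy the elements over, delete the old memory
    float* grow(float* old, std::size_t oldSize, std::size_t newSize) {
        float* bigger = new float[newSize];
        std::copy(old, old + oldSize, bigger);
        delete[] old;
        return bigger;
    }

    // the std::vector equivalent: one call, lifetime and cleanup handled for you
    void grow(std::vector<float>& v, std::size_t newSize) {
        v.resize(newSize);
    }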
The limit due to the implementation is std::vector::max_size. For example here std::vector<float>::max_size() is 2305843009213693951.
However, that's just the theoretical limit imposed by the implementation. You will hit the memory limit of your hardware long before that.
A std::vector<float> does not use (substantially) more memory than a dynamic C array.
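A minimal way to check the figure on your own platform:

    #include <iostream>
    #include <vector>

    int main() {
        std::vector<float> v;
        // implementation-defined upper bound on element count, not a promise of available RAM
        std::cout << v.max_size() << '\n';
        return 0;
    }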

Understanding C++ map: why isn't heap memory released with clear()?

Suppose I have a forever loop that creates a hash map:
    #include <climits>
    #include <map>
    using namespace std;

    void createMap() {
        map<int, int> mymap;
        for (int i = 0; i < INT_MAX; i++) {
            mymap[i] = i;
        }
        mymap.clear(); // <-- this line doesn't seem to make a difference in memory growth
    }

    int main(void) {
        while (1) {
            createMap();
        }
        return 0;
    }
I ran the code on macOS and watched Activity Monitor: the application's memory usage keeps growing, with or without the mymap.clear() at the end of createMap().
Shouldn't memory usage stay constant in the case where mymap.clear() is used?
What's the general recommendation for using STL containers? Do I need to .clear() before the end of the function?
I asked in another forum and the folks there helped me understand the answer. It turns out I didn't wait long enough for createMap to finish, nor do I have enough memory to sustain this program.
It takes INT_MAX = 2147483647 elements to be created; each element is a pair<int, int> of 8 bytes, and the map object itself is 24 bytes.
Total minimum memory = 2147483647 * 8 + 24 = 17179869200 bytes ~= 17.2 GB.
I reduced the number of elements and tested both with and without .clear(); the program's memory usage grew and shrank accordingly.
The container you create is bound to the scope of your function. If the function returns, its lifetime ends. And as std::map owns its data, the memory it allocates is freed upon destruction.
Your code hence constantly allocates and frees the same amount of memory, so consumption stays bounded, although the exact memory locations will probably differ. This also means that you should not manually call clear at the end of this function. Use clear when you want to empty a container that you intend to keep using afterwards.
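A small sketch of that 'keep using it' case, where clear() is the right tool (the loop bounds are arbitrary):

    #include <map>

    int main() {
        std::map<int, int> mymap;              // lives across iterations
        for (int round = 0; round < 10; ++round) {
            for (int i = 0; i < 1000; ++i) {
                mymap[i] = i;
            }
            // ... use mymap ...
            mymap.clear();                     // empty it, then reuse the same object
        }
        return 0;                              // destructor releases everything here
    }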
As a side note, std::map is not a hash map (std::unordered_map is one).

Unhandled exception at .. Microsoft C++ exception: std::bad_alloc at memory location

I have a large vector (mainvect) of struct info objects (about 8 million elements), and I want to remove duplicates. The struct consists of pid and uid:
    struct info
    {
        int pid;
        string uid;
    };
I have another vector (vect1) which contains each pid and its number of occurrences in mainvect (it helps to search specific indices instead of the whole mainvect). The size of vect1 is 420k elements:
    struct pidInfo
    {
        int pid;
        int numofoccurence;
    };
I want to store the unique elements of mainvect in vect2.
.
.
    // sort mainvect based on pid
    sort(mainvect.begin(), mainvect.end(), sortByPId());

    int start = 0;
    int end = 0;
    vector<string> temp; // to store uids with a specific pid
    for (int i = 0; i < vect1.size(); i++)
    {
        end = end + vect1[i].numofoccurence;
        for (int j = start; j < end; j++)
        {
            temp.push_back(mainvect[j].uid);
        }
        start = start + vect1[i].numofoccurence;

        // remove duplicate uids
        sort(temp.begin(), temp.end());
        temp.erase(unique(temp.begin(), temp.end()), temp.end());

        // push remaining unique uids
        for (int k = 0; k < temp.size(); k++)
        {
            info obb;
            obb.pid = vect1[i].pid;
            obb.uid = temp[k];
            vect2.push_back(obb);
        }

        // empty the temp vector to use in the next i iteration
        temp.erase(temp.begin(), temp.end());
    }
.
.
But when I run the code, it throws the std::bad_alloc exception shown in the title.
I think you actually have an algorithm problem. On each iteration you sort and keep only the unique elements in the temp vector, but with this approach each iteration will still add more and more duplicates to vect2. So you should sort and keep only unique elements in vect2 as well. Actually, it would probably be better to use a std::set instead of temp and vect2 (see the sketch below).
Another suggestion would be to use better storage for uid if it has some sort of fixed-length format, such as a GUID.
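For illustration, a minimal sketch of the std::set idea; the struct follows the question, and deduplicating on the (pid, uid) pair is an assumption about what counts as a duplicate:

    #include <set>
    #include <string>
    #include <utility>
    #include <vector>

    struct info {
        int pid;
        std::string uid;
    };

    // collect unique (pid, uid) pairs in one pass, then copy them out
    std::vector<info> deduplicate(const std::vector<info>& mainvect) {
        std::set<std::pair<int, std::string>> seen;
        for (const info& x : mainvect) {
            seen.emplace(x.pid, x.uid);   // duplicates are silently ignored by the set
        }
        std::vector<info> vect2;
        vect2.reserve(seen.size());
        for (const auto& p : seen) {
            vect2.push_back({p.first, p.second});
        }
        return vect2;
    }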
You are running out of memory. There are a few things you can do:
You are building a 32-bit program on Windows. This means you have only 2 GB of address space available, regardless of how much physical memory you have on your machine. You should build your program for the 64-bit architecture to get access to all the RAM you have. To do this you need to create a new configuration for your project with the platform set to x64.
You should be smarter about using your memory. The quickest thing you can do is to replace std::vector with std::deque for the large vectors (a sketch follows at the end of this answer).
The problem with std::vector is that every time it grows it allocates a new memory chunk and copies all the data. The MSVC implementation you are using grows the vector by a factor of 1.5 each time, so if the vector takes 1 GB of memory, the next resize will try to allocate 1.5 GB, using 2.5 GB of RAM in total while the resize is in progress.
Implementations of std::deque usually allocate memory in smaller chunks, so they have less of a problem with resizing.
Another thing you should pay attention to is std::string. The MSVC implementation uses SSO (Small String Optimization); every instance of std::string, AFAIR, takes 32 bytes on x86. So every element in your 8-million-element vector might or might not be wasting that memory.
Depending on how much time you want to spend on your program, you might want to learn about memory-mapped files.
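As a rough sketch of that std::deque swap, with the struct taken from the question and the element count only illustrative:

    #include <deque>
    #include <string>

    struct info {
        int pid;
        std::string uid;
    };

    int main() {
        // std::deque grows in fixed-size blocks, so it never needs one huge
        // contiguous allocation or a full copy of the data when it runs out of room
        std::deque<info> mainvect;
        for (int i = 0; i < 8000000; ++i) {
            mainvect.push_back({i, "some-uid"});
        }
        return 0;
    }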
As stated above, you are running out of memory. If you really have that many elements, it may be sensible to look into a small database, like SQLite.
But since the question is about C++ standard containers: you are approaching the problem a bit ham-fistedly. You are doing many unnecessary sorts and loops. Besides being riddled with bugs, your algorithm is at least O(n^3).
Why not use one of the already-sorted containers, for example std::map? You can deduplicate a list like so:
    // mainvect is the input std::vector<info>
    // copy into a map keyed by pid
    std::map<int, info> tmp;
    for (info& i : mainvect) {
        tmp[i.pid] = i;
    }
    // copy back out
    std::vector<info> output(tmp.size());
    std::transform(tmp.begin(), tmp.end(), output.begin(),
                   [](const std::pair<const int, info>& p) {
                       return p.second;
                   });
Not only is the code cleaner, it runs in O(n log n). Or skip the second step and use a std::map or std::set for the data in the first place.
Also, if you handle a huge number of items, you don't want to use a std::vector: the key problem is that the memory for the vector needs to be one contiguous piece. You may want to use a deque or a list instead.

Fast way to push_back a vector many times

I have identified a bottleneck in my C++ code, and my goal is to speed it up. I am moving items from one vector to another vector if a condition is true.
In python, the pythonic way of doing this would be to use a list comprehension:
my_vector = [x for x in data_vector if x > 1]
I have hacked a way to do this in C++, and it is working fine. However, I am calling this millions of times in a while-loop and it is slow. I do not understand much about memory allocation, but I assume that my problem has to do with allocating memory over-and-over again using push_back. Is there a way to allocate my memory differently to speed up this code? (I do not know how large my_vector should be until the for-loop has completed).
    std::vector<float> data_vector;
    // Put a bunch of floats into data_vector

    std::vector<float> my_vector;
    while (some_condition_is_true) {
        my_vector.clear();
        for (size_t i = 0; i < data_vector.size(); i++) {
            if (data_vector[i] > 1) {
                my_vector.push_back(data_vector[i]);
            }
        }
        // Use my_vector to render graphics on the GPU, but do not change the elements of my_vector
        // Change the elements of data_vector, but not the size of data_vector
    }
Use std::copy_if, and reserve data_vector.size() for my_vector initially (as this is the maximum possible number of elements for which your predicate could evaluate to true):
    std::vector<float> my_vec;
    my_vec.reserve(data_vec.size());

    std::copy_if(data_vec.begin(), data_vec.end(), std::back_inserter(my_vec),
                 [](const auto& el) { return el > 1; });
Note that you could avoid the reserve call here if you expect that the number of times that your predicate evaluates to true will be much less than the size of the data_vector.
Though there are various great solutions posted by others for your query, it seems there is still not much explanation of the memory allocation itself, which you say you do not understand well, so I would like to share my knowledge of this topic with you. Hope this helps.
Firstly, in C++ there are several kinds of memory: the stack, the heap, and the data segment.
The stack is for local variables. It has some important features: memory is deallocated automatically, operations on it are very fast, and its size is OS-dependent and small, so putting large amounts of data on the stack can cause an overflow, et cetera.
The heap can be accessed globally. Its important features are that its size can be extended dynamically when needed and is much larger than the stack, operations on it are slower, and memory must be deallocated manually (on today's operating systems the memory is reclaimed when the program ends), et cetera.
The data segment is for global and static variables. In fact, this piece of memory can be divided into even smaller parts, e.g. the BSS.
In your case, a vector is used. The elements of a vector are stored in its internal dynamic array, i.e. an internal array whose size can change. In early C++ a dynamic array could sometimes be created on the stack, but that is no longer the case; a dynamic array has to be created on the heap, so the elements of a vector live in an internal dynamic array on the heap. To grow that array, a process called memory reallocation is needed: a larger block is allocated and the existing elements are copied over. If a vector user keeps enlarging the vector, the overhead of repeated reallocations becomes high. To deal with this, a vector allocates a piece of memory that is larger than the current need, i.e. it allocates memory for potential future use. Therefore, in your code it is not the case that a reallocation is performed every time push_back() is called. However, if the data to be copied is quite large, the memory reserved for future use may not be enough, and a reallocation will occur. To tackle this, vector.reserve() may be used (see the sketch below).
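A tiny sketch of that reserve() idea; the element count is arbitrary:

    #include <iostream>
    #include <vector>

    int main() {
        std::vector<float> v;
        v.reserve(1000);                    // one up-front allocation for future use
        std::cout << v.capacity() << '\n';  // at least 1000, even though size() is still 0
        for (int i = 0; i < 1000; ++i) {
            v.push_back(1.0f * i);          // no reallocation happens anywhere in this loop
        }
        std::cout << v.capacity() << '\n';  // unchanged: still the block reserved above
        return 0;
    }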
I am a newbie. Hopefully, I have not made any mistake in my sharing.
Hope this helps.
Run the code twice: the first time, only count how many new elements you will need. Then use reserve to allocate all the memory you need at once.

    while (some_condition_is_true) {
        my_vector.clear();

        // first pass: count the matching elements
        int newLength = 0;
        for (size_t i = 0; i < data_vector.size(); i++) {
            if (data_vector[i] > 1) {
                newLength++;
            }
        }

        // second pass: reserve once, then copy
        my_vector.reserve(newLength);
        for (size_t i = 0; i < data_vector.size(); i++) {
            if (data_vector[i] > 1) {
                my_vector.push_back(data_vector[i]);
            }
        }

        // Do stuff with my_vector and change data_vector
    }
I doubt allocating my_vector is the problem, especially if the while loop is executed many times as the capacity of my_vector should quickly become sufficient.
But to be sure you can just reserve capacity in my_vector corresponding to the size of data_vector:
    my_vector.reserve(data_vector.size());
    while (some_condition_is_true) {
        my_vector.clear();
        for (auto value : data_vector) {
            if (value > 1)
                my_vector.push_back(value);
        }
    }
If you are on Linux you can reserve memory for my_vector to prevent the std::vector reallocations that are the bottleneck in your case. Note that reserve will not waste memory, thanks to overcommit, so any rough upper estimate for the reserve value will fit your needs. In your case the size of data_vector will be enough. This line of code before the while loop should fix the bottleneck:

    my_vector.reserve(data_vector.size());

Maximum number of stl::list objects

The problem is to find periodic graph patterns in a dataset. I have 1000 timesteps, with a graph (encoded as integers) in each timestep, so there are 999 possible periods in which a graph can occur. I also define a phase offset, defined as (timestep mod period). For a graph first seen in the 5th timestep with period 2, the phase offset is 1.
I am trying to create a two-dimensional array of lists in C++. Each cell contains a list of the graphs having a specific period and phase offset, and I keep inserting graphs into the corresponding lists.

    list<ListNode> A[timesteps][phase_offsets]; // 1000 x 1000
ListNode is a class with 4 integer variables.
This gives me a segmentation fault. Using 500 for the size runs fine. Is this due to lack of memory or some other issue?
Thanks.
Probably due to limited stack size.
You're creating an array of 1000x1000 = 1000000 objects that are almost certainly at least 4 bytes apiece, so roughly 4 megabytes at a minimum. Assuming that's inside a function, it has automatic storage duration, which normally translates to being allocated on the stack. Typical stack sizes are around 1 to 4 megabytes.
Try something like: std::vector<ListNode> A(1000*1000); (and, if necessary, create a wrapper to make it look 2-dimensional).
Edit: The wrapper would overload an operator to give you 2D addressing:
    template <class T>
    class array_2D {
        std::vector<T> data;
        size_t cols;
    public:
        array_2D(size_t x, size_t y) : data(x*y), cols(x) {}
        T &operator()(size_t x, size_t y) { return data[y*cols + x]; }
    };
You may want to embellish that (e.g., with bounds checking) but that's the general idea. Addressing it would use (), as in:
    array_2D<int> x(1000, 1000);
    x(100, 3) = 2;
    int y = x(20, 20);
Sounds like you're running out of stack space. Try allocating it on the heap, e.g. through std::vector, and wrap in try ... catch to see out of memory errors instead of crashing.
(Edit: Don't use std::array since it also allocates on the stack.)
    try {
        std::vector<std::list<ListNode> > a(1000000); // create 1000*1000 lists
        // index a by e.g. [index1 * 1000 + index2]
        a[42 * 1000 + 18].size(); // size of that list

        // or if you really want double subscripting without a wrapper function:
        std::vector<std::vector<std::list<ListNode> > > b(1000);
        for (size_t i = 0; i < 1000; ++i) { // do 1000 times:
            b[i].resize(1000); // default-construct 1000 lists in each
        }
        b[42][18].size(); // size of that list
    } catch (std::exception const& e) {
        std::cerr << "Caught " << typeid(e).name() << ": " << e.what() << std::endl;
    }
In libstdc++ on a 32-bit system a std::list object weighs 8 bytes (the object itself only, not counting the allocations it makes), and even in other implementations I don't think it will be much different; so you are allocating about 8 MB of data, which isn't much per se on a regular computer, but if you put that declaration in a function it will be a local variable and thus allocated on the stack, which is quite limited in size (a few MB at most).
You should allocate that thing on the heap, e.g. using new, or, even better, using a std::vector.
By the way, it doesn't seem right that you need a 1000x1000 array of std::list; could you specify exactly what you are trying to achieve? There are probably data structures that fit your needs better.
You're declaring a two-dimensional array [1000]x[1000] of list<ListNode>. I don't think that's what you intended.
The segmentation fault is probably from trying to use elements of the list that aren't valid.