Cannot resize vector to 1765880295 - c++

I want to allocate a vector of size 1765880295 and so i used resize(1765880295) but the program stops running.The adjact problem would be code block not responding.
what is wrong?
Although the max_size gives 4294967295 which is greater than 1765880295 the problem is still the same even without resizing the vector.

Depending on what is stored in the vector -- for example, a 32-bit pointer or uint32, the size of the vector (number of elements * size of each element) will exceed the maximum addressable space on a 32-bit system.

The max_size is dependent on the implementation (some may have 1073741823 as their max_size). But even if your implementation supports a bigger number, the program will fail if there is not enough memory.
For example: if you have a vector<int>, and the sizeof(int) == 4 // bytes, we do the math, and...
1765880295 * 4 bytes = 7063521180 bytes ≈ 6.578 gygabytes
So you would require around 6.6GiB of free memory to allocate that enormous vector.

Related

Why does std::set container use much more memory than the size of its data?

For example, we have 10^7 32bit integers. The memory usage to store these integers in an array is 32*10^7/8=40MB. However, inserting 10^7 32bit integers into a set takes more than 300MB of memory. Code:
#include <iostream>
#include <set>
int main(int argc, const char * argv[]) {
std::set<int> aa;
for (int i = 0; i < 10000000; i++)
aa.insert(i);
return 0;
}
Other containers like map, unordered_set takes even more memory with similar tests. I know that set is implemented with red black tree, but the data structure itself does not explain the high memory usage.
I am wondering the reason behind this 5~8 times original data memory usage, and some workaround/alternatives for a more memory efficient set.
Let's examine std::set implementation in GCC (which is not much different in other compilers). std::set is implemented as a red-black tree on GCC. Each node has a pointer to parent, left, and right nodes and a color enumerator (_S_red and _S_black). This means that besides the int (which is probably 4 bytes) there are 3 pointers (8 * 3 = 24 bytes for a 64-bit system) and one enumerator (since it comes before the pointers in _Rb_tree_node_base, it is padded to 8 byte boundary, so effectively it takes extra 8 bytes).
So far I have counted 24 + 8 + 4 = 36 bytes for each integer in the set. But since the node has to be aligned to 8 bytes, it has to be padded so that it is divisible by 8. Which means each node takes 40 bytes (10 times bigger than int).
But this is not all. Each such node is allocated by std::allocator. This allocator uses new to allocate each node. Since delete can't know how much memory to free, each node also has some heap-related meta-data. The meta-data should at least contain the size of the allocated block, which usually takes 8 bytes (in theory, it is possible to use some kind of Huffman coding and store only 1 byte in most cases, but I never saw anybody do that).
Considering everything, the total for each int node is 48 bytes. This is 12 times more than int. Every int in the set takes 12 times more than it would have taken in an array or a vector.
Your numbers suggest that you are on a 32 bit system, since your data takes only 300 MB. For 32 bit system, pointers take 4 bytes. This makes it 3 * 4 + 4 = 16 byte for the red-black tree related data in nodes + 4 for int + 4 for meta-data. This totals with 24 bytes for each int instead of 4. This makes it 6 times larger than vector for a big set. The numbers suggest that heap meta-data takes 8 bytes and not just 4 bytes (maybe due to alignment constraint).
So on your system, instead of 40MB (had it been std::vector), it is expected to take 280MB.
If you want to save some peanuts, you can use a non-standard allocator for your sets. You can avoid the metadata overhead by using boost's Segregated storage node allocators. But that is not such a big win in terms of memory. It could boost your performance, though, since the allocators are simpler than the code in new and delete.

The operation of the sizeof operator in C++

On my MS VS 2015 compiler, the sizeof int is 4 (bytes). But the sizeof vector<int> is 16. As far as I know, a vector is like an empty box when it's not initialized yet, so why is it 16? And why 16 and not another number?
Furthermore, if we have vector<int> v(25); and then initialize it with int numbers, then still the size of v is 16 although it has 25 int numbers! The size of each int is 4 so the sizeof v should then be 25*4 bytes seemingly but in effect, it is still 16! Why?
The size of each int is 4 so the sizeof v should then be 25*4 bytes seemingly but in effect, it is still 16! Why?
You're confusing sizeof(std::vector) and std::vector::size(), the former will return the size of vector itself, not including the size of elements it holds. The latter will return the count of the elements, you can get all their size by std::vector::size() * sizeof(int).
so why is it 16? And why 16 and not another number?
What is sizeof(std::vector) depends on implmentation, mostly implemented with three pointers. For some cases (such as debug mode) the size might increase for the convenience.
std::vector is typically a structure which contains two elements: pointer (array) of its elements and size of the array (number of elements).
As size is sizeof(void *) and the pointer is also sizeof(void *), the size of the structure is 2*sizeof(void *) which is 16.
The number of elements has nothing to do with the size as the elements are allocated on the heap.
EDIT: As M.M mentioned, the implementation could be different, like the pointer, start, end, allocatedSize. So in 32-bit environment that should be 3*sizeof(size_t)+sizeof(void *) which might be the case here. Even the original could work with start hardcoded to 0 and allocatedSize computed by masking end so really dependent on implementation. But the point remains the same.
sizeof is evaluated at compile time, so it only counts the size of the variables declared in the class, which probably includes a couple of counters and a pointer. It's what the pointer points to that varies with the size, but the compiler doesn't know about that.
The size can be explained using pointers which can be: 1) begin of vector 2) end of vector and 3) vector's capacity. So it would be more of like implementation dependent and it will change for different implementation.
You seem to be mixing "array" with "vector". If you have a local array, sizeof will provide the size of the array indeed. However, vector is not an array. It is a class, a container from STL guaranteeing that the memory contents are located within a single block of memory (that may get relocated, if vector grows).
Now, if you take a look at the std::vector implementation, you'll notice it contains fields (at least in MSVC 14.0):
size_type _Count = 0;
typename _Alloc_types::_Alty _Alval; // allocator object (from base)
_Mylast
_Myfirst
That could sum up to 16 bytes under your implementation (note: experience may vary).

What is QList's maximum size?

Has anyone encountered a maximum size for QList?
I have a QList of pointers to my objects and have found that it silently throws an error when it reaches the 268,435,455th item, which is exactly 28 bits. I would have expected it to have at least a 31bit maximum size (minus one bit because size() returns a signed integer), or a 63bit maximum size on my 64bit computer, but this doesn't appear to be the case. I have confirmed this in a minimal example by executing QList<void*> mylist; mylist.append(0); in a counting loop.
To restate the question, what is the actual maximum size of QList? If it's not actually 2^32-1 then why? Is there a workaround?
I'm running a Windows 64bit build of Qt 4.8.5 for MSVC2010.
While the other answers make a useful attempt at explaining the problem, none of them actually answer the question or missed the point. Thanks to everyone for helping me track down the issue.
As Ali Mofrad mentioned, the error thrown is a std::bad_alloc error when the QList fails to allocate additional space in my QList::append(MyObject*) call. Here's where that happens in the Qt source code:
qlist.cpp: line 62:
static int grow(int size) //size = 268435456
{
//this is the problem line
volatile int x = qAllocMore(size * sizeof(void *), QListData::DataHeaderSize) / sizeof(void *);
return x; //x = -2147483648
}
qlist.cpp: line 231:
void **QListData::append(int n) //n = 1
{
Q_ASSERT(d->ref == 1);
int e = d->end;
if (e + n > d->alloc) {
int b = d->begin;
if (b - n >= 2 * d->alloc / 3) {
//...
} else {
realloc(grow(d->alloc + n)); //<-- grow() is called here
}
}
d->end = e + n;
return d->array + e;
}
In grow(), the new size requested (268,435,456) is multiplied by sizeof(void*) (8) to compute the size of the new block of memory to accommodate the growing QList. The problem is, 268435456*8 equals +2,147,483,648 if it's an unsigned int32, or -2,147,483,648 for a signed int32, which is what's getting returned from grow() on my OS. Therefore, when std::realloc() is called in QListData::realloc(int), we're trying to grow to a negative size.
The workaround here, as ddriver suggested, is to use QList::reserve() to pre-allocate the space, preventing my QList from ever having to grow.
In short, the maximum size for QList is 2^28-1 items unless you pre-allocate, in which case the maximum size truly is 2^31-1 as expected.
Update (Jan 2020): This appears to have changed in Qt 5.5, such that 2^28-1 is now the maximum size allowed for QList and QVector, regardless of whether or not you reserve in advance. A shame.
Has anyone encountered a maximum size for QList? I have a QList of pointers to my objects and have found that it silently throws an error when it reaches the 268,435,455th item, which is exactly 28 bits. I would have expected it to have at least a 31bit maximum size (minus one bit because size() returns a signed integer), or a 63bit maximum size on my 64bit computer, but this doesn't appear to be the case.
Theoretical maximum positive number stored in int is 2^31 - 1. Size of pointer is 4 bytes (for 32bit machine), so maximum possible number of them is 2^29 - 1. Appending data to the container will increases fragmentation of heap memory, so there is possible that you can allocate only half of possible memory. Try use reserve() or resize() instead.
Moreover, Win32 has some limits for memory allocation. So application compiled without special options cannot allocate more than this limit (1G or 2G).
Are you sure about this huge containers? Is it better to optimize application?
QList stores its elements in a void * array.
Hence, a list with 228 items, of which each one is a void *, will be 230 bytes long on a 32 bit machine, and 231 bytes on a 64 bit machine. I doubt you can request such a big chunk of contiguous memory.
And why allocating such a huge list anyhow? Are you sure you really need it?
The idea of be backed by an array of void * elements is because several operations on the list can be moved to non-templated code, therefore reducing the amount of generated code.
QList stores items straight in the void * array if the type is small enough (i.e. sizeof(T) <= sizeof(void*)), and if the type can be moved in memory via memmove. Otherwise, each item will be allocated on the heap via new, and the array will store the pointers to those items. A set of type traits is used to figure out how to handle each type, see Q_DECLARE_TYPEINFO.
While in theory this approach may sound attractive, in practice:
For all primitive types smaller than void * (char; int and float on 64 bit; etc.) you waste from 50 to 75% of the allocated space in the array
For all movable types bigger than void * (double on 32bit, QVariant, ...), you pay a heap allocation per each item in the list (plus the array itself)
QList code is generally less optimized than QVector one
Compilers these days do a pretty good job at merging template instantiations, hence the original reason for this design gets lost.
Today it's a much better idea to stick with QVector. Unfortunately the Qt APIs expose QList everywhere and can't change them (and we need C++11 to define QList as a template alias for QVector...)
I test this in Ubuntu 32bit with 4GB RAM using qt4.8.6. Maximum size for me is 268,435,450
I test this in Windows7 32bit with 4GB RAM using qt4.8.4. Maximum size for me is 134,217,722
This error happend : 'std::bad_alloc'
#include <QCoreApplication>
#include <QDebug>
int main(int argc, char *argv[])
{
QCoreApplication a(argc, argv);
QList<bool> li;
for(int i=0; ;i++)
{
li.append(true);
if(i>268435449)
qDebug()<<i;
}
return a.exec();
}
Output is :
268435450
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

Size of std::array, std::vector and raw array

Lets we have,
std::array <int,5> STDarr;
std::vector <int> VEC(5);
int RAWarr[5];
I tried to get size of them as,
std::cout << sizeof(STDarr) + sizeof(int) * STDarr.max_size() << std::endl;
std::cout << sizeof(VEC) + sizeof(int) * VEC.capacity() << std::endl;
std::cout << sizeof(RAWarr) << std::endl;
The outputs are,
40
20
40
Are these calculations correct? Considering I don't have enough memory for std::vector and no way of escaping dynamic allocation, what should I use? If I would know that std::arrayresults in lower memory requirement I could change the program to make array static.
These numbers are wrong. Moreover, I don't think they represent what you think they represent, either. Let me explain.
First the part about them being wrong. You, unfortunately, don't show the value of sizeof(int) so we must derive it. On the system you are using the size of an int can be computed as
size_t sizeof_int = sizeof(RAWarr) / 5; // => sizeof(int) == 8
because this is essentially the definition of sizeof(T): it is the number of bytes between the start of two adjacent objects of type T in an array. This happens to be inconsistent with the number print for STDarr: the class template std::array<T, n> is specified to have an array of n objects of type T embedded into it. Moreover, std::array<T, n>::max_size() is a constant expression yielding n. That is, we have:
40 // is identical to
sizeof(STDarr) + sizeof(int) * STDarr.max_size() // is bigger or equal to
sizeof(RAWarr) + sizeof_int * 5 // is identical to
40 + 40 // is identical to
80
That is 40 >= 80 - a contradication.
Similarily, the second computation is also inconsistent with the third computation: the std::vector<int> holds at least 5 elements and the capacity() has to be bigger than than the size(). Moreover, the std::vector<int>'s size is non-zero. That is, the following always has to be true:
sizeof(RAWarr) < sizeof(VEC) + sizeof(int) * VEC.capacity()
Anyway, all this is pretty much irrelevant to what your actual question seems to be: What is the overhead of representing n objects of type T using a built-in array of T, an std::array<T, n>, and an std::vector<T>? The answer to this question is:
A built-in array T[n] uses sizeof(T) * n.
An std::array<T, n> uses the same size as a T[n].
A std::vector<T>(n) has needs some control data (the size, the capacity, and possibly and possibly an allocator) plus at least 'n * sizeof(T)' bytes to represent its actual data. It may choose to also have a capacity() which is bigger than n.
In addition to these numbers, actually using any of these data structures may require addition memory:
All objects are aligned at an appropriate address. For this there may be additional byte in front of the object.
When the object is allocated on the heap, the memory management system my include a couple of bytes in addition to the memory made avaiable. This may be just a word with the size but it may be whatever the allocation mechanism fancies. Also, this memory may live somewhere else than the allocate memory, e.g. in a hash table somewhere.
OK, I hope this provided some insight. However, here comes the important message: if std::vector<T> isn't capable of holding the amount of data you have there are two situations:
You have extremely low memory and most of this discussion is futile because you need entirely different approaches to cope with the few bytes you have. This would be the case if you are working on extremely resource constrained embedded systems.
You have too much data an using T[n] or std::array<T, n> won't be of much help because the overhead we are talking of is typically less than 32 bytes.
Maybe you can describe what you are actually trying to do and why std::vector<T> is not an option.

C++ vector max_size();

On 32 bit System.
std::vector<char>::max_size() returns 232-1, size of char — 1 byte
std::vector<int>::max_size() returns 230-1, size of int — 4 byte
std::vector<double>::max_size() returns 229-1, size of double — 8 byte
can anyone tell me max_size() depends on what?
and what will be the return value of max_size() if it runs on 64 bit system.
max_size() is the theoretical maximum number of items that could be put in your vector. On a 32-bit system, you could in theory allocate 4Gb == 2^32 which is 2^32 char values, 2^30 int values or 2^29 double values. It would appear that your implementation is using that value, but subtracting 1.
Of course, you could never really allocate a vector that big on a 32-bit system; you'll run out of memory long before then.
There is no requirement on what value max_size() returns other than that you cannot allocate a vector bigger than that. On a 64-bit system it might return 2^64-1 for char, or it might return a smaller value because the system only has a limited memory space. 64-bit PCs are often limited to a 48-bit address space anyway.
max_size() returns
the maximum potential size the vector
could reach due to system or library
implementation limitations.
so I suppose that the maximum value is implementation dependent. On my machine the following code
std::vector<int> v;
cout << v.max_size();
produces output:
4611686018427387903 // built as 64-bit target
1073741823 // built as 32-bit target
so the formula 2^(64-size(type))-1 looks correct for that case as well.
Simply get the answer by
std::vector<dataType> v;
std::cout << v.max_size();
Or we can get the answer by (2^nativePointerBitWidth)/sizeof(dataType) - 1. For example, on a 64 bit system, long long is (typically) 8 bytes wide, so we have (2^64)/8 - 1 == 2305843009213693951.