stl "vector<T> too long" - c++

I read in other answers that there is no limit imposed by the C++ compiler on the maximum size of std::vector. I am trying to use a vector for one purpose, and I need it to hold 10^19 items.
typedef struct {
    unsigned long price, weight;
} product;
//inside main
unsigned long long n = 930033404565174954;
vector<product> psorted(n);
The program breaks on the last statement. If I try resize(n) instead of constructing with n, the program also breaks with the message:
vector<T> too long
std::length_error at memory location
I need to sort the data according to price after putting it in the vector. What should I do?

std::vector does have limits on how much stuff it can carry. You can query this with std::vector::max_size, which returns the maximum size you can use.
10^19 items.
Do you have 10^19 * sizeof(product) bytes of memory? I'm guessing that you don't have ~138 exabytes of RAM. Plus, you'd have to be compiling in 64-bit mode to even consider allocating that much. The compiler isn't breaking; your program is failing at run time because it is trying to allocate too much.
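You can check this ceiling on your own platform. A minimal sketch (the product struct is taken from the question; the printed value differs between implementations and says nothing about how much RAM you actually have):

#include <iostream>
#include <vector>

struct product {
    unsigned long price, weight;
};

int main() {
    std::vector<product> v;
    std::cout << "max_size: " << v.max_size() << '\n';  // library-imposed ceiling

    unsigned long long n = 930033404565174954ULL;
    if (n > v.max_size())
        std::cout << "cannot even request that many elements\n";
}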

Others have already told you what the problem is. One possible solution is to use the STXXL library, which is an implementation of STL that's designed for huge, out-of-memory datasets.
However, 10^19 8-byte items is 80 million TB. I'm not sure anyone has a disk that large...
Also, assuming a generous disk bandwidth of 300MB/s, this would take 8000 years to write!
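For datasets that do fit on disk, a rough sketch of the STXXL approach could look like this. It is based on my reading of STXXL's VECTOR_GENERATOR and stxxl::sort interface; the comparator's min_value()/max_value() sentinels and the 512 MB sort budget are illustrative, so check the STXXL documentation before relying on any of it:

#include <limits>
#include <stxxl/vector>
#include <stxxl/sort>

struct product {
    unsigned long price, weight;
};

// stxxl::sort needs a comparator that also provides sentinel values.
struct cmp_price {
    bool operator()(const product& a, const product& b) const {
        return a.price < b.price;
    }
    product min_value() const { return {0, 0}; }
    product max_value() const {
        return {std::numeric_limits<unsigned long>::max(), 0};
    }
};

int main() {
    stxxl::VECTOR_GENERATOR<product>::result psorted;  // disk-backed vector
    // ... fill psorted ...
    stxxl::sort(psorted.begin(), psorted.end(), cmp_price(),
                512 * 1024 * 1024);                     // RAM budget for sorting
}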

Related

Create array up to 10^12

I tried to create an array with a size of up to 10^12 elements in C++, but I can only make an array up to size 1000001, i.e.
long long int dp[1000001]
But I want to store up to 10^12 values in the array. Any idea how I can implement this in C++?
First, you must realize that the size of that array is nearly 8 TB. Does your computer have that much memory? Probably not. In such case, you cannot store that much data in memory, and practically cannot have such a large array.
Any Idea how can I implement this
Instead of an array in memory, you could store the data in the file system... Assuming you have 8 TB free storage. You can use a paging mechanism to read and write small pieces of the file at a time.
The simplest way to implement that in C++ is to use operating-system functionality to map the file into memory. That way the operating system takes care of the paging. There is no standard way to map files into memory in C++, so the first step is to figure out which operating system you're using. The POSIX standard specifies the mmap function for this purpose.
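A minimal POSIX sketch of that approach (error handling is abbreviated, the element count and file name are placeholders, and the backing file obviously has to fit on your disk):

#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
    const std::size_t N = 1000000000;                  // number of elements
    const std::size_t bytes = N * sizeof(long long);

    int fd = open("dp.bin", O_RDWR | O_CREAT, 0644);
    if (fd < 0) return 1;
    if (ftruncate(fd, bytes) != 0) return 1;           // size the backing file

    long long* dp = static_cast<long long*>(
        mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
    if (dp == MAP_FAILED) return 1;

    dp[42] = 7;   // the OS pages pieces of the file in and out on demand

    munmap(dp, bytes);
    close(fd);
}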
Before doing that however, I recommend considering whether you actually need to store that much data. Perhaps you need a smarter algorithm instead.

C++ creating huge vector

For a process I'm trying to run I need to have a std::vector of std::tuple<long unsigned int, long unsigned int>. The test I'm doing right now should create a vector of 47,614,527,250 (around 47 billion) tuples but actually crashes right there on creation with the error terminate called after throwing an instance of 'std::bad_alloc'. My goal is to use this script with a vector roughly twice that size. The code is this:
arc_vector = std::vector<std::tuple<long unsigned int, long unsigned int>>(arcs);
where arcs is a long unsigned int with the cited value.
Can I, and in that case how do I, increase the memory size? This script is running on a 40-core machine with something like 200GB of memory so I know memory itself is not an issue.
47 billion tuples times 16 bytes per tuple is about 762 billion bytes, which is roughly 760 GB. Your machine has less than a third of the memory required for that, so you really need another approach, regardless of the reason your program crashes.
A proposal I can give you is to use a memory-mapped file of 1 TB to store that array, and if you really need to use a vector as the interface you might write a custom allocator for it that uses the mapped memory. That should make up for your lack of main memory in a quasi-transparent way. If your interface requires a standard vector with standard allocators, you are better off redesigning it.
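A hypothetical sketch of such an allocator, for POSIX systems: every allocation is backed by its own temporary file, so the kernel can page the data to and from that file instead of swap. This is illustrative only, not production code:

#include <cstddef>
#include <cstdio>
#include <new>
#include <sys/mman.h>
#include <unistd.h>
#include <tuple>
#include <vector>

template <typename T>
struct mmap_allocator {
    using value_type = T;

    mmap_allocator() = default;
    template <typename U> mmap_allocator(const mmap_allocator<U>&) {}

    T* allocate(std::size_t n) {
        std::size_t bytes = n * sizeof(T);
        std::FILE* tmp = std::tmpfile();               // deleted automatically
        if (!tmp) throw std::bad_alloc();
        if (ftruncate(fileno(tmp), bytes) != 0) {      // size the backing file
            std::fclose(tmp);
            throw std::bad_alloc();
        }
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fileno(tmp), 0);
        std::fclose(tmp);                              // the mapping stays valid
        if (p == MAP_FAILED) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t n) { munmap(p, n * sizeof(T)); }
};

template <typename T, typename U>
bool operator==(const mmap_allocator<T>&, const mmap_allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const mmap_allocator<T>&, const mmap_allocator<U>&) { return false; }

// Usage, with the tuple type from the question:
// using arc = std::tuple<long unsigned int, long unsigned int>;
// std::vector<arc, mmap_allocator<arc>> arc_vector(arcs);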
Another point to check is the ulimit settings for the user running the process, because they might impose a stricter virtual memory limit than the ~760 GB you need.
You may well have a machine with a lot of memory but the problem is that you require that memory to be contiguous.
Even with memory virtualisation, that's unlikely.
For that amount of data, you'll need to use a different storage container. You could roll your own based on a linked list of vectors that subdivide the data, a vector of pointers to subdivided vectors of your tuples, or find a library that has such a construction already built.
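A hypothetical sketch of the subdivided-vector idea; it removes the requirement for one giant contiguous block, though note it does not reduce the total amount of memory needed:

#include <cstddef>
#include <tuple>
#include <vector>

class chunked_tuples {
    typedef std::tuple<unsigned long, unsigned long> arc;
    static constexpr std::size_t CHUNK = std::size_t(1) << 24;   // ~16M entries per chunk
    std::vector<std::vector<arc>> chunks;
public:
    explicit chunked_tuples(std::size_t n) {
        chunks.resize((n + CHUNK - 1) / CHUNK);
        for (std::size_t i = 0; i < chunks.size(); ++i) {
            std::size_t len = (i + 1 == chunks.size()) ? n - i * CHUNK : CHUNK;
            chunks[i].resize(len);
        }
    }
    arc& operator[](std::size_t i) { return chunks[i / CHUNK][i % CHUNK]; }
};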

c++ boost::multi_array index too large

I'm using a two-dimensional boost::multi_array to store objects of a custom struct. The problem is that I have a huge amount of these objects so that the index of the array I would need exceeds the range of an integer. Is there any possibility to use long as an index of a multi-array or do you have any other suggestions on how to store a dataset this big and still keep it accessible at a decent speed?
Thanks!
The official documentation states that the index type is unspecified, but looking into the repository, one sees that the definition most likely is typedef std::ptrdiff_t index;
So if you compile for an x86 32-bit system, you will surely run out of addressable memory anyway, so the limited size of the indices is not your real problem. Your only option is to choose a system with enough memory, which has to be one with more than 2^32 bytes and thus has to be a 64-bit one. 2^64 will certainly be enough to represent the dimensions of your multi_array.
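A small illustration on a 64-bit build, where the index type resolves to a 64-bit std::ptrdiff_t; the extents are placeholders and this particular array would need roughly 40 GB of RAM:

#include <boost/multi_array.hpp>

int main() {
    typedef boost::multi_array<double, 2> array_t;

    array_t::index rows = 100000, cols = 50000;   // index is std::ptrdiff_t
    array_t a(boost::extents[rows][cols]);        // ~40 GB of doubles

    array_t::index i = 99999, j = 49999;
    a[i][j] = 3.14;
}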

c++ Alternative implementation to avoid shifting between RAM and SWAP memory

I have a program that uses dynamic programming to calculate some information. The problem is that, in theory, the memory used grows exponentially. Some filters that I use limit this space, but for a big input they still can't prevent my program from running out of RAM.
The program runs on 4 threads. When I run it with a really big input I noticed that at some point the program starts to use swap memory, because my RAM is not big enough. The consequence is that my CPU usage decreases from about 380% to 15% or lower.
There is only one variable that uses most of the memory, which is the following data structure:
Edit (added type) with CLN library:
class My_Map {
    typedef std::pair<double, short> key;
    typedef cln::cl_I value;
public:
    tbb::concurrent_hash_map<key, value>* map;
    My_Map() { map = new tbb::concurrent_hash_map<key, value>(); }
    ~My_Map() { delete map; }
    //some functions for operations on the map
};
In my main program I am using this data structure as a global variable:
My_Map* container = new My_Map();
Question:
Is there a way to avoid this shuffling of memory between swap and RAM? I thought putting all the memory on the heap would help, but it seems not to. So I don't know whether it is possible to fully use the swap memory, or something else. This shuffling of memory alone costs a lot of time, and the CPU usage decreases dramatically.
If you have 1 GB of RAM and a program that uses 2 GB, then you're going to have to find somewhere else to store the excess data, obviously. The default OS way is to swap, but the alternative is to manage your own 'swapping' by using a memory-mapped file.
You open a file and allocate a virtual memory block in it, then you bring pages of the file into RAM to work on. The OS manages this for you for the most part, but you should think about your access pattern so that you work with the same blocks while they are in memory, if you can.
On Windows you use CreateFileMapping(); on Linux and macOS you use mmap().
The OS is working properly - it doesn't distinguish between stack and heap when swapping - it pages you whatever you seem not to be using and loads whatever you ask for.
There are a few things you could try:
consider whether myType can be made smaller - e.g. using int8_t or even width-appropriate bitfields instead of int, using pointers to pooled strings instead of worst-case-length character arrays, using offsets into arrays where they're smaller than pointers, etc. If you show us the type, maybe we can suggest things.
think about your paging - if you have many objects on one memory page (likely 4k) they will need to stay in memory if any one of them is being used, so try to get objects that will be used around the same time onto the same memory page - this may involve hashing to small arrays of related myType objects, or even moving all your data into a packed array if possible (binary searching can be pretty quick anyway). Naively used hash tables tend to flay memory because similar objects are put in completely unrelated buckets.
serialisation/deserialisation with compression is a possibility: instead of letting the OS swap out full myType memory, you may be able to proactively serialise them into a more compact form then deserialise them only when needed
consider whether you need to process all the data simultaneously... if you can batch up the work in such a way that you get all "group A" out of the way using less memory then you can move on to "group B"
UPDATE now you've posted your actual data types...
Sadly, using short might not help much because sizeof key needs to be 16 anyway for alignment of the double; if you don't need the precision, you could consider float? Another option would be to create an array of separate maps...
tbb::concurrent_hash_map<double,value> map[65536];
You can then index with map[my_short] and look up my_double within that sub-map. It could be better or worse, but it is easy to try, so you might as well benchmark...
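For what it's worth, tbb::concurrent_hash_map has no operator[], so access to each sub-map goes through an accessor. A rough sketch of the array-of-maps idea (it assumes TBB's default hashing of double keys is acceptable for your data):

#include <tbb/concurrent_hash_map.h>
#include <cln/integer.h>

typedef tbb::concurrent_hash_map<double, cln::cl_I> submap;
submap maps[65536];

void put(short s, double d, const cln::cl_I& v) {
    submap::accessor a;
    maps[(unsigned short)s].insert(a, d);   // finds or creates the entry
    a->second = v;
}

bool get(short s, double d, cln::cl_I& out) {
    submap::const_accessor a;
    if (!maps[(unsigned short)s].find(a, d)) return false;
    out = a->second;
    return true;
}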
For cl_I a 2-minute dig suggests the data's stored in a union - presumably word is used for small values and one of the pointers when necessary... that looks like a pretty good design - hard to improve on.
If numbers tend to repeat a lot (a big if) you could experiment with e.g. keeping a registry of big cl_Is with a bi-directional mapping to packed integer ids, which you'd store in My_Map::map - fussy though. To explain: say you get 987123498723489 - you push_back it onto a vector<cl_I>, then in a hash_map<cl_I, int> map 987123498723489 to that index (i.e. vector.size() - 1). Keep going as new numbers are encountered. You can always map from an int id back to a cl_I using direct indexing in the vector, and the other way is an O(1) amortised hash-table lookup.
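A rough sketch of that registry, using std::map rather than a hash map since cln::cl_I is ordered but has no standard hash; treat it as an illustration only:

#include <cln/integer.h>
#include <map>
#include <vector>

class cl_I_registry {
    std::vector<cln::cl_I> by_id;        // packed id -> big integer
    std::map<cln::cl_I, int> by_value;   // big integer -> packed id
public:
    int intern(const cln::cl_I& x) {
        std::map<cln::cl_I, int>::iterator it = by_value.find(x);
        if (it != by_value.end()) return it->second;
        by_id.push_back(x);
        int id = (int)by_id.size() - 1;
        by_value[x] = id;
        return id;
    }
    const cln::cl_I& value(int id) const { return by_id[id]; }
};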

Declaring large character array in c++

I am trying right now to declare a large character array. I am using the character array as a bitmap (as in a map of booleans, not the image file type). The following code generates a compilation error.
//This is code before main. I want these as globals.
unsigned const long bitmap_size = (ULONG_MAX/(sizeof(char)));
char bitmap[bitmap_size];
The error is overflow in array dimension. I recognize that I'm trying to have my process consume a lot of data and that there might be some limit in place that prevents me from doing so. I am curious as to whether I am making a syntax error or if I need to request more resources from the kernel. Also, I have no interest in creating a bitmap with some class. Thank you for your time.
EDIT
ULONG_MAX is very much dependent upon the machine that you are using. On the particular machine I was compiling my code on, it was well over 4.2 billion. All in all, I wouldn't use that constant this way, at least not for the purpose of memory allocation.
ULONG_MAX/sizeof(char) is the same as ULONG_MAX, which is a very large number. So large, in fact, that you don't have room for it even in virtual memory (because ULONG_MAX is probably the number of bytes in your entire virtual memory).
You definitely need to rethink what you are trying to do.
It's impossible to declare an array that large on most systems -- on a 32-bit system, that array is 4 GB, which doesn't fit into the available address space, and on most 64-bit systems, it's 16 exabytes (16 million terabytes), which doesn't fit into the available address space there either (and, incidentally, may be more memory than exists on the entire planet).
Use malloc() to allocate large amounts of memory. But be realistic. :)
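If the flags really are booleans, a heap-allocated, bit-packed bitmap is the realistic version of this; a sketch (the bit count is a placeholder, pick one that actually fits in your RAM):

#include <climits>
#include <cstddef>
#include <vector>

int main() {
    const std::size_t BITS = 4000000000ULL;   // 4 billion flags
    std::vector<unsigned char> bitmap((BITS + CHAR_BIT - 1) / CHAR_BIT, 0);  // ~500 MB

    std::size_t i = 123456789;
    bitmap[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));   // set bit i
    bool is_set = bitmap[i / CHAR_BIT] & (1u << (i % CHAR_BIT));     // test bit i
    (void)is_set;
}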
As I understand it, the maximum size of an array in C++ is bounded by the largest object size the platform supports. It is likely that your long-typed bitmap_size constant exceeds that limit.